git.videolan.org Git - ffmpeg.git/rss log
FFmpeg git repo
-
doc/APIchanges: add missing rgbaf16 pixfmt entry
16 August, by Timo Rothenpieler -
lavu/tx: optimize and simplify inverse MDCTs
15 August, by Lynne
Convert the input from a scatter to a gather instead, which is faster and better for SIMD. Also, add a pre-shuffled exptab version to avoid gathering there at all. This doubles the exptab size, but the speedup makes it worth it. In SIMD, the exptab will likely be evicted to a higher-level cache anyway because of the FFT in the middle, and the number of loads stays identical. For a 960-point inverse MDCT, the speedup is 10%.

This makes it possible to write sane and fast SIMD versions of inverse MDCTs.
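To make the scatter-versus-gather distinction concrete, here is a minimal, real-valued sketch. It is not lavu/tx code: every function name below is hypothetical, the exptab is assumed to be a plain per-index table of twiddle factors, and a real MDCT works on complex values. A scatter reads sequentially but writes to permuted positions. A gather writes sequentially and reads from permuted positions, which matches SIMD indexed loads followed by contiguous stores. Pre-shuffling the table into gather order once removes the second indexed load from the hot loop.

```c
#include <assert.h>
#include <stddef.h>

/* Build a gather table from a scatter table: if element i is scattered
 * to position map[i], then position map[i] must gather from i. */
static void invert_map(const int *map, int *inv_map, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        assert(map[i] >= 0 && (size_t)map[i] < n);
        inv_map[map[i]] = (int)i;
    }
}

/* Scatter: sequential reads, scattered writes (awkward to vectorize). */
static void permute_scatter(const float *in, float *out,
                            const int *map, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[map[i]] = in[i];
}

/* Gather: scattered reads, sequential writes (indexed loads + plain store). */
static void permute_gather(const float *in, float *out,
                           const int *inv_map, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = in[inv_map[i]];
}

/* Gathering both input and twiddle: two indexed loads per element. */
static void twiddle_gather(const float *in, const float *exptab,
                           const int *inv_map, float *out, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = in[inv_map[i]] * exptab[inv_map[i]];
}

/* Done once at init: store the table in gather order. Keeping this copy
 * alongside the original is presumably what doubles the table size. */
static void preshuffle_exptab(const float *exptab, float *exptab_shuf,
                              const int *inv_map, size_t n)
{
    permute_gather(exptab, exptab_shuf, inv_map, n);
}

/* Only the input is gathered; the twiddle is read sequentially. */
static void twiddle_preshuffled(const float *in, const float *exptab_shuf,
                                const int *inv_map, float *out, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = in[inv_map[i]] * exptab_shuf[i];
}
```

Both permutation forms produce the same output; the gather form just moves the irregular access from the stores to the loads, and `twiddle_preshuffled` needs one indexed load per element instead of two.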
-
swscale/aarch64: add vscale specializations
13 August, by Swinney, Jonathan
This commit adds new code paths for vscale when filterSize is 2, 4, or 8. By using specialized code with unrolling to match the filterSize, we can improve performance.

On AWS c7g (Graviton 3, Neoverse V1) instances:
                                    before    after
yuv2yuvX_2_0_512_accurate_neon:      558.8    268.9
yuv2yuvX_4_0_512_accurate_neon:      637.5    434.9
yuv2yuvX_8_0_512_accurate_neon:     1144.8    806.2
yuv2yuvX_16_0_512_accurate_neon:    2080.5   1853.7

Signed-off-by: Jonathan Swinney <jswinney@amazon.com>
Signed-off-by: Martin Storsjö <martin@martin.st>
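The actual change is hand-written AArch64 NEON assembly; the scalar C sketch below only mirrors the idea of specializing on filterSize. All names are hypothetical, and the generic loop is modeled loosely on a yuv2planeX-style C reference (dither shifted left by 12, sum shifted right by 19), which should be treated as an assumption. Each specialized function fixes the inner trip count at compile time so it can be fully unrolled, and a small dispatcher falls back to the generic path for other sizes.

```c
#include <assert.h>
#include <stdint.h>

typedef void (*vscale_fn)(const int16_t *filter, int filter_size,
                          const int16_t **src, uint8_t *dst, int dst_w,
                          const uint8_t *dither, int offset);

static inline uint8_t clip_uint8(int v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/* Generic path: the inner trip count is only known at run time. */
static void vscale_generic(const int16_t *filter, int filter_size,
                           const int16_t **src, uint8_t *dst, int dst_w,
                           const uint8_t *dither, int offset)
{
    for (int i = 0; i < dst_w; i++) {
        int val = dither[(i + offset) & 7] << 12;
        for (int j = 0; j < filter_size; j++)
            val += src[j][i] * filter[j];
        dst[i] = clip_uint8(val >> 19);
    }
}

/* Specialized paths: same arithmetic, but a compile-time trip count N
 * lets the compiler fully unroll the inner loop. */
#define DEFINE_VSCALE_FIXED(N)                                          \
    static void vscale_##N(const int16_t *filter, int filter_size,      \
                           const int16_t **src, uint8_t *dst,           \
                           int dst_w, const uint8_t *dither,            \
                           int offset)                                  \
    {                                                                   \
        assert(filter_size == (N));                                     \
        (void)filter_size;                                              \
        for (int i = 0; i < dst_w; i++) {                               \
            int val = dither[(i + offset) & 7] << 12;                   \
            for (int j = 0; j < (N); j++)                               \
                val += src[j][i] * filter[j];                           \
            dst[i] = clip_uint8(val >> 19);                             \
        }                                                               \
    }

DEFINE_VSCALE_FIXED(2)
DEFINE_VSCALE_FIXED(4)
DEFINE_VSCALE_FIXED(8)

/* Use a specialized path when one exists, otherwise fall back. */
static vscale_fn pick_vscale(int filter_size)
{
    switch (filter_size) {
    case 2:  return vscale_2;
    case 4:  return vscale_4;
    case 8:  return vscale_8;
    default: return vscale_generic;
    }
}
```

Selecting the function pointer once per filter setup keeps the per-pixel loop free of any filterSize branching, which is the same trade-off the specialized NEON paths make.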
-
swscale/aarch64: vscale optimization
13 August, by Swinney, Jonathan
Use scalar-times-vector multiply-accumulate instructions instead of vector-times-vector ones, removing the need for replicating load instructions, which are slightly slower.

On AWS c7g (Graviton 3, Neoverse V1) instances:
                                    before    after
yuv2yuvX_8_0_512_accurate_neon:     1144.8    987.4
yuv2yuvX_16_0_512_accurate_neon:    2080.5   1869.4

Signed-off-by: Jonathan Swinney <jswinney@amazon.com>
Signed-off-by: Martin Storsjö <martin@martin.st>
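The difference between the two multiply-accumulate forms can be sketched with NEON intrinsics. This is an illustration under assumptions, not the commit's assembly: the function names are hypothetical, and the mapping of `vld1_dup_s16` to a replicating load (LD1R) and of `vmlal_n_s16` to a by-element SMLAL is how compilers typically lower them, not something guaranteed. On targets without NEON the sketch falls back to a scalar reference so it still builds and runs.

```c
#include <assert.h>
#include <stdint.h>
#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Portable reference: acc[k] = sum over j of src[j][k] * filter[j]. */
static void mac4_ref(const int16_t *filter, int filter_size,
                     const int16_t **src, int32_t acc[4])
{
    for (int k = 0; k < 4; k++) {
        int32_t s = 0;
        for (int j = 0; j < filter_size; j++)
            s += (int32_t)src[j][k] * filter[j];
        acc[k] = s;
    }
}

/* Same computation for 4 output pixels at once. */
static void mac4(const int16_t *filter, int filter_size,
                 const int16_t **src, int32_t acc[4])
{
    assert(filter_size > 0);
#if defined(__ARM_NEON)
    int32x4_t sum = vdupq_n_s32(0);
    for (int j = 0; j < filter_size; j++) {
        int16x4_t pix = vld1_s16(src[j]);
        /* Vector times vector would first replicate the coefficient
         * into every lane with an extra load:
         *     sum = vmlal_s16(sum, pix, vld1_dup_s16(&filter[j]));
         * Scalar times vector uses the coefficient directly: */
        sum = vmlal_n_s16(sum, pix, filter[j]);
    }
    vst1q_s32(acc, sum);
#else
    mac4_ref(filter, filter_size, src, acc);
#endif
}
```

Both forms compute identical sums; the scalar-times-vector variant simply avoids issuing a replicating load for every filter tap.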
-
checkasm: updated tests for sw_scale
13 August, by Swinney, Jonathan
Change the reference to exactly match the C reference in swscale, instead of exactly matching the x86 SIMD implementations (which differ slightly).

Test with and without SWS_ACCURATE_RND: if this flag isn't set, the output must match the C reference exactly; otherwise it is allowed to be off by 2.

Mark a couple of x86 functions as unavailable when SWS_ACCURATE_RND is set; apparently this discrepancy hasn't been noticed in other exact tests before.

Add a test for yuv2plane1.

Signed-off-by: Jonathan Swinney <jswinney@amazon.com>
Signed-off-by: Martin Storsjö <martin@martin.st>
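The comparison policy described above (exact match without SWS_ACCURATE_RND, a tolerance of 2 with it) can be sketched as a small helper. The names `within_tolerance` and `output_matches` are hypothetical and are not checkasm's actual helpers; the sketch only shows the per-byte check the commit message describes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* True if every byte of a and b differs by at most max_diff. */
static bool within_tolerance(const uint8_t *a, const uint8_t *b,
                             size_t n, int max_diff)
{
    assert(max_diff >= 0);
    for (size_t i = 0; i < n; i++)
        if (abs(a[i] - b[i]) > max_diff)
            return false;
    return true;
}

/* Exact match without accurate rounding, off-by-2 allowed with it. */
static bool output_matches(const uint8_t *ref, const uint8_t *simd,
                           size_t n, bool accurate_rnd)
{
    return within_tolerance(ref, simd, n, accurate_rnd ? 2 : 0);
}
```

Keeping the tolerance as a single parameter makes the two test modes share one comparison loop instead of duplicating it.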