git.videolan.org Git - ffmpeg.git/rss log

FFmpeg git repo

http://git.videolan.org/?p=ffmpeg.git;a=summary

Recent commits

  • doc/APIchanges: add missing rgbaf16 pixfmt entry

    16 August, by Timo Rothenpieler
    
    • [DH] doc/APIchanges
  • lavu/tx: optimize and simplify inverse MDCTs

    15 August, by Lynne
    
    Convert the input from a scatter to a gather instead,
    which is faster and better for SIMD.
    Also, add a pre-shuffled exptab version to avoid
    gathering there at all. This doubles the exptab size,
    but the speedup makes it worth it. In SIMD, the
    exptab will likely be purged to a higher cache
    anyway because of the FFT in the middle, and
    the amount of loads stays identical.
    
    For a 960-point inverse MDCT, the speedup is 10%.
    
    This makes it possible to write sane and fast SIMD
    versions of inverse MDCTs.
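
    The scatter-to-gather conversion can be sketched in plain C (a hypothetical
    illustration of the access-pattern change, not the actual lavu/tx code):

    ```c
    #include <stddef.h>

    /* Applying a permutation "map" as a scatter vs. as a gather.
     * The gather keeps the stores sequential, which is much easier
     * to vectorize: SIMD stores stay contiguous and only the loads
     * are indexed. */
    static void permute_scatter(float *dst, const float *src,
                                const int *map, size_t n)
    {
        for (size_t i = 0; i < n; i++)
            dst[map[i]] = src[i];      /* non-contiguous stores */
    }

    static void permute_gather(float *dst, const float *src,
                               const int *inv_map, size_t n)
    {
        for (size_t i = 0; i < n; i++)
            dst[i] = src[inv_map[i]];  /* contiguous stores, indexed loads */
    }
    ```

    Given a permutation and its inverse, both routines produce the same
    result; only the memory-access pattern differs.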
    
    • [DH] libavutil/tx.c
    • [DH] libavutil/tx_priv.h
    • [DH] libavutil/tx_template.c
  • swscale/aarch64: add vscale specializations

    13 August, by Jonathan Swinney
    
    This commit adds new code paths for vscale when filterSize is 2, 4, or
    8. By using specialized code with unrolling to match the filterSize we
    can improve performance.
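
    The idea can be sketched in scalar C (a hypothetical model of the
    vertical-scale loop; the actual specializations are NEON assembly in
    libswscale/aarch64/output.S, and the real rounding also involves dither):

    ```c
    #include <stdint.h>

    /* Generic vertical scale: each output sample is a weighted sum of
     * filterSize source rows, with a variable-length inner loop. */
    static void vscale_generic(const int16_t *filter, int filterSize,
                               const int16_t **src, int16_t *dst, int width)
    {
        for (int x = 0; x < width; x++) {
            int acc = 1 << 18;                 /* example rounding bias */
            for (int j = 0; j < filterSize; j++)
                acc += filter[j] * src[j][x];
            dst[x] = (int16_t)(acc >> 19);
        }
    }

    /* Specialization for filterSize == 4: the inner loop is fully
     * unrolled, so the coefficients can stay in registers and there is
     * no per-sample loop overhead. */
    static void vscale_4(const int16_t *filter, const int16_t **src,
                         int16_t *dst, int width)
    {
        for (int x = 0; x < width; x++) {
            int acc = 1 << 18;
            acc += filter[0] * src[0][x];
            acc += filter[1] * src[1][x];
            acc += filter[2] * src[2][x];
            acc += filter[3] * src[3][x];
            dst[x] = (int16_t)(acc >> 19);
        }
    }
    ```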
    
    On AWS c7g (Graviton 3, Neoverse V1) instances:
                                     before   after
    yuv2yuvX_2_0_512_accurate_neon:  558.8    268.9
    yuv2yuvX_4_0_512_accurate_neon:  637.5    434.9
    yuv2yuvX_8_0_512_accurate_neon:  1144.8   806.2
    yuv2yuvX_16_0_512_accurate_neon: 2080.5   1853.7
    
    Signed-off-by: Jonathan Swinney <jswinney@amazon.com>
    Signed-off-by: Martin Storsjö <martin@martin.st>
    
    • [DH] libswscale/aarch64/output.S
    • [DH] libswscale/aarch64/swscale.c
  • swscale/aarch64: vscale optimization

    13 August, by Jonathan Swinney
    
    Use scalar-times-vector multiply-accumulate instructions instead of
    vector-times-vector ones, removing the need for replicating load
    instructions, which are slightly slower.
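
    The difference can be sketched in AArch64 assembly (hypothetical registers
    and operands, not the actual output.S code):

    ```
    // before: broadcast each coefficient with a replicating load,
    // then a vector-times-vector multiply-accumulate per row
    ld1r    {v2.8h}, [x1]            // load filter[j], replicated to all lanes
    mla     v0.8h, v1.8h, v2.8h

    // after: load all coefficients once outside the loop, then
    // multiply-accumulate by a single lane (scalar times vector)
    ld1     {v2.8h}, [x1]            // load filter[0..7] in one go
    mla     v0.8h, v1.8h, v2.h[0]    // accumulate row j scaled by filter[j]
    ```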
    
    On AWS c7g (Graviton 3, Neoverse V1) instances:
                                     before   after
    yuv2yuvX_8_0_512_accurate_neon:  1144.8   987.4
    yuv2yuvX_16_0_512_accurate_neon: 2080.5   1869.4
    
    Signed-off-by: Jonathan Swinney <jswinney@amazon.com>
    Signed-off-by: Martin Storsjö <martin@martin.st>
    
    • [DH] libswscale/aarch64/output.S
  • checkasm: updated tests for sw_scale

    13 August, by Jonathan Swinney
    
    Change the reference to exactly match the C reference in swscale,
    instead of exactly matching the x86 SIMD implementations (which
    differ slightly). Test with and without SWS_ACCURATE_RND - if this
    flag isn't set, the output must match the C reference exactly;
    otherwise it is allowed to be off by at most 2.
    
    Mark a couple of x86 functions as unavailable when SWS_ACCURATE_RND
    is set - apparently this discrepancy hadn't been noticed by other
    exact tests before.
    
    Add a test for yuv2plane1.
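
    The comparison rule described above can be sketched in C (a hypothetical
    illustration, not the actual tests/checkasm/sw_scale.c code):

    ```c
    #include <stdint.h>
    #include <stdlib.h>

    /* Without SWS_ACCURATE_RND the SIMD output must match the C
     * reference bit-exactly; with it, each sample may deviate by at
     * most 2. Returns 1 on a pass, 0 on a mismatch. */
    static int outputs_match(const int16_t *ref, const int16_t *test,
                             int n, int accurate_rnd)
    {
        for (int i = 0; i < n; i++) {
            int diff = abs(ref[i] - test[i]);
            if (accurate_rnd ? diff > 2 : diff != 0)
                return 0;
        }
        return 1;
    }
    ```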
    
    Signed-off-by: Jonathan Swinney <jswinney@amazon.com>
    Signed-off-by: Martin Storsjö <martin@martin.st>
    
    • [DH] libswscale/x86/swscale.c
    • [DH] tests/checkasm/sw_scale.c