git.videolan.org Git - ffmpeg.git/rss log

FFmpeg git repo

http://git.videolan.org/?p=ffmpeg.git;a=summary

Articles published on the site

  • avcodec/aarch64/vvc: Optimize NEON version of vvc_dmvr

    3 March, by Krzysztof Pyrkosz
    avcodec/aarch64/vvc: Optimize NEON version of vvc_dmvr
    
    This patch replaces blocks of instructions performing rounding and
    widening shifts with one-liners achieving the same result.
    
    Before (first block) and after (second block) on A78:
    dmvr_8_12x20_neon:                                      86.2 ( 6.90x)
    dmvr_8_20x12_neon:                                      94.8 ( 5.93x)
    dmvr_8_20x20_neon:                                     141.5 ( 6.50x)
    dmvr_12_12x20_neon:                                    158.0 ( 3.76x)
    dmvr_12_20x12_neon:                                    151.2 ( 3.73x)
    dmvr_12_20x20_neon:                                    247.2 ( 3.71x)
    dmvr_hv_8_12x20_neon:                                  423.2 ( 3.75x)
    dmvr_hv_8_20x12_neon:                                  434.0 ( 3.69x)
    dmvr_hv_8_20x20_neon:                                  706.0 ( 3.69x)
    
    dmvr_8_12x20_neon:                                      77.2 ( 7.70x)
    dmvr_8_20x12_neon:                                      66.5 ( 8.49x)
    dmvr_8_20x20_neon:                                      92.2 ( 9.90x)
    dmvr_12_12x20_neon:                                     80.2 ( 7.38x)
    dmvr_12_20x12_neon:                                     58.2 ( 9.59x)
    dmvr_12_20x20_neon:                                     90.0 (10.15x)
    dmvr_hv_8_12x20_neon:                                  369.0 ( 4.34x)
    dmvr_hv_8_20x12_neon:                                  355.8 ( 4.49x)
    dmvr_hv_8_20x20_neon:                                  574.2 ( 4.51x)
    
    Signed-off-by: Martin Storsjö <martin@martin.st>
    
    • libavcodec/aarch64/vvc/inter.S
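
The commit message above does not list the instructions involved, so the following is only a scalar C sketch of the general idea, not the code in inter.S. It assumes the "one-liner" is a fused rounding shift of the kind AArch64 provides (for example `srshr`), and checks that it matches the multi-step form of adding a rounding bias and then shifting. The function names are hypothetical.

```c
#include <stdint.h>

/* Multi-step form: add the rounding bias 2^(n-1), then shift right by n.
 * Valid for 1 <= n <= 15. A wider intermediate avoids overflow.
 * Note: right-shifting a negative value is implementation-defined in C;
 * this sketch assumes the usual arithmetic shift. */
static int32_t round_shift_steps(int16_t x, unsigned n)
{
    int32_t biased = (int32_t)x + ((int32_t)1 << (n - 1));
    return biased >> n;
}

/* Fused form, modelling a single rounding-shift instruction:
 * the plain shift plus the highest bit that the shift discards. */
static int32_t round_shift_fused(int16_t x, unsigned n)
{
    int32_t v = x;
    return (v >> n) + ((v >> (n - 1)) & 1);
}
```

Both functions return the same value for every 16-bit input and every shift from 1 to 15, which is why a block of separate add and shift instructions can be collapsed into one rounding shift.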
  • swscale/aarch64: dotprod implementation of rgba32_to_Y

    3 March, by Krzysztof Pyrkosz
    swscale/aarch64: dotprod implementation of rgba32_to_Y
    
    The idea is to split the 16 bit coefficients into lower and upper half,
    invoke udot for the lower half, shift by 8, and follow by udot for the
    upper half.
    
    Benchmark on A78:
    bgra_to_y_128_c:                                       682.0 ( 1.00x)
    bgra_to_y_128_neon:                                    181.2 ( 3.76x)
    bgra_to_y_128_dotprod:                                 117.8 ( 5.79x)
    bgra_to_y_1080_c:                                     5742.5 ( 1.00x)
    bgra_to_y_1080_neon:                                  1472.5 ( 3.90x)
    bgra_to_y_1080_dotprod:                                906.5 ( 6.33x)
    bgra_to_y_1920_c:                                    10194.0 ( 1.00x)
    bgra_to_y_1920_neon:                                  2589.8 ( 3.94x)
    bgra_to_y_1920_dotprod:                               1573.8 ( 6.48x)
    
    Signed-off-by: Martin Storsjö <martin@martin.st>
    
    • libswscale/aarch64/input.S
    • libswscale/aarch64/swscale.c
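
The split described in the commit message above can be illustrated with a scalar C sketch. This is a model under stated assumptions, not the code in input.S: `udot4` is a hypothetical helper standing in for one 32-bit lane of a `udot` instruction, the coefficients are assumed to be unsigned 16-bit values, and the rounding constants and final shift of the real conversion are left out.

```c
#include <stdint.h>

/* Scalar model of one lane of a 4-way unsigned dot product:
 * acc += a[0]*b[0] + a[1]*b[1] + a[2]*b[2] + a[3]*b[3]. */
static uint32_t udot4(uint32_t acc, const uint8_t a[4], const uint8_t b[4])
{
    for (int i = 0; i < 4; i++)
        acc += (uint32_t)a[i] * b[i];
    return acc;
}

/* Reference: weighted sum with full 16-bit coefficients, then >> 8. */
static uint32_t weighted_sum_ref(const uint8_t px[4], const uint16_t coef[4])
{
    uint32_t sum = 0;
    for (int i = 0; i < 4; i++)
        sum += (uint32_t)px[i] * coef[i];
    return sum >> 8;
}

/* Split form: dot product with the low coefficient bytes, shift by 8,
 * then accumulate the dot product with the high coefficient bytes. */
static uint32_t weighted_sum_split(const uint8_t px[4], const uint16_t coef[4])
{
    uint8_t lo[4], hi[4];
    for (int i = 0; i < 4; i++) {
        lo[i] = (uint8_t)(coef[i] & 0xff);
        hi[i] = (uint8_t)(coef[i] >> 8);
    }
    uint32_t acc = udot4(0, px, lo); /* low-half contribution */
    acc >>= 8;                       /* align with the high half */
    return udot4(acc, px, hi);       /* add high-half contribution */
}
```

The two functions agree because the full sum equals the low-half sum plus 256 times the high-half sum, and shifting that total right by 8 leaves the high-half sum plus the low-half sum shifted by 8. An 8-bit dot product can therefore stand in for a 16-bit multiply-accumulate at the cost of two passes.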
  • avcodec/mpeg12enc: Simplify writing bits

    3 March, by Andreas Rheinhardt
    avcodec/mpeg12enc: Simplify writing bits
    
    Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
    
    • libavcodec/mpeg12enc.c
  • avcodec/mpegvideo: Mark ff_mpv_common_defaults() as av_cold

    3 March, by Andreas Rheinhardt
    avcodec/mpegvideo: Mark ff_mpv_common_defaults() as av_cold
    
    Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
    
    • libavcodec/mpegvideo.c
  • avcodec/speedhqenc: Inline ff_speedhq_mb_y_order_to_mb()

    3 March, by Andreas Rheinhardt
    avcodec/speedhqenc: Inline ff_speedhq_mb_y_order_to_mb()
    
    It is an extremely simple function that is only called once,
    so it should be inlined.
    
    Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
    
    • libavcodec/speedhqenc.c
    • libavcodec/speedhqenc.h