git.libav.org Git - libav.git/rss log

Libav master git repository

http://git.libav.org/?p=libav.git;a=summary

Articles published on the site

  • aarch64: vp8: Port vp8_luma_dc_wht and vp8_idct_dc_add4uv from arm version

    1 February 2019, by Martin Storsjö
    aarch64: vp8: Port vp8_luma_dc_wht and vp8_idct_dc_add4uv from arm version
    
                         Cortex A53    A72    A73
    vp8_luma_dc_wht_c:        115.7   75.7   90.7
    vp8_luma_dc_wht_neon:      60.7   41.2   45.7
    vp8_idct_dc_add4uv_c:     376.1  262.9  282.5
    vp8_idct_dc_add4uv_neon:   52.0   29.0   37.0
    
    Signed-off-by: Martin Storsjö <martin@martin.st>
    
    • [DBH] libavcodec/aarch64/vp8dsp_init_aarch64.c
    • [DBH] libavcodec/aarch64/vp8dsp_neon.S
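
For readers unfamiliar with the function benchmarked above, the following is a minimal scalar sketch of what a DC-only inverse transform over four chroma blocks does. It is written from the VP8 format's DC-only rule (add `(dc + 4) >> 3` to every pixel of a 4x4 block, then clamp to 0..255), not copied from libav. The helper names, the 2x2 block layout and the signatures are assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Clamp an intermediate value to the 8-bit pixel range. */
static uint8_t clip_u8(int v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/* Add the DC-only residual of one 4x4 block to the pixels at dst.
 * Note: >> on a negative int is implementation-defined in C; this sketch
 * assumes the arithmetic shift that mainstream compilers provide. */
static void idct_dc_add(uint8_t *dst, const int16_t *block, ptrdiff_t stride)
{
    int dc = (block[0] + 4) >> 3;
    for (int y = 0; y < 4; y++) {
        for (int x = 0; x < 4; x++)
            dst[x] = clip_u8(dst[x] + dc);
        dst += stride;
    }
}

/* Hypothetical layout: four 4x4 blocks arranged 2x2 over an 8x8 chroma area. */
static void idct_dc_add4uv(uint8_t *dst, int16_t block[4][16], ptrdiff_t stride)
{
    idct_dc_add(dst,                  block[0], stride);
    idct_dc_add(dst + 4,              block[1], stride);
    idct_dc_add(dst + 4 * stride,     block[2], stride);
    idct_dc_add(dst + 4 * stride + 4, block[3], stride);
}
```

Because every pixel in a block receives the same offset, the work is uniform across the block, which is presumably why the NEON version gains so much over the C version in the table above.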
  • aarch64: vp8: Optimize put_epel16_h6v6 with vp8_epel8_v6_y2

    1 February 2019, by Martin Storsjö
    aarch64: vp8: Optimize put_epel16_h6v6 with vp8_epel8_v6_y2
    
    This makes it similar to put_epel16_v6, and gives a large speedup
    on Cortex A53, a minor speedup on A72 and a very minor slowdown on
    A73.
    
    Before:                 Cortex A53     A72     A73
    vp8_put_epel16_h6v6_neon:   2211.4  1586.5  1431.7
    After:
    vp8_put_epel16_h6v6_neon:   1736.9  1522.0  1448.1
    
    Signed-off-by: Martin Storsjö <martin@martin.st>
    
    • [DBH] libavcodec/aarch64/vp8dsp_neon.S
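
The "large", "minor" and "very minor" wording in the commit above can be made concrete by dividing the before figure by the after figure for each core. The sketch below does that arithmetic with the numbers from the table; the helper name `speedup` is hypothetical, and it assumes that lower benchmark figures are better, as the commit's wording implies.

```c
#include <stdio.h>

/* Ratio above 1.0 means the new version is faster; below 1.0 means slower.
 * Assumes the benchmark figures are costs (lower is better). */
static double speedup(double before, double after)
{
    return before / after;
}

/* Print the ratio for one core, e.g. "Cortex A53: 1.27x". */
static void report(const char *core, double before, double after)
{
    printf("%s: %.2fx\n", core, speedup(before, after));
}
```

Called as `report("Cortex A53", 2211.4, 1736.9)`, `report("A72", 1586.5, 1522.0)` and `report("A73", 1431.7, 1448.1)`, this gives roughly 1.27x, 1.04x and 0.99x.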
  • aarch64: vp8: Optimize vp8_idct_add_neon for aarch64

    31 January 2019, by Martin Storsjö
    aarch64: vp8: Optimize vp8_idct_add_neon for aarch64
    
    The previous version was a pretty exact translation of the arm
    version. This version does some unnecessary arithmetic (it does
    more operations on vectors that are only half filled; it does 4
    uaddw and 4 sqxtun instead of 2 of each), but it reduces the overhead
    of packing data together (which could be done for free in the arm
    version).
    
    This gives a decent speedup on Cortex A53, a minor speedup on
    A72 and a very minor slowdown on Cortex A73.
    
    Before:        Cortex A53    A72    A73
    vp8_idct_add_neon:   79.7   67.5   65.0
    After:
    vp8_idct_add_neon:   67.7   64.8   66.7
    
    Signed-off-by: Martin Storsjö <martin@martin.st>
    
    • [DBH] libavcodec/aarch64/vp8dsp_neon.S
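
The commit above mentions `uaddw` and `sqxtun`. As a rough per-lane model, `uaddw` adds a zero-extended 8-bit value to a 16-bit lane, and `sqxtun` narrows a signed 16-bit lane to unsigned 8 bits with saturation. The sketch below chains the two to add a signed residual to a pixel. That pairing is an assumption about how the function uses them, and all helper names are hypothetical.

```c
#include <stdint.h>

/* Reinterpret 16 bits as a signed value without an implementation-defined cast. */
static int16_t as_s16(uint16_t v)
{
    return (int16_t)(v >= 0x8000u ? (int32_t)v - 0x10000 : (int32_t)v);
}

/* One lane of uaddw: 16-bit accumulator plus zero-extended 8-bit value, modulo 2^16. */
static uint16_t uaddw_lane(uint16_t acc, uint8_t px)
{
    return (uint16_t)(acc + px);
}

/* One lane of sqxtun: signed 16-bit in, unsigned 8-bit out, saturated to 0..255. */
static uint8_t sqxtun_lane(int16_t v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/* Add a signed residual to a pixel and clamp the result to the pixel range. */
static uint8_t add_residual(uint8_t px, int16_t residual)
{
    uint16_t sum = uaddw_lane((uint16_t)residual, px);
    return sqxtun_lane(as_s16(sum));
}
```

A 4x4 block has only four pixels per row, so handling rows in separate vectors leaves half of each 8-lane vector unused. That matches the commit's remark about doing four of each instruction instead of two.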
  • aarch64: vp8: Skip saturating in shrn in ff_vp8_idct_add_neon

    31 January 2019, by Martin Storsjö
    aarch64: vp8: Skip saturating in shrn in ff_vp8_idct_add_neon
    
    The original arm version didn't do saturation here. This probably
    doesn't make any difference for performance, but it reduces the
    differences between the arm and aarch64 versions.
    
    Signed-off-by: Martin Storsjö <martin@martin.st>
    
    • [DBH] libavcodec/aarch64/vp8dsp_neon.S
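
To illustrate the distinction the commit above refers to, here is a scalar sketch of a narrowing right shift from 32 to 16 bits, once truncating (as `shrn` behaves per lane) and once with signed saturation (as `sqshrn` behaves per lane). The commit does not show which exact instructions were swapped, so the pairing is an assumption, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Truncating narrow: shift right, then keep only the low 16 bits.
 * Out-of-range results wrap around. Note: >> on a negative value is
 * implementation-defined in C; an arithmetic shift is assumed here. */
static int16_t shrn_s32(int32_t x, int shift)
{
    assert(shift >= 1 && shift <= 16);
    uint32_t low = (uint32_t)(x >> shift) & 0xFFFFu;
    return (int16_t)(low >= 0x8000u ? (int32_t)low - 0x10000 : (int32_t)low);
}

/* Saturating narrow: shift right, then clamp to the int16_t range. */
static int16_t sqshrn_s32(int32_t x, int shift)
{
    assert(shift >= 1 && shift <= 16);
    int32_t v = x >> shift;
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (int16_t)v;
}
```

The two agree whenever the shifted value already fits in 16 bits, so if the inputs never overflow there, dropping the saturation does not change the output.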
  • aarch64: vp8: Fix assembling with armasm64

    31 January 2019, by Martin Storsjö
    aarch64: vp8: Fix assembling with armasm64
    
    Signed-off-by: Martin Storsjö <martin@martin.st>
    
    • [DBH] libavcodec/aarch64/vp8dsp_neon.S