git.libav.org Git - libav.git/rss log
Libav master git repository
Articles published on the site
-
aarch64: vp8: Port vp8_luma_dc_wht and vp8_idct_dc_add4uv from arm version
1 February 2019, by Martin Storsjö

                          Cortex A53    A72    A73
vp8_luma_dc_wht_c:             115.7   75.7   90.7
vp8_luma_dc_wht_neon:           60.7   41.2   45.7
vp8_idct_dc_add4uv_c:          376.1  262.9  282.5
vp8_idct_dc_add4uv_neon:        52.0   29.0   37.0

Signed-off-by: Martin Storsjö <martin@martin.st>
-
aarch64: vp8: Optimize put_epel16_h6v6 with vp8_epel8_v6_y2
1 February 2019, by Martin Storsjö

This makes it similar to put_epel16_v6, and gives a large speedup on Cortex A53, a minor speedup on A72 and a very minor slowdown on A73.

Before:
                            Cortex A53     A72     A73
vp8_put_epel16_h6v6_neon:       2211.4  1586.5  1431.7
After:
vp8_put_epel16_h6v6_neon:       1736.9  1522.0  1448.1

Signed-off-by: Martin Storsjö <martin@martin.st>
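To put "large", "minor" and "very minor" into numbers, the before/after figures quoted above can be turned into ratios. The following is only a small sketch; the helper names `speedup` and `print_epel16_h6v6_speedups` are made up for illustration and are not part of Libav.

```c
#include <stdio.h>

/* Relative speedup of a change: values above 1.0 mean the new code is
 * faster, values below 1.0 mean it is slower. (Hypothetical helper.) */
static double speedup(double before, double after)
{
    return before / after;
}

/* Print the per-core ratio for vp8_put_epel16_h6v6_neon, using the
 * before/after figures quoted in the commit message. */
static void print_epel16_h6v6_speedups(void)
{
    static const char *const cores[] = { "Cortex A53", "A72", "A73" };
    static const double before[] = { 2211.4, 1586.5, 1431.7 };
    static const double after[]  = { 1736.9, 1522.0, 1448.1 };

    for (int i = 0; i < 3; i++)
        printf("%-10s %.3f\n", cores[i], speedup(before[i], after[i]));
}
```

Calling `print_epel16_h6v6_speedups()` gives ratios of roughly 1.27 on Cortex A53, 1.04 on A72 and 0.99 on A73, which matches the wording of the commit message.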
-
aarch64: vp8: Optimize vp8_idct_add_neon for aarch64
31 January 2019, by Martin Storsjö

The previous version was a pretty exact translation of the arm version. This version does do some unnecessary arithmetic (it does more operations on vectors that are only half filled; it does 4 uaddw and 4 sqxtun instead of 2 of each), but it reduces the overhead of packing data together (which could be done for free in the arm version).

This gives a decent speedup on Cortex A53, a minor speedup on A72 and a very minor slowdown on Cortex A73.

Before:
                    Cortex A53    A72    A73
vp8_idct_add_neon:        79.7   67.5   65.0
After:
vp8_idct_add_neon:        67.7   64.8   66.7

Signed-off-by: Martin Storsjö <martin@martin.st>
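For readers unfamiliar with the two instructions named above: uaddw adds zero-extended unsigned 8-bit elements to 16-bit elements, and sqxtun narrows signed 16-bit elements to unsigned 8 bits with saturation. A scalar C model of that add-and-clamp step might look like the sketch below. The function names and the choice of one 4-pixel row per call are assumptions made for illustration; they do not reflect the actual register layout of the assembly.

```c
#include <stdint.h>

/* Scalar model of SQXTUN on one element: narrow a signed 16-bit value
 * to unsigned 8 bits, clamping to the range 0..255. */
static uint8_t sat_narrow_u8(int16_t v)
{
    if (v < 0)
        return 0;
    if (v > 255)
        return 255;
    return (uint8_t)v;
}

/* Scalar model of adding a 16-bit residual to one row of 4 pixels.
 * Assumes the residuals are small enough that the sum fits in 16 bits
 * (hypothetical helper, not Libav code). */
static void add_residual_row4(uint8_t dst[4], const int16_t residual[4])
{
    for (int i = 0; i < 4; i++) {
        /* UADDW: zero-extend the 8-bit pixel and add it to the 16-bit residual */
        int16_t sum = (int16_t)(residual[i] + dst[i]);
        /* SQXTUN: saturate the signed sum back down to an unsigned byte */
        dst[i] = sat_narrow_u8(sum);
    }
}
```

With a 4x4 block handled as four such half-filled rows, one uaddw and one sqxtun per row gives the 4 of each mentioned in the commit message, whereas packing two rows into one full vector would need only 2 of each.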
-
aarch64: vp8: Skip saturating in shrn in ff_vp8_idct_add_neon
31 January 2019, by Martin Storsjö
-
aarch64: vp8: Fix assembling with armasm64
31 January 2019, by Martin Storsjö