git.libav.org Git - libav.git/rss log

Libav master git repository

http://git.libav.org/?p=libav.git;a=summary

Articles published on the site

aarch64: vp8: Port vp8_luma_dc_wht and vp8_idct_dc_add4uv from arm version

1 February 2019, by Martin Storsjö

aarch64: vp8: Port vp8_luma_dc_wht and vp8_idct_dc_add4uv from arm version

                     Cortex A53    A72    A73
vp8_luma_dc_wht_c:        115.7   75.7   90.7
vp8_luma_dc_wht_neon:      60.7   41.2   45.7
vp8_idct_dc_add4uv_c:     376.1  262.9  282.5
vp8_idct_dc_add4uv_neon:   52.0   29.0   37.0

Signed-off-by: Martin Storsjö <martin@martin.st>

libavcodec/aarch64/vp8dsp_init_aarch64.c
libavcodec/aarch64/vp8dsp_neon.S

aarch64: vp8: Optimize put_epel16_h6v6 with vp8_epel8_v6_y2

1 February 2019, by Martin Storsjö

aarch64: vp8: Optimize put_epel16_h6v6 with vp8_epel8_v6_y2

This makes it similar to put_epel16_v6, and gives a large speedup
on Cortex A53, a minor speedup on A72 and a very minor slowdown on
A73.

Before:                 Cortex A53     A72     A73
vp8_put_epel16_h6v6_neon:   2211.4  1586.5  1431.7
After:
vp8_put_epel16_h6v6_neon:   1736.9  1522.0  1448.1

Signed-off-by: Martin Storsjö <martin@martin.st>

libavcodec/aarch64/vp8dsp_neon.S
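
The qualitative claims in this commit message (large speedup on Cortex A53, minor speedup on A72, very minor slowdown on A73) can be checked against the quoted numbers. The following is a minimal sketch, not part of the commit: the function names are hypothetical, and the figures are treated as a cost where lower is better, since the message does not state the unit.

```c
#include <stdio.h>

/* Relative change in percent when a cost goes from `before` to `after`.
 * Positive means faster (lower cost), negative means slower.
 * Hypothetical helper, not part of Libav. */
double speedup_percent(double before, double after)
{
    return (before - after) / before * 100.0;
}

/* Print one line per core for the vp8_put_epel16_h6v6_neon numbers
 * quoted in the commit message. */
void print_epel16_h6v6_changes(void)
{
    static const char  *cores[]  = { "Cortex A53", "A72", "A73" };
    static const double before[] = { 2211.4, 1586.5, 1431.7 };
    static const double after[]  = { 1736.9, 1522.0, 1448.1 };

    for (int i = 0; i < 3; i++)
        printf("%-10s %+6.1f%%\n", cores[i],
               speedup_percent(before[i], after[i]));
}
```

With the figures above this works out to roughly a 21% reduction on Cortex A53, about 4% on A72, and an increase of about 1% on A73, which matches the wording of the message.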

aarch64: vp8: Optimize vp8_idct_add_neon for aarch64

31 January 2019, by Martin Storsjö

aarch64: vp8: Optimize vp8_idct_add_neon for aarch64

The previous version was a pretty exact translation of the arm
version. This version does do some unnecessary arithmetic (it does
more operations on vectors that are only half filled; it does 4
uaddw and 4 sqxtun instead of 2 of each), but it reduces the overhead
of packing data together (which could be done for free in the arm
version).

This gives a decent speedup on Cortex A53, a minor speedup on
A72 and a very minor slowdown on Cortex A73.

Before:        Cortex A53    A72    A73
vp8_idct_add_neon:   79.7   67.5   65.0
After:
vp8_idct_add_neon:   67.7   64.8   66.7

Signed-off-by: Martin Storsjö <martin@martin.st>

libavcodec/aarch64/vp8dsp_neon.S
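
The trade-off described in this commit message (four uaddw and four sqxtun on half-filled vectors instead of two of each on full ones) can be illustrated with a scalar model. This is only a sketch under stated assumptions: it assumes the two instructions implement the final step of adding the 16-bit residual to the 8-bit destination pixels and clamping to 0..255, it uses a contiguous 4x4 block rather than a strided picture, and all function names are hypothetical. It does not reproduce the actual assembly.

```c
#include <stdint.h>

/* One lane of UADDW: widen an unsigned 8-bit value and add it to a
 * 16-bit lane. The add wraps modulo 2^16, as the instruction does. */
int16_t uaddw_lane(int16_t wide, uint8_t narrow)
{
    int32_t sum = (int32_t)wide + narrow;
    if (sum > INT16_MAX)
        sum -= 65536;
    return (int16_t)sum;
}

/* One lane of SQXTUN: read the 16-bit lane as signed and saturate it
 * to the unsigned 8-bit range 0..255. */
uint8_t sqxtun_lane(int16_t v)
{
    if (v < 0)
        return 0;
    if (v > 255)
        return 255;
    return (uint8_t)v;
}

/* One row (four lanes) per step: four widening adds and four narrowing
 * steps, each on a vector that is only half filled. */
void add_residual_by_row(uint8_t dst[16], const int16_t res[16])
{
    for (int row = 0; row < 4; row++)
        for (int x = 0; x < 4; x++) {
            int i = row * 4 + x;
            dst[i] = sqxtun_lane(uaddw_lane(res[i], dst[i]));
        }
}

/* Two rows packed into one eight-lane step: two widening adds and two
 * narrowing steps, at the cost of packing the rows together first. */
void add_residual_by_row_pair(uint8_t dst[16], const int16_t res[16])
{
    for (int pair = 0; pair < 2; pair++)
        for (int lane = 0; lane < 8; lane++) {
            int i = pair * 8 + lane;
            dst[i] = sqxtun_lane(uaddw_lane(res[i], dst[i]));
        }
}
```

Both functions produce the same pixels; they differ only in how many vector-sized steps are needed and whether the rows have to be packed beforehand, which is the overhead the commit message says the new version avoids.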

aarch64: vp8: Skip saturating in shrn in ff_vp8_idct_add_neon

31 January 2019, by Martin Storsjö

aarch64: vp8: Skip saturating in shrn in ff_vp8_idct_add_neon

The original arm version didn't do saturation here. This probably
doesn't make any difference for performance, but reduces the
differences.

Signed-off-by: Martin Storsjö <martin@martin.st>

libavcodec/aarch64/vp8dsp_neon.S
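
To show what skipping saturation in a narrowing shift means, here is a scalar sketch of one lane of a plain narrowing shift next to a saturating one. It assumes the instructions involved are the AArch64 shrn and sqshrn forms narrowing 32-bit lanes to 16 bits, which the commit message does not spell out; the function names and the shift amounts used in the usage note are hypothetical.

```c
#include <stdint.h>

/* Portable arithmetic shift right (floor division by 2^shift). */
static int32_t asr32(int32_t v, unsigned shift)
{
    int64_t w = v;
    if (w >= 0)
        return (int32_t)(w >> shift);
    return (int32_t)-((-w + ((int64_t)1 << shift) - 1) >> shift);
}

/* Plain narrowing shift (SHRN-style), valid here for shift 1..16:
 * shift right and keep the low 16 bits, wrapping if the value
 * does not fit. */
int16_t shrn_lane(int32_t v, unsigned shift)
{
    uint16_t low = (uint16_t)((uint32_t)v >> shift);
    if (low >= 0x8000)
        return (int16_t)((int32_t)low - 65536);
    return (int16_t)low;
}

/* Saturating narrowing shift (SQSHRN-style): shift right, then clamp
 * to the signed 16-bit range instead of wrapping. */
int16_t sqshrn_lane(int32_t v, unsigned shift)
{
    int32_t s = asr32(v, shift);
    if (s > INT16_MAX)
        return INT16_MAX;
    if (s < INT16_MIN)
        return INT16_MIN;
    return (int16_t)s;
}
```

The two functions agree whenever the shifted value fits in 16 bits, for example `shrn_lane(-8000, 3)` and `sqshrn_lane(-8000, 3)` both give -1000. They differ only when it does not fit: for an input of 400000 and a shift of 3, the saturating form returns 32767 while the plain form wraps to -15536.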

aarch64: vp8: Fix assembling with armasm64

31 January 2019, by Martin Storsjö

aarch64: vp8: Fix assembling with armasm64

Signed-off-by: Martin Storsjö <martin@martin.st>

libavcodec/aarch64/vp8dsp_neon.S
