git.libav.org Git - libav.git/rss log

Libav master git repository

http://git.libav.org/?p=libav.git;a=summary

Articles published on the site

  • aarch64: vp9itxfm: Use w3 instead of x3 for the int eob parameter

    18 November 2016, by Martin Storsjö
    aarch64: vp9itxfm: Use w3 instead of x3 for the int eob parameter
    
    The clobbering tests in checkasm are only invoked when testing
    correctness, so this bug didn't show up when benchmarking the
    dc-only version.
    
    Signed-off-by: Martin Storsjö <martin@martin.st>
    
    • libavcodec/aarch64/vp9itxfm_neon.S
  • hlsenc: Fix the openssl support

    18 November 2016, by Luca Barbato
    hlsenc: Fix the openssl support
    
    • libavformat/hlsenc.c
  • arm: vp9itxfm: Skip empty slices in the first pass of idct_idct 16x16 and 32x32

    18 November 2016, by Martin Storsjö
    arm: vp9itxfm: Skip empty slices in the first pass of idct_idct 16x16 and 32x32
    
    This work is sponsored by, and copyright, Google.
    
    Previously all subpartitions except the eob=1 (DC) case ran with
    the same runtime:
    
                                         Cortex A7       A8       A9      A53
    vp9_inv_dct_dct_16x16_sub16_add_neon:   3188.1   2435.4   2499.0   1969.0
    vp9_inv_dct_dct_32x32_sub32_add_neon:  18531.7  16582.3  14207.6  12000.3
    
    By skipping individual 4x16 or 4x32 pixel slices in the first pass,
    we reduce the runtime of these functions like this:
    
    vp9_inv_dct_dct_16x16_sub1_add_neon:     274.6    189.5    211.7    235.8
    vp9_inv_dct_dct_16x16_sub2_add_neon:    2064.0   1534.8   1719.4   1248.7
    vp9_inv_dct_dct_16x16_sub4_add_neon:    2135.0   1477.2   1736.3   1249.5
    vp9_inv_dct_dct_16x16_sub8_add_neon:    2446.7   1828.7   1993.6   1494.7
    vp9_inv_dct_dct_16x16_sub12_add_neon:   2832.4   2118.3   2266.5   1735.1
    vp9_inv_dct_dct_16x16_sub16_add_neon:   3211.7   2475.3   2523.5   1983.1
    vp9_inv_dct_dct_32x32_sub1_add_neon:     756.2    456.7    862.0    553.9
    vp9_inv_dct_dct_32x32_sub2_add_neon:   10682.2   8190.4   8539.2   6762.5
    vp9_inv_dct_dct_32x32_sub4_add_neon:   10813.5   8014.9   8518.3   6762.8
    vp9_inv_dct_dct_32x32_sub8_add_neon:   11859.6   9313.0   9347.4   7514.5
    vp9_inv_dct_dct_32x32_sub12_add_neon:  12946.6  10752.4  10192.2   8280.2
    vp9_inv_dct_dct_32x32_sub16_add_neon:  14074.6  11946.5  11001.4   9008.6
    vp9_inv_dct_dct_32x32_sub20_add_neon:  15269.9  13662.7  11816.1   9762.6
    vp9_inv_dct_dct_32x32_sub24_add_neon:  16327.9  14940.1  12626.7  10516.0
    vp9_inv_dct_dct_32x32_sub28_add_neon:  17462.7  15776.1  13446.2  11264.7
    vp9_inv_dct_dct_32x32_sub32_add_neon:  18575.5  17157.0  14249.3  12015.1
    
    I.e. in general a very minor overhead for the full subpartition case due
    to the additional loads and cmps, but a significant speedup for the cases
    when we only need to process a small part of the actual input data.
    
    In common VP9 content in a few inspected clips, 70-90% of the non-dc-only
    16x16 and 32x32 IDCTs only have nonzero coefficients in the upper left
    8x8 or 16x16 subpartitions respectively.
    
    Signed-off-by: Martin Storsjö <martin@martin.st>
    
    • libavcodec/arm/vp9itxfm_neon.S
    • tests/checkasm/vp9dsp.c
  • Revert "checkasm: vp9dsp: Benchmark the dc-only version of idct_idct separately"

    18 November 2016, by Martin Storsjö
    Revert "checkasm: vp9dsp: Benchmark the dc-only version of idct_idct separately"
    
    This reverts commit 81d7f0bbca837afda1f7e60d3ae52ab1360ab44b.
    
    Instead of just benchmarking dc separately, test all relevant subparts
    (in the next commit).
    
    Signed-off-by: Martin Storsjö <martin@martin.st>
    
    • tests/checkasm/vp9dsp.c
  • arm: vp9itxfm: Simplify the stack alignment code

    18 November 2016, by Janne Grunau
    arm: vp9itxfm: Simplify the stack alignment code
    
    This is one instruction fewer for thumb, and leaves only
    1/2 arm/thumb specific instructions.
    
    Signed-off-by: Martin Storsjö <martin@martin.st>
    
    • libavcodec/arm/vp9itxfm_neon.S