git.libav.org Git - libav.git/rss log

Libav master git repository

http://git.libav.org/?p=libav.git;a=summary

Articles published on the site

  • aarch64: vp9itxfm: Fix incorrect vertical alignment

    January 3, 2017, by Martin Storsjö
    
    Signed-off-by: Martin Storsjö <martin@martin.st>
    
    • libavcodec/aarch64/vp9itxfm_neon.S
  • aarch64: vp9itxfm: Use a single lane ld1 instead of ld1r where possible

    January 3, 2017, by Martin Storsjö
    
    The ld1r is a leftover from the arm version, where this trick is
    beneficial on some cores.
    
    Use a single-lane load where we don't need the semantics of ld1r.
    
    Signed-off-by: Martin Storsjö <martin@martin.st>
    
    • libavcodec/aarch64/vp9itxfm_neon.S
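The difference between the two loads can be sketched as a plain C model. This is an illustrative assumption, not code from vp9itxfm_neon.S. The vec8h type and the model_* function names are hypothetical stand-ins for an AArch64 vector register viewed as eight 16-bit lanes.

The model captures one distinction. ld1r loads one element and replicates it into every lane. The single-lane ld1 writes only the named lane and leaves the other lanes as they were. When only one lane is read afterwards, the replication is unneeded.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of one 128-bit vector register viewed as eight
   16-bit lanes (the v.8h arrangement). */
typedef struct {
    int16_t lane[8];
} vec8h;

/* Models "ld1r {vN.8h}, [xM]": load one 16-bit element and replicate
   it into all eight lanes. */
static void model_ld1r_8h(vec8h *v, const int16_t *src)
{
    for (int i = 0; i < 8; i++)
        v->lane[i] = *src;
}

/* Models "ld1 {vN.h}[n], [xM]": load one 16-bit element into lane n
   only; the other seven lanes keep their previous contents. */
static void model_ld1_lane_h(vec8h *v, const int16_t *src, int n)
{
    assert(n >= 0 && n < 8);
    v->lane[n] = *src;
}
```

If the code only ever consumes lane 0 (for example as a by-element multiplier), both models leave the same value in lane 0. Only ld1r also overwrites lanes 1-7.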
  • arm: vp9itxfm: Avoid reloading the idct32 coefficients

    January 2, 2017, by Martin Storsjö
    
    The idct32x32 function actually pushed q4-q7 onto the stack even
    though it didn't clobber them; there are plenty of registers that
    can be used to allow keeping all the idct coefficients in registers
    without having to reload different subsets of them at different
    stages in the transform.
    
    Since the idct16 core transform avoids clobbering q4-q7 (but clobbers
    q2-q3 instead, to avoid needing to back up and restore q4-q7 at all
    in the idct16 function), and the lanewise vmul needs a register in
    the q0-q3 range, we move the stored coefficients from q2-q3 into q4-q5
    while doing idct16.
    
    While keeping these coefficients in registers, we can still skip pushing
    q7.
    
    Before:                              Cortex A7       A8       A9      A53
    vp9_inv_dct_dct_32x32_sub32_add_neon:  18553.8  17182.7  14303.3  12089.7
    After:
    vp9_inv_dct_dct_32x32_sub32_add_neon:  18470.3  16717.7  14173.6  11860.8
    
    Signed-off-by: Martin Storsjö <martin@martin.st>
    
    • libavcodec/arm/vp9itxfm_neon.S
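The register-range restriction mentioned in this commit can be modelled roughly in C. The function names below are hypothetical.

The model encodes one ARMv7 NEON rule. For 16-bit elements, a multiply-by-scalar operand Dm[x] must be one of D0-D7 (the halves of Q0-Q3), with x in 0-3. That is why coefficients used as lane operands need to sit in q0-q3 when such a multiply executes.

The widening vmull.s16 form is used here only as one example of a lanewise multiply. The commit does not say which exact instructions the idct32 code uses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* ARMv7 NEON, 16-bit elements: a by-scalar operand Dm[x] is only
   encodable for Dm in D0-D7 (i.e. inside Q0-Q3) and lane x in 0-3. */
static bool neon_s16_scalar_ok(int dm, int x)
{
    return dm >= 0 && dm <= 7 && x >= 0 && x <= 3;
}

/* D register n is one half of Q register n / 2 (d8 and d9 form q4). */
static int q_of_d(int dm)
{
    return dm / 2;
}

/* Scalar model of "vmull.s16 Qd, Dn, Dm[x]": each of the four 16-bit
   elements of dn is multiplied by coefs[x] and widened to 32 bits. */
static void model_vmull_s16_lane(int32_t out[4], const int16_t dn[4],
                                 const int16_t coefs[4], int x)
{
    assert(x >= 0 && x <= 3);
    for (int i = 0; i < 4; i++)
        out[i] = (int32_t)dn[i] * coefs[x];
}
```

In these terms, d4[x] (inside q2) is a valid scalar operand but d8[x] (inside q4) is not. So coefficients parked in q4-q5 during idct16 would have to be back in q0-q3 before being used as lane operands.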
  • aarch64: vp9itxfm: Avoid reloading the idct32 coefficients

    January 2, 2017, by Martin Storsjö
    
    The idct32x32 function actually pushed d8-d15 onto the stack even
    though it didn't clobber them; there are plenty of registers that
    can be used to allow keeping all the idct coefficients in registers
    without having to reload different subsets of them at different
    stages in the transform.
    
    After this, we can still skip pushing d12-d15.
    
    Before:
    vp9_inv_dct_dct_32x32_sub32_add_neon: 8128.3
    After:
    vp9_inv_dct_dct_32x32_sub32_add_neon: 8053.3
    
    Signed-off-by: Martin Storsjö <martin@martin.st>
    
    • libavcodec/aarch64/vp9itxfm_neon.S
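Here is a sketch of why some of d8-d15 can go unsaved. Under the AAPCS64 procedure call standard, a function must preserve only the low 64 bits of v8-v15 (that is, d8-d15). v0-v7 and v16-v31 may be clobbered freely.

The C helper below uses hypothetical names. It computes which d registers a function must save, given a bitmask of the vector registers it overwrites. The register sets in the example are assumed allocations, not ones read from the actual idct32 code.

```c
#include <stdint.h>

/* AAPCS64: only v8-v15 are callee-saved, and only their low 64 bits
   (d8-d15). Bit n of a mask stands for register vn / dn. */
#define AAPCS64_CALLEE_SAVED_V 0x0000FF00u

/* Mask of d registers that must be saved in the prologue, given the
   mask of vector registers the function overwrites. */
static uint32_t d_regs_to_save(uint32_t clobbered_v)
{
    return clobbered_v & AAPCS64_CALLEE_SAVED_V;
}

/* Build a mask for the inclusive register range v[lo]..v[hi]. */
static uint32_t v_range(int lo, int hi)
{
    uint32_t m = 0;
    for (int i = lo; i <= hi; i++)
        m |= 1u << i;
    return m;
}
```

For example, an assumed allocation that touches v0-v11 plus v16-v31 would only need to save d8-d11, leaving d12-d15 unpushed.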
  • cmdutils: update copyright year to 2017

    January 1, 2017, by Sean McGovern
    
    CC: libav-stable@libav.org
    
    • cmdutils.c