git.videolan.org Git - ffmpeg.git/rss log

FFmpeg git repo

http://git.videolan.org/?p=ffmpeg.git;a=summary

Articles published on the site

  • random_seed: Improve behaviour with small timer increments with high precision timers

    29 January, by Martin Storsjö
    random_seed: Improve behaviour with small timer increments with high precision timers
    
    On a Zen 5, on Ubuntu 24.04 (with CLOCKS_PER_SEC 1000000), the
    value of clock() in this loop increments by 0 most of the time,
    and when it does increment, it usually increments by 1 compared
    to the previous round.
    
    Due to the "last_t + 2*last_td + (CLOCKS_PER_SEC > 1000) >= t"
    expression, we only manage to take one step forward in this loop
    (incrementing i) if clock() increments by 2, while it incremented
    by 0 in the previous iteration (last_td).
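
The stall described above can be illustrated with a small sketch of the quoted condition. The helper name `would_advance` and its parameters are hypothetical and chosen only for illustration; just the inequality itself is taken from the commit message, and the real loop in libavutil/random_seed.c does more than this.

```c
/* Hypothetical helper, not part of FFmpeg: returns 1 when the quoted
 * condition "last_t + 2*last_td + (CLOCKS_PER_SEC > 1000) >= t" is
 * false, i.e. when the loop would be allowed to increment i.
 * high_res stands in for the (CLOCKS_PER_SEC > 1000) term. */
static int would_advance(long last_t, long last_td, long t, int high_res)
{
    return !(last_t + 2 * last_td + (high_res ? 1 : 0) >= t);
}
```

With a high resolution timer and last_td equal to 0, a step of 0 or 1 never passes this check, while a step of 2 does, which matches the behaviour described in the message.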
    
    This is similar to the change done in
    c4152fc42e480c41efb7f761b1bbe5f0bc43d5bc, to speed it up on
    systems with very small CLOCKS_PER_SEC. However in this case,
    CLOCKS_PER_SEC is still very large, but the machine is fast enough
    to hit every clock increment repeatedly.
    
    For this case, use the number of repetitions of each timer value
    as entropy source; require a change in the number of repetitions
    in order to proceed to the next buffer index.
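
As a rough sketch of the idea of using repetition counts as an entropy source, the following counts how often a run of identical timer values has a different length than the run before it. This is a simplified illustration under stated assumptions, not the code from the commit: the function name `count_run_length_changes` and the use of a pre-recorded sample array are hypothetical.

```c
#include <stddef.h>

/* Hypothetical helper: given recorded timer samples, count how many
 * completed runs of identical values differ in length from the
 * previously completed run. Each such change is what this sketch
 * would treat as "enough entropy to move to the next index". */
static size_t count_run_length_changes(const long *samples, size_t n)
{
    size_t changes = 0;
    size_t run = 0;      /* length of the run currently being counted */
    size_t prev_run = 0; /* length of the last completed run, 0 if none */

    for (size_t k = 0; k < n; k++) {
        if (k > 0 && samples[k] != samples[k - 1]) {
            /* The timer value changed, so a run has just ended. */
            if (prev_run != 0 && run != prev_run)
                changes++;
            prev_run = run;
            run = 0;
        }
        run++;
    }
    return changes;
}
```

A perfectly regular sequence (every timer value seen the same number of times) yields no changes, so in this sketch progress depends on jitter in how many iterations fit into one timer tick.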
    
    This helps the fate-random-seed test to actually terminate within
    a reasonable time on such a system (where it previously could hang,
    running for many minutes).
    
    Signed-off-by: Martin Storsjö <martin@martin.st>
    
    • [DH] libavutil/random_seed.c
  • checkasm: aacencdsp: Actually test nonzero values in quant_bands

    29 January, by Martin Storsjö
    checkasm: aacencdsp: Actually test nonzero values in quant_bands
    
    Previously, we read elements from ff_aac_pow34sf_tab; however
    that table is initialized to zero; one needs to call
    ff_aac_float_common_init() to make sure that the table is
    initialized.
    
    However, given the range of the input values, a large number of
    entries in ff_aac_pow34sf_tab would give results outside of the
    range for signed 32 bit integers. As the largest aac_cb_maxval
    entry is 16, it seems more reasonable to produce values within
    an order of magnitude of that value.
    
    (When hitting INT_MIN, implementations may end up with different
    results depending on whether the value is negated as a float or
    as an int. This corner case is irrelevant in practice as this
    is way outside of the expected value range here.)
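
The INT_MIN corner case can be sketched as follows. The two helper names are hypothetical, and the integer path assumes wrapping 32 bit negation, as integer SIMD instructions typically provide; it is computed on unsigned values here because signed overflow is undefined in C.

```c
#include <stdint.h>

/* Negate as a float: the magnitude 2^31 is exactly representable,
 * but the result no longer fits in a signed 32 bit integer. */
static float negate_as_float(int32_t v)
{
    return -(float)v;
}

/* Negate with wrapping 32 bit arithmetic and return the resulting
 * bit pattern. For INT32_MIN the pattern is unchanged (0x80000000). */
static uint32_t negate_as_wrapping_bits(int32_t v)
{
    return 0u - (uint32_t)v;
}
```

For INT32_MIN the float path gives +2147483648.0, whereas the wrapping integer path gives back the bit pattern of INT32_MIN, so the two orders of operation disagree for this one input.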
    
    Coincidentally, this fixes linking checkasm with Apple's older
    linker. (In Xcode 15, Apple switched to a new linker. The one in
    older toolchains seems to have a bug where it fails to load an
    object file from a static library if the only symbol referenced
    in that object file is a "common" symbol, i.e. one for a
    zero-initialized variable. This issue can also be reproduced with
    newer Apple toolchains by passing -Wl,-ld_classic to the linker.)
    
    Signed-off-by: Martin Storsjö <martin@martin.st>
    
    • [DH] tests/checkasm/aacencdsp.c
  • avformat/mov: fix overflow in drift timestamp calculation

    29 January, by James Almer
    avformat/mov: fix overflow in drift timestamp calculation
    
    Fixes: signed integer overflow: 7803923888585309955 - -3407677434275325337 cannot be represented in type 'int64_t' (aka 'long')
    Fixes: 377736723/clusterfuzz-testcase-minimized-media_pipeline_integration_fuzzer-5052449500889088
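
For illustration, one portable way to detect this kind of overflow before subtracting two int64_t values is sketched below. The helper name `sub_overflows_int64` is hypothetical, and this is not necessarily how the commit changes libavformat/mov.c.

```c
#include <stdint.h>

/* Hypothetical helper: returns 1 if a - b cannot be represented in
 * int64_t, without ever performing the overflowing subtraction. */
static int sub_overflows_int64(int64_t a, int64_t b)
{
    if (b < 0)
        return a > INT64_MAX + b; /* a - b would exceed INT64_MAX */
    return a < INT64_MIN + b;     /* a - b would fall below INT64_MIN */
}
```

Applied to the operands from the report, 7803923888585309955 and -3407677434275325337, the check returns 1, so a caller could reject or clamp the value instead of triggering undefined behaviour.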
    
    Signed-off-by: James Almer <jamrial@gmail.com>
    
    • [DH] libavformat/mov.c
  • libavformat/hls: Be more restrictive on mpegts extensions

    28 January, by Michael Niedermayer
    libavformat/hls: Be more restrictive on mpegts extensions
    
    Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
    
    • [DH] libavformat/hls.c
  • swscale/aarch64/rgb2rgb: Implemented NEON shuf routines

    28 January, by Krzysztof Pyrkosz
    swscale/aarch64/rgb2rgb: Implemented NEON shuf routines
    
    The key idea is to pass the pre-generated tables to the TBL instruction
    and churn through the data 16 bytes at a time. The remaining 4 elements
    are handled with a specialized block located at the end of the routine.
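
The table-driven approach can be sketched in portable C, with a plain loop standing in for the TBL instruction. This is only an illustration under stated assumptions, not the NEON code from the commit: the function name `shuffle_bytes_tbl_sketch` is hypothetical, `perm` is assumed to hold the four digits of the variant name (for example {2, 1, 0, 3} for shuffle_bytes_2103), `src_size` is assumed to be a multiple of 4, and `src` and `dst` must not overlap.

```c
#include <stdint.h>

/* Hypothetical scalar model of a table-lookup byte shuffle. */
static void shuffle_bytes_tbl_sketch(const uint8_t *src, uint8_t *dst,
                                     int src_size, const uint8_t perm[4])
{
    uint8_t table[16];
    int i = 0;

    /* Pre-generate a 16 byte index table: the 4 byte permutation
     * repeated for each of the four elements in a 16 byte block. */
    for (int j = 0; j < 16; j++)
        table[j] = (uint8_t)((j & ~3) + perm[j & 3]);

    /* Main loop: 16 bytes per iteration, one table lookup per output
     * byte (a single TBL would do all 16 lookups at once). */
    for (; i + 16 <= src_size; i += 16)
        for (int j = 0; j < 16; j++)
            dst[i + j] = src[i + table[j]];

    /* Tail: remaining 4 byte elements, handled one at a time. */
    for (; i + 4 <= src_size; i += 4)
        for (int j = 0; j < 4; j++)
            dst[i + j] = src[i + perm[j]];
}
```

Since every variant differs only in the contents of the index table, the same main loop serves all nine shuffles in this model.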
    
    The 3210 variant can also be implemented using rev32. Surprisingly,
    that is slower than the generic TBL on A78, but much faster on A72.
    
    There may be some room for improvement. Possibly, instead of handling
    the last 8 and then the last 4 bytes separately, we can load these 4
    into {v0.s}[2] and process them along with the last 8 bytes.
    
    Speeds measured with checkasm --test=sw_rgb --bench --runs=10 | grep shuf
    
    - A78
    shuffle_bytes_0321_c:                                   75.5 ( 1.00x)
    shuffle_bytes_0321_neon:                                26.5 ( 2.85x)
    shuffle_bytes_1203_c:                                  136.2 ( 1.00x)
    shuffle_bytes_1203_neon:                                27.2 ( 5.00x)
    shuffle_bytes_1230_c:                                  135.5 ( 1.00x)
    shuffle_bytes_1230_neon:                                28.0 ( 4.84x)
    shuffle_bytes_2013_c:                                  138.8 ( 1.00x)
    shuffle_bytes_2013_neon:                                22.0 ( 6.31x)
    shuffle_bytes_2103_c:                                   76.5 ( 1.00x)
    shuffle_bytes_2103_neon:                                20.5 ( 3.73x)
    shuffle_bytes_2130_c:                                  137.5 ( 1.00x)
    shuffle_bytes_2130_neon:                                28.0 ( 4.91x)
    shuffle_bytes_3012_c:                                  138.2 ( 1.00x)
    shuffle_bytes_3012_neon:                                21.5 ( 6.43x)
    shuffle_bytes_3102_c:                                  138.2 ( 1.00x)
    shuffle_bytes_3102_neon:                                27.2 ( 5.07x)
    shuffle_bytes_3210_c:                                  138.0 ( 1.00x)
    shuffle_bytes_3210_neon:                                22.0 ( 6.27x)
    
    shuf3210 using rev32
    shuffle_bytes_3210_c:                                  139.0 ( 1.00x)
    shuffle_bytes_3210_neon:                                28.5 ( 4.88x)
    
    - A72
    shuffle_bytes_0321_c:                                  120.0 ( 1.00x)
    shuffle_bytes_0321_neon:                                36.0 ( 3.33x)
    shuffle_bytes_1203_c:                                  188.2 ( 1.00x)
    shuffle_bytes_1203_neon:                                37.8 ( 4.99x)
    shuffle_bytes_1230_c:                                  195.0 ( 1.00x)
    shuffle_bytes_1230_neon:                                36.0 ( 5.42x)
    shuffle_bytes_2013_c:                                  195.8 ( 1.00x)
    shuffle_bytes_2013_neon:                                43.5 ( 4.50x)
    shuffle_bytes_2103_c:                                  117.2 ( 1.00x)
    shuffle_bytes_2103_neon:                                53.5 ( 2.19x)
    shuffle_bytes_2130_c:                                  203.2 ( 1.00x)
    shuffle_bytes_2130_neon:                                37.8 ( 5.38x)
    shuffle_bytes_3012_c:                                  183.8 ( 1.00x)
    shuffle_bytes_3012_neon:                                46.8 ( 3.93x)
    shuffle_bytes_3102_c:                                  180.8 ( 1.00x)
    shuffle_bytes_3102_neon:                                37.8 ( 4.79x)
    shuffle_bytes_3210_c:                                  195.8 ( 1.00x)
    shuffle_bytes_3210_neon:                                37.8 ( 5.19x)
    
    shuf3210 using rev32
    shuffle_bytes_3210_c:                                  194.8 ( 1.00x)
    shuffle_bytes_3210_neon:                                30.8 ( 6.33x)
    
    - x13s
    shuffle_bytes_0321_c:                                   49.4 ( 1.00x)
    shuffle_bytes_0321_neon:                                18.1 ( 2.72x)
    shuffle_bytes_1203_c:                                   98.4 ( 1.00x)
    shuffle_bytes_1203_neon:                                18.4 ( 5.35x)
    shuffle_bytes_1230_c:                                   97.4 ( 1.00x)
    shuffle_bytes_1230_neon:                                19.1 ( 5.09x)
    shuffle_bytes_2013_c:                                  101.4 ( 1.00x)
    shuffle_bytes_2013_neon:                                16.9 ( 6.01x)
    shuffle_bytes_2103_c:                                   53.9 ( 1.00x)
    shuffle_bytes_2103_neon:                                13.9 ( 3.88x)
    shuffle_bytes_2130_c:                                  100.9 ( 1.00x)
    shuffle_bytes_2130_neon:                                19.1 ( 5.27x)
    shuffle_bytes_3012_c:                                   97.4 ( 1.00x)
    shuffle_bytes_3012_neon:                                17.1 ( 5.69x)
    shuffle_bytes_3102_c:                                  100.9 ( 1.00x)
    shuffle_bytes_3102_neon:                                19.1 ( 5.27x)
    shuffle_bytes_3210_c:                                  100.6 ( 1.00x)
    shuffle_bytes_3210_neon:                                16.9 ( 5.96x)
    
    shuf3210 using rev32
    shuffle_bytes_3210_c:                                  100.6 ( 1.00x)
    shuffle_bytes_3210_neon:                                18.6 ( 5.40x)
    
    Signed-off-by: Martin Storsjö <martin@martin.st>
    
    • [DH] libswscale/aarch64/rgb2rgb.c
    • [DH] libswscale/aarch64/rgb2rgb_neon.S
    • [DH] tests/checkasm/sw_rgb.c