git.videolan.org Git - ffmpeg.git/rss log
FFmpeg git repo
Articles published on the site
-
random_seed: Improve behaviour with small timer increments with high precision timers
29 January, by Martin Storsjö

On a Zen 5, on Ubuntu 24.04 (with CLOCKS_PER_SEC 1000000), the value of clock() in this loop increments by 0 most of the time, and when it does increment, it usually increments by 1 compared to the previous round.

Due to the "last_t + 2*last_td + (CLOCKS_PER_SEC > 1000) >= t" expression, we only manage to take one step forward in this loop (incrementing i) if clock() increments by 2, while it incremented by 0 in the previous iteration (last_td).

This is similar to the change done in c4152fc42e480c41efb7f761b1bbe5f0bc43d5bc, to speed it up on systems with very small CLOCKS_PER_SEC. However, in this case, CLOCKS_PER_SEC is still very large, but the machine is fast enough to hit every clock increment repeatedly.

For this case, use the number of repetitions of each timer value as the entropy source; require a change in the number of repetitions in order to proceed to the next buffer index.

This helps the fate-random-seed test to actually terminate within a reasonable time on such a system (where it previously could hang, running for many minutes).

Signed-off-by: Martin Storsjö <martin@martin.st>
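The repetition-counting idea described above can be sketched in C as follows. This is an illustration only, not the code from the commit: the function name `collect_repetitions` and its parameters are hypothetical, and the timer samples are passed in as an array (instead of being read from clock()) so that the behaviour is deterministic.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not FFmpeg's implementation.
 * Walks a sequence of timer samples and counts how many times each value
 * is seen again after its first occurrence. When the timer value changes,
 * the count for the finished run is stored in `out` only if it differs
 * from the count of the previous run. Returns the number of entries
 * written to `out`. */
static size_t collect_repetitions(const long *samples, size_t n_samples,
                                  uint32_t *out, size_t out_size)
{
    size_t filled = 0;
    uint32_t repeats = 0;      /* repeats seen in the current run */
    uint32_t last_repeats = 0; /* repeats seen in the previous run */

    for (size_t i = 1; i < n_samples && filled < out_size; i++) {
        if (samples[i] == samples[i - 1]) {
            repeats++;         /* timer did not advance */
            continue;
        }
        /* Timer advanced: the current run is complete. */
        if (repeats != last_repeats)
            out[filled++] = repeats; /* only a changed count moves us on */
        last_repeats = repeats;
        repeats = 0;
    }
    return filled;
}
```

With the samples 0, 0, 0, 1, 1, 1, 2, 2, 3, the first run yields a count of 2 (stored), the second run also yields 2 (skipped, no change), and the third yields 1 (stored).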
-
checkasm: aacencdsp: Actually test nonzero values in quant_bands
29 January, by Martin Storsjö

Previously, we read elements from ff_aac_pow34sf_tab; however, that table is initialized to zero; one needs to call ff_aac_float_common_init() to make sure that the table is initialized.

However, given the range of the input values, a large number of entries in ff_aac_pow34sf_tab would give results outside of the range for signed 32 bit integers. As the largest aac_cb_maxval entry is 16, it seems more reasonable to produce values within an order of magnitude of that value.

(When hitting INT_MIN, implementations may end up with different results depending on whether the value is negated as a float or as an int. This corner case is irrelevant in practice as this is way outside of the expected value range here.)

Coincidentally, this fixes linking checkasm with Apple's older linker. (In Xcode 15, Apple switched to a new linker. The one in older toolchains seems to have a bug where it won't figure out to load object files from a static library, if the only symbol referenced in the object file is a "common" symbol, i.e. one for a zero-initialized variable. This issue can also be reproduced with newer Apple toolchains by passing -Wl,-ld_classic to the linker.)

Signed-off-by: Martin Storsjö <martin@martin.st>
-
avformat/mov: fix overflow in drift timestamp calculation
29 January, by James Almer

Fixes: signed integer overflow: 7803923888585309955 - -3407677434275325337 cannot be represented in type 'int64_t' (aka 'long')
Fixes: 377736723/clusterfuzz-testcase-minimized-media_pipeline_integration_fuzzer-5052449500889088

Signed-off-by: James Almer <jamrial@gmail.com>
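The commit message does not say how the overflow was avoided. As a generic illustration of this class of bug, the sketch below checks whether a 64-bit signed subtraction is representable before performing it. The helper name `checked_sub_i64` is hypothetical and is not taken from the FFmpeg source.

```c
#include <stdint.h>

/* Hypothetical helper, not the actual patch.
 * Computes a - b only if the result fits in int64_t.
 * Returns 1 and stores the difference in *res on success,
 * returns 0 and leaves *res untouched if it would overflow. */
static int checked_sub_i64(int64_t a, int64_t b, int64_t *res)
{
    /* b < 0: a - b grows; overflow if a > INT64_MAX + b.
     * b > 0: a - b shrinks; overflow if a < INT64_MIN + b.
     * Neither bound expression can itself overflow. */
    if ((b < 0 && a > INT64_MAX + b) ||
        (b > 0 && a < INT64_MIN + b))
        return 0;
    *res = a - b;
    return 1;
}
```

Called with the two operands from the fuzzer report, the helper returns 0 instead of invoking undefined behaviour.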
-
libavformat/hls: Be more restrictive on mpegts extensions
28 January, by Michael Niedermayer
-
swscale/aarch64/rgb2rgb: Implemented NEON shuf routines
28 January, by Krzysztof Pyrkosz

The key idea is to pass the pre-generated tables to the TBL instruction and churn through the data 16 bytes at a time. The remaining 4 elements are handled with a specialized block located at the end of the routine.

The 3210 variant can be implemented using rev32, but surprisingly it is slower than the generic TBL on A78, but much faster on A72.

There may be some room for improvement. Possibly, instead of handling the last 8 and then 4 bytes separately, we can load these 4 into {v0.s}[2] and process them along with the last 8 bytes.

Speeds measured with checkasm --test=sw_rgb --bench --runs=10 | grep shuf

A78:
shuffle_bytes_0321_c:      75.5 ( 1.00x)
shuffle_bytes_0321_neon:   26.5 ( 2.85x)
shuffle_bytes_1203_c:     136.2 ( 1.00x)
shuffle_bytes_1203_neon:   27.2 ( 5.00x)
shuffle_bytes_1230_c:     135.5 ( 1.00x)
shuffle_bytes_1230_neon:   28.0 ( 4.84x)
shuffle_bytes_2013_c:     138.8 ( 1.00x)
shuffle_bytes_2013_neon:   22.0 ( 6.31x)
shuffle_bytes_2103_c:      76.5 ( 1.00x)
shuffle_bytes_2103_neon:   20.5 ( 3.73x)
shuffle_bytes_2130_c:     137.5 ( 1.00x)
shuffle_bytes_2130_neon:   28.0 ( 4.91x)
shuffle_bytes_3012_c:     138.2 ( 1.00x)
shuffle_bytes_3012_neon:   21.5 ( 6.43x)
shuffle_bytes_3102_c:     138.2 ( 1.00x)
shuffle_bytes_3102_neon:   27.2 ( 5.07x)
shuffle_bytes_3210_c:     138.0 ( 1.00x)
shuffle_bytes_3210_neon:   22.0 ( 6.27x)
shuf3210 using rev32:
shuffle_bytes_3210_c:     139.0 ( 1.00x)
shuffle_bytes_3210_neon:   28.5 ( 4.88x)

A72:
shuffle_bytes_0321_c:     120.0 ( 1.00x)
shuffle_bytes_0321_neon:   36.0 ( 3.33x)
shuffle_bytes_1203_c:     188.2 ( 1.00x)
shuffle_bytes_1203_neon:   37.8 ( 4.99x)
shuffle_bytes_1230_c:     195.0 ( 1.00x)
shuffle_bytes_1230_neon:   36.0 ( 5.42x)
shuffle_bytes_2013_c:     195.8 ( 1.00x)
shuffle_bytes_2013_neon:   43.5 ( 4.50x)
shuffle_bytes_2103_c:     117.2 ( 1.00x)
shuffle_bytes_2103_neon:   53.5 ( 2.19x)
shuffle_bytes_2130_c:     203.2 ( 1.00x)
shuffle_bytes_2130_neon:   37.8 ( 5.38x)
shuffle_bytes_3012_c:     183.8 ( 1.00x)
shuffle_bytes_3012_neon:   46.8 ( 3.93x)
shuffle_bytes_3102_c:     180.8 ( 1.00x)
shuffle_bytes_3102_neon:   37.8 ( 4.79x)
shuffle_bytes_3210_c:     195.8 ( 1.00x)
shuffle_bytes_3210_neon:   37.8 ( 5.19x)
shuf3210 using rev32:
shuffle_bytes_3210_c:     194.8 ( 1.00x)
shuffle_bytes_3210_neon:   30.8 ( 6.33x)

x13s:
shuffle_bytes_0321_c:      49.4 ( 1.00x)
shuffle_bytes_0321_neon:   18.1 ( 2.72x)
shuffle_bytes_1203_c:      98.4 ( 1.00x)
shuffle_bytes_1203_neon:   18.4 ( 5.35x)
shuffle_bytes_1230_c:      97.4 ( 1.00x)
shuffle_bytes_1230_neon:   19.1 ( 5.09x)
shuffle_bytes_2013_c:     101.4 ( 1.00x)
shuffle_bytes_2013_neon:   16.9 ( 6.01x)
shuffle_bytes_2103_c:      53.9 ( 1.00x)
shuffle_bytes_2103_neon:   13.9 ( 3.88x)
shuffle_bytes_2130_c:     100.9 ( 1.00x)
shuffle_bytes_2130_neon:   19.1 ( 5.27x)
shuffle_bytes_3012_c:      97.4 ( 1.00x)
shuffle_bytes_3012_neon:   17.1 ( 5.69x)
shuffle_bytes_3102_c:     100.9 ( 1.00x)
shuffle_bytes_3102_neon:   19.1 ( 5.27x)
shuffle_bytes_3210_c:     100.6 ( 1.00x)
shuffle_bytes_3210_neon:   16.9 ( 5.96x)
shuf3210 using rev32:
shuffle_bytes_3210_c:     100.6 ( 1.00x)
shuffle_bytes_3210_neon:   18.6 ( 5.40x)

Signed-off-by: Martin Storsjö <martin@martin.st>
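The table-driven approach can be mimicked in portable C to show what the lookup does. This is a scalar sketch, not the NEON assembly from the commit: `build_shuffle_table` and `shuffle_bytes` are hypothetical names, the sketch assumes that the four digits in a routine name give the source index for each output byte within a 4-byte group (so 2103 means dst[0] = src[2], dst[1] = src[1], and so on), and it assumes that src and dst do not overlap.

```c
#include <stddef.h>
#include <stdint.h>

/* Expand a 4-entry pattern such as {2, 1, 0, 3} into a 16-entry index
 * table covering four consecutive 4-byte groups, in the spirit of an
 * index vector for a 16-byte table lookup. */
static void build_shuffle_table(const uint8_t pattern[4], uint8_t table[16])
{
    for (int i = 0; i < 16; i++)
        table[i] = (uint8_t)((i & ~3) + pattern[i & 3]);
}

/* Hypothetical scalar model: shuffle `len` bytes (a multiple of 4) from
 * src to dst, 16 bytes at a time, then finish any remaining 4-byte groups. */
static void shuffle_bytes(const uint8_t *src, uint8_t *dst, size_t len,
                          const uint8_t pattern[4])
{
    uint8_t table[16];
    size_t i = 0;

    build_shuffle_table(pattern, table);

    /* Main loop: each output byte j of a 16-byte block is picked from
     * the source byte selected by table[j]. */
    for (; i + 16 <= len; i += 16)
        for (int j = 0; j < 16; j++)
            dst[i + j] = src[i + table[j]];

    /* Tail: leftover 4-byte groups use the pattern directly. */
    for (; i + 4 <= len; i += 4)
        for (int j = 0; j < 4; j++)
            dst[i + j] = src[i + pattern[j]];
}
```

For a 20-byte input, the first 16 bytes go through the table path and the last 4 bytes through the tail, which mirrors the split between the main loop and the specialized end block described in the commit message.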