
Recherche avancée
Médias (1)
-
Sintel MP4 Surround 5.1 Full
13 mai 2011, par
Mis à jour : Février 2012
Langue : English
Type : Video
Autres articles (75)
-
Les tâches Cron régulières de la ferme
1er décembre 2010, parLa gestion de la ferme passe par l’exécution à intervalle régulier de plusieurs tâches répétitives dites Cron.
Le super Cron (gestion_mutu_super_cron)
Cette tâche, planifiée chaque minute, a pour simple effet d’appeler le Cron de l’ensemble des instances de la mutualisation régulièrement. Couplée avec un Cron système sur le site central de la mutualisation, cela permet de simplement générer des visites régulières sur les différents sites et éviter que les tâches des sites peu visités soient trop (...) -
Publier sur MédiaSpip
13 juin 2013Puis-je poster des contenus à partir d’une tablette Ipad ?
Oui, si votre Médiaspip installé est à la version 0.2 ou supérieure. Contacter au besoin l’administrateur de votre MédiaSpip pour le savoir -
Des sites réalisés avec MediaSPIP
2 mai 2011, parCette page présente quelques-uns des sites fonctionnant sous MediaSPIP.
Vous pouvez bien entendu ajouter le votre grâce au formulaire en bas de page.
Sur d’autres sites (11828)
-
Beware the builtins
14 janvier 2010, par Mans — CompilersGCC includes a large number of builtin functions allegedly providing optimised code for common operations not easily expressed directly in C. Rather than taking such claims at face value (this is GCC after all), I decided to conduct a small investigation to see how well a few of these functions are actually implemented for various targets.
For my test, I selected the following functions :
__builtin_bswap32
: Byte-swap a 32-bit word.__builtin_bswap64
: Byte-swap a 64-bit word.__builtin_clz
: Count leading zeros in a word.__builtin_ctz
: Count trailing zeros in a word.__builtin_prefetch
: Prefetch data into cache.
To test the quality of these builtins, I wrapped each in a normal function, then compiled the code for these targets :
- ARMv7
- AVR32
- MIPS
- MIPS64
- PowerPC
- PowerPC64
- x86
- x86_64
In all cases I used compiler flags were
-O3 -fomit-frame-pointer
plus any flags required to select a modern CPU model.
ARM
Both
__builtin_clz
and__builtin_prefetch
generate the expectedCLZ
andPLD
instructions respectively. The code for__builtin_ctz
is reasonable for ARMv6 and earlier :rsb r3, r0, #0 and r0, r3, r0 clz r0, r0 rsb r0, r0, #31
For ARMv7 (in fact v6T2), however, using the new bit-reversal instruction would have been better :
rbit r0, r0 clz r0, r0
I suspect this is simply a matter of the function not yet having been updated for ARMv7, which is perhaps even excusable given the relatively rare use cases for it.
The byte-reversal functions are where it gets shocking. Rather than use the
REV
instruction found from ARMv6 on, both of them generate external calls to__bswapsi2
and__bswapdi2
in libgcc, which is plain C code :SItype __bswapsi2 (SItype u) return ((((u) & 0xff000000) >> 24) | (((u) & 0x00ff0000) >> 8) | (((u) & 0x0000ff00) << 8) | (((u) & 0x000000ff) << 24)) ;
DItype
__bswapdi2 (DItype u)
return ((((u) & 0xff00000000000000ull) >> 56)
| (((u) & 0x00ff000000000000ull) >> 40)
| (((u) & 0x0000ff0000000000ull) >> 24)
| (((u) & 0x000000ff00000000ull) >> 8)
| (((u) & 0x00000000ff000000ull) << 8)
| (((u) & 0x0000000000ff0000ull) << 24)
| (((u) & 0x000000000000ff00ull) << 40)
| (((u) & 0x00000000000000ffull) << 56)) ;
While the 32-bit version compiles to a reasonable-looking shift/mask/or job, the 64-bit one is a real WTF. Brace yourselves :
push r4, r5, r6, r7, r8, r9, sl, fp mov r5, #0 mov r6, #65280 ; 0xff00 sub sp, sp, #40 ; 0x28 and r7, r0, r5 and r8, r1, r6 str r7, [sp, #8] str r8, [sp, #12] mov r9, #0 mov r4, r1 and r5, r0, r9 mov sl, #255 ; 0xff ldr r9, [sp, #8] and r6, r4, sl mov ip, #16711680 ; 0xff0000 str r5, [sp, #16] str r6, [sp, #20] lsl r2, r0, #24 and ip, ip, r1 lsr r7, r4, #24 mov r1, #0 lsr r5, r9, #24 mov sl, #0 mov r9, #-16777216 ; 0xff000000 and fp, r0, r9 lsr r6, ip, #8 orr r9, r7, r1 and ip, r4, sl orr sl, r1, r2 str r6, [sp] str r9, [sp, #32] str sl, [sp, #36] ; 0x24 add r8, sp, #32 ldm r8, r7, r8 str r1, [sp, #4] ldm sp, r9, sl orr r7, r7, r9 orr r8, r8, sl str r7, [sp, #32] str r8, [sp, #36] ; 0x24 mov r3, r0 mov r7, #16711680 ; 0xff0000 mov r8, #0 and r9, r3, r7 and sl, r4, r8 ldr r0, [sp, #16] str fp, [sp, #24] str ip, [sp, #28] stm sp, r9, sl ldr r7, [sp, #20] ldr sl, [sp, #12] ldr fp, [sp, #12] ldr r8, [sp, #28] lsr r0, r0, #8 orr r7, r0, r7, lsl #24 lsr r6, sl, #24 orr r5, r5, fp, lsl #8 lsl sl, r8, #8 mov fp, r7 add r8, sp, #32 ldm r8, r7, r8 orr r6, r6, r8 ldr r8, [sp, #20] ldr r0, [sp, #24] orr r5, r5, r7 lsr r8, r8, #8 orr sl, sl, r0, lsr #24 mov ip, r8 ldr r0, [sp, #4] orr fp, fp, r5 ldr r5, [sp, #24] orr ip, ip, r6 ldr r6, [sp] lsl r9, r5, #8 lsl r8, r0, #24 orr fp, fp, r9 lsl r3, r3, #8 orr r8, r8, r6, lsr #8 orr ip, ip, sl lsl r7, r6, #24 and r5, r3, #16711680 ; 0xff0000 orr r7, r7, fp orr r8, r8, ip orr r4, r1, r7 orr r5, r5, r8 mov r9, r6 mov r1, r5 mov r0, r4 add sp, sp, #40 ; 0x28 pop r4, r5, r6, r7, r8, r9, sl, fp bx lr
That’s right, 91 instructions to move 8 bytes around a bit. GCC definitely has a problem with 64-bit numbers. It is perhaps worth noting that the
bswap_64
macro in glibc splits the 64-bit value into 32-bit halves which are then reversed independently, thus side-stepping this weakness of gcc.As a side note, ARM RVCT (armcc) compiles those functions perfectly into one and two
REV
instructions, respectively.AVR32
There is not much to report here. The latest gcc version available is 4.2.4, which doesn’t appear to have the bswap functions. The other three are handled nicely, even using a bit-reverse for
__builtin_ctz
.MIPS / MIPS64
The situation MIPS is similar to ARM. Both bswap builtins result in external libgcc calls, the rest giving sensible code.
PowerPC
I scarcely believe my eyes, but this one is actually not bad. The PowerPC has no byte-reversal instructions, yet someone seems to have taken the time to teach gcc a good instruction sequence for this operation. The PowerPC does have some powerful rotate-and-mask instructions which come in handy here. First the 32-bit version :
rotlwi r0,r3,8 rlwimi r0,r3,24,0,7 rlwimi r0,r3,24,16,23 mr r3,r0 blr
The 64-bit byte-reversal simply applies the above code on each half of the value :
rotlwi r0,r3,8 rlwimi r0,r3,24,0,7 rlwimi r0,r3,24,16,23 rotlwi r3,r4,8 rlwimi r3,r4,24,0,7 rlwimi r3,r4,24,16,23 mr r4,r0 blr
Although I haven’t analysed that code carefully, it looks pretty good.
PowerPC64
Doing 64-bit operations is easier on a 64-bit CPU, right ? For you and me perhaps, but not for gcc. Here
__builtin_bswap64
gives us the now familiar__bswapdi2
call, and while not as bad as the ARM version, it is not pretty :rldicr r0,r3,8,55 rldicr r10,r3,56,7 rldicr r0,r0,56,15 rldicl r11,r3,8,56 rldicr r9,r3,16,47 or r11,r10,r11 rldicr r9,r9,48,23 rldicl r10,r0,24,40 rldicr r0,r3,24,39 or r11,r11,r10 rldicl r9,r9,40,24 rldicr r0,r0,40,31 or r9,r11,r9 rlwinm r10,r3,0,0,7 rldicl r0,r0,56,8 or r0,r9,r0 rldicr r10,r10,8,55 rlwinm r11,r3,0,8,15 or r0,r0,r10 rldicr r11,r11,24,39 rlwinm r3,r3,0,16,23 or r0,r0,r11 rldicr r3,r3,40,23 or r3,r0,r3 blr
That is 6 times longer than the (presumably) hand-written 32-bit version.
x86 / x86_64
As one might expect, results on x86 are good. All the tested functions use the available special instructions. One word of caution though : the bit-counting instructions are very slow on some implementations, specifically the Atom, AMD chips, and the notoriously slow Pentium4E.
Conclusion
In conclusion, I would say gcc builtins can be useful to avoid fragile inline assembler. Before using them, however, one should make sure they are not in fact harmful on the required targets. Not even those builtins mapping directly to CPU instructions can be trusted.
-
FFmpeg, how to skip late input ?
14 novembre 2017, par user3343357I’m running ffmpeg to display incoming stream on a Decklink BlackMagic card with the following command line :
ffmpeg -y -f ourFmt -probesize 32 -i - -f decklink -preset ultrafast
-pix_fmt uyvy422 -s 1920x1080 -r 30 -af volume=0.1 -max_delay 10000
DeckLink Mini MonitorBasically I get the video over the internet by UDP and stream it to ffmpeg stdin. Both audio and video streams have pts and dts and are fully in sync, if the connection is good there is no problems.
However if there are issues with the connection i start getting errors, sometimes the video delay grows significantly, and audio stops working.
The errors i get are :ffmpeg : [decklink @ 0x26cc600] There are not enough buffered video
frames. Video may misbehave ! ffmpeg : [decklink @ 0x26cc600] There’s no
buffered audio. Audio will misbehave ! ffmpeg : Last message
repeated 4 times ffmpeg : [decklink @ 0x26cc600] There are not enough
buffered video frames. Video may misbehave ! ffmpeg : [decklink @
0x26cc600] There’s no buffered audio. Audio will misbehave ! ffmpeg :
Last message repeated 3 times ffmpeg : frame= 5204 fps= 30 q=-0.0
size=N/A time=00:02:53.76 bitrate=N/A dup=385 drop=5 speed=0.993x
ffmpeg : [decklink @ 0x26cc600] There’s no buffered audio. Audio will
misbehave ! ffmpeg : Last message repeated 18 times ffmpeg :
[decklink @ 0x26cc600] There are not enough buffered video frames.
Video may misbehave ! ffmpeg : [decklink @ 0x26cc600] There’s no
buffered audio. Audio will misbehave !The problem is when the connection is back to normal, the video keeps misbehaving until I restart the stream. What I want to do is for FFmpeg to skip to the content of the last second and play synchronized video from there, drop all the late data in between, is it possible ?
-
FFMPEG Seeking with concat demuxer causes video & audio to be out of sync
20 février 2023, par GaruukI have a very simple use case that's driving me bananas.


My problem and question :


I'm using ffmpeg version 5.1.2 on a MacOS and i'm using ffmpeg seeking and concat demuxer to cut many 1 minute videos into 15 seconds chopped up over 12 clips where every clip is just 2 seconds from the same video (kind of like a mini teasers for the video). I would really like to not have to re-encode to make the video processing as fast as possible.


First, I take each 1 minute video and cut it up into 12 clips (I do all this programmatically in python fwiw)


ffmpeg -ss 0 -i input.mp4 -t 2 -c copy -y cut_1.mp4
ffmpeg -ss 4 -i input.mp4 -t 2 -c copy -y cut_2.mp4
ffmpeg -ss 8 -i input.mp4 -t 2 -c copy -y cut_3.mp4
...
...



I then write all the output file names to my
concat_manifest.txt


file cut_1.mp4
file cut_2.mp4
...
...



Then I run my concat command :


ffmpeg -f concat -i concat_manifest.txt -c copy -y concat_video.mp4



This works really fast but the audio and video at the stitch point get out of sync and sometimes the video just chokes & lags. It's mostly not a smooth experience.


What I have tried :


- 

- using the concat protocol with intermediate profiles : ffmpeg.org/wiki/Concatenate#demuxer
- Putting the -ss when I seek after the -i. This makes everything worse
- Playing around with different -ss values. This has some noticeable affects but it's not obvious why yet.
- I've also read from the ffmpeg resource regarding seeking and copying :










Which leads me to believe that maybe because ffmpeg is using timestamps instead of frames, seeking isn't accurate using -ss when using the concat demuxer


Is there a way to get concat demuxer cutting and concatenating the video where the audio is somewhat in sync with the video ?


Thanks


EDIT : I found an answer and i'll be posting the solution in the coming few days.