
VP8: a retrospective
I’ve been working the past few weeks to help finish up the ffmpeg VP8 decoder, the first community implementation of On2’s VP8 video format. Now that I’ve written a thousand or two lines of assembly code and optimized a good bit of the C code, I’d like to look back at VP8 and comment on a variety of things — both good and bad — that slipped the net the first time, along with things that have changed since the time of that blog post.
These are mostly not issues related to compression — that topic has been beaten to death, particularly in MSU’s recent comparison, where x264 beat the crap out of VP8 and the VP8 developers pulled a Pinocchio in the developer comments. But that was expected and isn’t particularly interesting, so I won’t go into it here. VP8 doesn’t have to be the best in the world in order to be useful.
When the ffmpeg VP8 decoder is complete (just a few more asm functions to go), we’ll hopefully be able to post some benchmarks comparing it to libvpx.
1. The spec, er, I mean, bitstream guide.
Google has reneged on their claim that a spec existed at all and renamed it a “bitstream guide”. This is probably because it was found that not only was it incomplete, but at least a dozen places in the spec differed wildly from what was actually in their own encoder and decoder software! The deblocking filter, motion vector clamping, probability tables, and many more parts simply disagreed flat-out with the spec. Fortunately, Ronald Bultje, one of the main authors of the ffmpeg VP8 decoder, is rather skilled at reverse-engineering, so we were able to put together a matching implementation regardless.
Most of the differences aren’t particularly important — they don’t have a huge effect on compression or anything — but they make it vastly more difficult to implement a “working” VP8 decoder, or for that matter, decide what “working” really is. For example, Google’s decoder will, if told to “swap the ALT and GOLDEN reference frames”, overwrite both with the old ALT, because it first sets GOLDEN = ALT, and then sets ALT = GOLDEN. Is this a bug? Or is this how it’s supposed to work? It’s hard to tell — there isn’t a spec to say so. Google says that whatever libvpx does is right, but I doubt they intended this.
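In code, the mixup described above amounts to this (a minimal illustration of the behavior, not libvpx’s actual source):

/* A "swap" written as two sequential assignments with no temporary:
 * after the first line the old GOLDEN is gone, so both references
 * end up holding the original ALT frame. */
static void broken_swap(int *golden, int *alt)
{
    *golden = *alt;    /* GOLDEN = ALT */
    *alt    = *golden; /* ALT = GOLDEN, which is now the old ALT */
}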
I expect a spec will eventually be written, but it was a bit obnoxious of Google — both to the community and to their own developers — to release so early that they didn’t even have their own documentation ready.
2. The TM intra prediction mode.
One thing I glossed over in the original piece was that On2 had added an extra intra prediction mode to the standard batch that H.264 came with — they replaced Planar with “TM pred”. For i4x4, which didn’t have a Planar mode, they just added it without replacing an old one, resulting in a total of 10 modes to H.264’s 9. After understanding and writing assembly code for TM pred, I have to say that it is quite a cool idea. Here’s how it works:
1. Let us take a block of size 4×4, 8×8, or 16×16.
2. Define the pixels bordering the top of this block (starting from the left) as T[0], T[1], T[2]…
3. Define the pixels bordering the left of this block (starting from the top) as L[0], L[1], L[2]…
4. Define the pixel above the top-left of the block as TL.
5. Predict every pixel <X,Y> in the block to be equal to clip3(T[X] + L[Y] - TL, 0, 255).
It’s effectively a generalization of gradient prediction to the block level — predict each pixel based on the gradient between its top and left pixels, and the top-left. According to the VP8 devs, it’s chosen by the encoder quite a lot of the time, which isn’t surprising; it seems like a pretty good idea. As just one more intra pred mode, it’s not going to do magic for compression, but it’s a cool idea and elegantly simple.
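For concreteness, here is a minimal C sketch of TM prediction exactly as described above (the names and helper are mine, not libvpx’s):

#include <stdint.h>

/* Clamp a value to the [0, 255] pixel range (the clip3 above). */
static uint8_t clip_pixel(int v)
{
    return v < 0 ? 0 : v > 255 ? 255 : v;
}

/* TM prediction for an n x n block (n = 4, 8, or 16). T points to the
 * n pixels above the block, L to the n pixels to its left, and tl is
 * the pixel above the top-left corner. */
static void tm_predict(uint8_t *dst, int stride, int n,
                       const uint8_t *T, const uint8_t *L, uint8_t tl)
{
    for (int y = 0; y < n; y++)
        for (int x = 0; x < n; x++)
            dst[y * stride + x] = clip_pixel(T[x] + L[y] - tl);
}

In SIMD this vectorizes nicely: L[y] - TL is computed once per row and added to a whole row of T values at a time, with the clamp coming free from saturating arithmetic.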
3. Performance and the deblocking filter.
On2 advertised for quite some time that VP8’s goal was to be significantly faster to decode than H.264. When I saw the spec, I waited for the punchline, but apparently they were serious. There’s nothing wrong with being of similar speed or a bit slower — but I was rather confused by the fact that their design didn’t match their stated goal at all. What apparently happened is that they had multiple profiles of VP8 — high and low complexity profiles. They marketed the performance of the low complexity ones while touting the quality of the high complexity ones, which is a tad dishonest. More importantly, practically nobody is using the low complexity modes, so anyone writing a decoder has to be prepared to handle the high complexity ones, which are the default.
The primary time-eater here is the deblocking filter. VP8, being an H.264 derivative, has much the same problem as H.264 does in terms of deblocking — it spends an absurd amount of time there. As I write this post, we’re about to finish some of the deblocking filter asm code, but before it’s committed, up to 70% or more of total decoding time is spent in the deblocking filter! Like H.264, it suffers from the 4×4 transform problem: a 4×4 transform requires a total of 8 length-16 and 8 length-8 loopfilter calls per macroblock, while Theora, with only an 8×8 transform, requires half that.
This problem is aggravated in VP8 by the fact that the deblocking filter isn’t strength-adaptive; if even one 4×4 block in a macroblock contains coefficients, every single edge has to be deblocked. Furthermore, the deblocking filter itself is quite complicated; the “inner edge” filter is a bit more complex than H.264’s and the “macroblock edge” filter is vastly more complicated, having two entirely different codepaths chosen on a per-pixel basis. Of course, in SIMD, this means you have to do both and mask them together at the end.
There’s nothing wrong with a good-but-slow deblocking filter. But given the amount of deblocking one needs to do in a 4×4-transform-based format, it might have been a better choice to make the filter simpler. It’s pretty difficult to beat H.264 on compression, but it’s certainly not hard to beat it on speed — and yet it seems VP8 missed a perfectly good chance to do so. Another option would have been to pick an 8×8 transform instead of 4×4, reducing the amount of deblocking by a factor of 2.
And yes, there’s a simple filter available in the low complexity profile, but it doesn’t help if nobody uses it.
4. Tree-based arithmetic coding.
Binary arithmetic coding has become the standard entropy coding method for a wide variety of compressed formats, ranging from LZMA to VP6, H.264 and VP8. It’s simple, relatively fast compared to other arithmetic coding schemes, and easy to make adaptive. The problem with this is that you have to come up with a method for converting non-binary symbols into a list of binary symbols, and then choosing what probabilities to use to code each one. Here’s an example from H.264, the sub-partition mode symbol, which is either 8×8, 8×4, 4×8, or 4×4. encode_decision( context, bit ) writes a binary decision (bit) into a numbered context (context).
8×8: encode_decision(21, 0);
8×4: encode_decision(21, 1); encode_decision(22, 0);
4×8: encode_decision(21, 1); encode_decision(22, 1); encode_decision(23, 1);
4×4: encode_decision(21, 1); encode_decision(22, 1); encode_decision(23, 0);
As can be seen, this is clearly like a Huffman tree. Wouldn’t it be nice if we could represent this in the form of an actual tree data structure instead of code? On2 thought so — they designed a simple system in VP8 that allowed all binarization schemes in the entire format to be represented as simple tree data structures. This greatly reduces the complexity — not speed-wise, but implementation-wise — of the entropy coder. Personally, I quite like it.
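As a rough sketch of the idea (my own illustration, not the actual libvpx tables or API): the sub-partition binarization above can be stored as an array of node pairs, with a single generic routine walking the tree. Here leaves are stored as -(symbol + 1) so that symbol 0 is representable, and decode_bit() stands in for the arithmetic decoder.

#include <stdint.h>

enum { SUB_8X8, SUB_8X4, SUB_4X8, SUB_4X4 };

/* Each pair of entries is one node; bit 0 selects the first entry,
 * bit 1 the second. Non-negative values index the next node pair,
 * negative values are leaves encoded as -(symbol + 1). */
static const int8_t subpart_tree[6] = {
    -(SUB_8X8 + 1), 2,               /* 0     -> 8x8            */
    -(SUB_8X4 + 1), 4,               /* 1 0   -> 8x4            */
    -(SUB_4X4 + 1), -(SUB_4X8 + 1),  /* 1 1 0 -> 4x4, 1 1 1 -> 4x8 */
};

/* Walk the tree, reading one arithmetic-coded bit per internal node;
 * probs holds one probability per node. */
static int decode_tree(const int8_t *tree, const uint8_t *probs,
                       int (*decode_bit)(uint8_t prob))
{
    int i = 0;
    do
        i = tree[i + decode_bit(probs[i >> 1])];
    while (i >= 0);
    return -i - 1;
}

A call like decode_tree(subpart_tree, probs, decode_bit) then returns one of the four enum values, and every symbol in the format can reuse the same walker with a different table.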
5. The inverse transform ordering.
I should at some point write a post about common mistakes made in video formats that everyone keeps making. These are not issues that are patent worries or huge issues for compression — just stupid mistakes that are repeatedly made in new video formats, probably because someone just never asked the guy next to him “does this look stupid?” before sticking it in the spec.
One common mistake is the problem of transform ordering. Every sane 2D transform is “separable” — that is, it can be done by doing a 1D transform vertically and doing the 1D transform again horizontally (or vice versa). The original iDCT as used in JPEG, H.263, and MPEG-1/2/4 was an “idealized” iDCT — nobody had to use the exact same iDCT, theirs just had to give very close results to a reference implementation. This ended up resulting in a lot of practical problems. It was also slow; the only way to get an accurate enough iDCT was to do all the intermediate math in 32-bit.
Practically every modern format, accordingly, has specified an exact iDCT. This includes H.264, VC-1, RV40, Theora, VP8, and many more. Of course, with an exact iDCT comes an exact ordering — while the “real” iDCT can be done in any order, an exact iDCT usually requires an exact order. That is, it specifies horizontal and then vertical, or vertical and then horizontal.
All of these transforms end up being implemented in SIMD. In SIMD, a vertical transform is generally the only option, so a transpose is added to the process instead of doing a horizontal transform. Accordingly, there are two ways to do it:
1. Transpose, vertical transform, transpose, vertical transform.
2. Vertical transform, transpose, vertical transform, transpose.
These may seem to be equally good, but there’s one catch — if the transpose is done first, it can be completely eliminated by merging it into the coefficient decoding process. On many modern CPUs, particularly x86, transposes are very expensive, so eliminating one of the two gives a pretty significant speed benefit.
H.264 did it way 1).
VC-1 did it way 1).
Theora (inherited from VP3) did it way 1).
But no. VP8 has to do it way 2), where you can’t eliminate the transpose. Bah. It’s not a huge deal; probably only 1-2% overall at most speed-wise, but it’s just a needless waste. What really bugs me is that VP3 got it right — why in the world did they screw it up this time around if they got it right beforehand?
RV40 is the other modern format I know that made this mistake.
(NB: You can do transforms without a transpose, but it’s generally not worth it unless the intermediate needs 32-bit math, as in the case of the “real” iDCT.)
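To make the “free transpose” of ordering 1 concrete, here is the trick in miniature (a sketch under my own naming, not any particular decoder’s code): each decoded coefficient is simply written to the transposed position, which is equivalent to pre-transposing the scan-order table, so no separate transpose pass is ever executed.

#include <stdint.h>

/* Swap the row and column of an index into a 4x4 block. */
static int transpose4(int idx)
{
    return ((idx & 3) << 2) | (idx >> 2);
}

/* Store coefficients through a pre-transposed scan table; the first
 * transpose of ordering 1 costs nothing. scan[] is the zigzag order. */
static void store_coeffs_transposed(int16_t block[16],
                                    const int16_t *coeffs,
                                    const uint8_t *scan)
{
    for (int k = 0; k < 16; k++)
        block[transpose4(scan[k])] = coeffs[k];
    /* remaining work: vertical 1-D transform, transpose,
     * vertical 1-D transform — both SIMD-friendly */
}

With ordering 2, the first step is a genuine vertical transform on the untransposed block, so there is no store step into which the transpose can be folded.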
6. Not supporting interlacing.
THANK YOU THANK YOU THANK YOU THANK YOU THANK YOU THANK YOU THANK YOU.
Interlacing was the scourge of H.264. It weaseled its way into every nook and cranny of the spec, making every decoder a thousand lines longer. H.264 even included a highly complicated — and effective — dedicated interlaced coding scheme, MBAFF. The mere existence of MBAFF, despite its usefulness for broadcasters and others still stuck in the analog age with their 1080i, 576i, and 480i content, was a blight upon the video format.
VP8 has once and for all avoided it.
And if anyone suggests adding interlaced support to the experimental VP8 branch, find a straightjacket and padded cell for them before they cause any real damage.
Announcing the world’s fastest VP8 decoder: ffvp8
Back when I originally reviewed VP8, I noted that the official decoder, libvpx, was rather slow. While there was no particular reason that it should be much faster than a good H.264 decoder, it shouldn’t have been that much slower either! So, I set out with Ronald Bultje and David Conrad to make a better one in FFmpeg. This one would be community-developed and free from the beginning, rather than the proprietary code-dump that was libvpx. A few weeks ago the decoder was complete enough to be bit-exact with libvpx, making it the first independent free implementation of a VP8 decoder. Now, with the first round of optimizations complete, it should be ready for primetime. I’ll go into some detail about the development process, but first, let’s get to the real meat of this post: the benchmarks.
We tested on two 1080p clips: Parkjoy, a live-action 1080p clip, and the Sintel trailer, a CGI 1080p clip. Testing was done using “time ffmpeg -vcodec libvpx or vp8 -i input -vsync 0 -an -f null -”. We all used the latest SVN FFmpeg at the time of this posting; the last revision optimizing the VP8 decoder was r24471.
As these benchmarks show, ffvp8 is clearly much faster than libvpx, particularly on 64-bit. It’s even faster by a large margin on Atom, despite the fact that we haven’t even begun optimizing for it. In many cases, ffvp8’s extra speed can make the difference between a video that plays and one that doesn’t, especially in modern browsers with software compositing engines taking up a lot of CPU time. Want to get faster playback of VP8 videos? The next versions of FFmpeg-based players, like VLC, will include ffvp8. Want to get faster playback of WebM in your browser? Lobby your browser developers to use ffvp8 instead of libvpx. I expect Chrome to switch first, as they already use libavcodec for most of their playback system.
Keep in mind ffvp8 is not “done” — we will continue to improve it and make it faster. We still have a number of optimizations in the pipeline that aren’t committed yet.
Developing ffvp8
The initial challenge, primarily pioneered by David and Ronald, was constructing the core decoder and making it bit-exact to libvpx. This was rather challenging, especially given the lack of a real spec. Many parts of the spec were outright misleading and contradicted libvpx itself. It didn’t help that the suite of official conformance tests didn’t even cover all the features used by the official encoder! We’ve already started adding our own conformance tests to deal with this. But I’ve complained enough in past posts about the lack of a spec; let’s get onto the gritty details.
The next step was adding SIMD assembly for all of the important DSP functions. VP8’s motion compensation and deblocking filter are by far the most CPU-intensive parts, much the same as in H.264. Unlike H.264, the deblocking filter relies on a lot of internal saturation steps, which are free in SIMD but costly in a normal C implementation, making the plain C code even slower. Of course, none of this is a particularly large problem; any sane video decoder has all this stuff in SIMD.
I tutored Ronald in x86 SIMD and wrote most of the motion compensation, intra prediction, and some inverse transforms. Ronald wrote the rest of the inverse transforms and a bit of the motion compensation. He also did the most difficult part: the deblocking filter. Deblocking filters are always a bit difficult because every one is different. Motion compensation, by comparison, is usually very similar regardless of video format; a 6-tap filter is a 6-tap filter, and most of the variation going on is just the choice of numbers to multiply by.
The biggest challenge in a SIMD deblocking filter is to avoid unpacking, that is, going from 8-bit to 16-bit. Many operations in deblocking filters would naively appear to require more than 8-bit precision. A simple example in the case of x86 is abs(a-b), where a and b are 8-bit unsigned integers. The result of “a-b” requires a 9-bit signed integer (it can be anywhere from -255 to 255), so it can’t fit in 8-bit. But this is quite possible to do without unpacking: (satsub(a,b) | satsub(b,a)), where “satsub” performs a saturating subtract on the two values. If the difference is positive, it yields the result; if negative, it yields zero. ORing the two together yields the desired result. This requires 4 ops on x86; unpacking would probably require at least 10, including the unpack and pack steps.
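In SSE2 intrinsics the trick looks like this (the intrinsics are real; the wrapper function is just my illustration):

#include <emmintrin.h>

/* |a - b| for 16 unsigned bytes at once, without unpacking to 16-bit:
 * saturating subtracts in both directions, OR'd together. Whichever
 * direction underflows saturates to zero and drops out of the OR. */
static __m128i abs_diff_u8(__m128i a, __m128i b)
{
    return _mm_or_si128(_mm_subs_epu8(a, b), _mm_subs_epu8(b, a));
}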
After the SIMD came optimizing the C code, which still took a significant portion of the total runtime. One of my biggest optimizations was adding aggressive “smart” prefetching to reduce cache misses. ffvp8 prefetches the reference frames (PREVIOUS, GOLDEN, and ALTREF)… but only the ones which have been used reasonably often this frame. This lets us prefetch everything we need without prefetching things that we probably won’t use. libvpx very often encodes frames that almost never (but not quite never) use GOLDEN or ALTREF, so this optimization greatly reduces time spent prefetching in a lot of real videos. There are of course countless other optimizations, too many to list here, such as David’s entropy decoder optimizations. I’d also like to thank Eli Friedman for his invaluable help in benchmarking a lot of these changes.
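The gist of that heuristic, as a hedged sketch (the names, threshold, and frame layout are assumptions of mine, not ffvp8’s actual code):

#include <stdint.h>

/* Prefetch the area of a reference frame that a macroblock is about
 * to read, but only if this reference has been used often enough in
 * the current frame to be worth the memory bandwidth.
 * __builtin_prefetch is the GCC cache-prefetch hint. */
static void maybe_prefetch_ref(const uint8_t *ref_plane, int stride,
                               int mb_x, int mb_y, int uses_this_frame)
{
    if (uses_this_frame < 2)   /* rarely used reference: skip it */
        return;
    const uint8_t *p = ref_plane + (mb_y * 16) * stride + mb_x * 16;
    for (int y = 0; y < 16; y += 4)
        __builtin_prefetch(p + y * stride);
}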
What next? Altivec (PPC) assembly is almost nonexistent, with the only functions being David’s motion compensation code. NEON (ARM) is completely nonexistent: we’ll need that to be fast on mobile devices as well. Of course, all this will come in due time — and as always — patches welcome!
Appendix: the raw numbers
Here are the raw numbers (in fps) for the graphs at the start of this post, with standard error values:
Core i7 620QM (1.6GHz), Windows 7, 32-bit:
Parkjoy ffvp8: 44.58 ± 0.44
Parkjoy libvpx: 33.06 ± 0.23
Sintel ffvp8: 74.26 ± 1.18
Sintel libvpx: 56.11 ± 0.96

Core i5 520M (2.4GHz), Linux, 64-bit:
Parkjoy ffvp8: 68.29 ± 0.06
Parkjoy libvpx: 41.06 ± 0.04
Sintel ffvp8: 112.38 ± 0.37
Sintel libvpx: 69.64 ± 0.09

Core 2 T9300 (2.5GHz), Mac OS X 10.6.4, 64-bit:
Parkjoy ffvp8: 54.09 ± 0.02
Parkjoy libvpx: 33.68 ± 0.01
Sintel ffvp8: 87.54 ± 0.03
Sintel libvpx: 52.74 ± 0.04

Core Duo (2GHz), Mac OS X 10.6.4, 32-bit:
Parkjoy ffvp8: 21.31 ± 0.02
Parkjoy libvpx: 17.96 ± 0.00
Sintel ffvp8: 41.24 ± 0.01
Sintel libvpx: 29.65 ± 0.02

Atom N270 (1.6GHz), Linux, 32-bit:
Parkjoy ffvp8: 15.29 ± 0.01
Parkjoy libvpx: 12.46 ± 0.01
Sintel ffvp8: 26.87 ± 0.05
Sintel libvpx: 20.41 ± 0.02
FFmpeg and Code Coverage Tools
21 August 2010, by Multimedia Mike — FATE Server, Python
Code coverage tools likely occupy the same niche as profiling tools: tools that you’re supposed to use somewhere during the software engineering process but probably never quite get around to, usually because you’re too busy adding features or fixing bugs. But there may come a day when you wish to learn how much of your code is actually being exercised in normal production use. For example, the team charged with continuously testing the FFmpeg project would be curious to know how much code is being exercised, especially since many of the FATE test specs explicitly claim to be "exercising XYZ subsystem".
The primary GNU code coverage tool is called gcov and is probably already on your GNU-based development system. I set out to determine how much FFmpeg source code is exercised while running the full FATE suite. I ran into some problems when trying to use gcov on a project-wide scale. I spackled around those holes with some very ad-hoc solutions. I’m sure I was just overlooking some more obvious solutions about which you all will be happy to enlighten me.
Results
I’ve learned to cut to the chase earlier in blog posts (results first, methods second). With that, here are the results I produced from this experiment. This Google spreadsheet contains 3 sheets: the first contains code coverage stats for a bunch of FFmpeg C files sorted first by percent coverage (ascending), then by number of lines (descending), thus highlighting which files have the most uncovered code (ffserver.c currently tops that chart). The second sheet has files for which no stats were generated. The third sheet has "problems" — these files were rejected by my ad-hoc script. Here’s a link to the data in CSV if you want to play with it yourself.
Using gcov with FFmpeg
To instrument a program for gcov analysis, compile and link the target program with the -fprofile-arcs and -ftest-coverage options. These need to be applied at both the compile and link stages, so in the case of FFmpeg, configure with:

./configure \
  --extra-cflags="-fprofile-arcs -ftest-coverage" \
  --extra-ldflags="-fprofile-arcs -ftest-coverage"
The build process results in a bunch of .gcno files which pertain to code coverage. After running the program as normal, a bunch of .gcda files are generated. To get coverage statistics from these files, run 'gcov sourcefile.c'. This will print some basic statistics as well as generate a corresponding .gcov file with more detailed information about exactly which lines have been executed, and how many times. Be advised that the source file must either live in the same directory from which gcov is invoked, or else the path to the source must be given to gcov via the '-o, --object-directory' option.
Resetting Statistics
Statistics in the .gcda files are cumulative. Should you wish to reset the statistics, doing this in the build directory should suffice:

find . -name "*.gcda" | xargs rm -f
Getting Project-Wide Data
As mentioned, I had to get a little creative here to get a big picture of FFmpeg code coverage. After building FFmpeg with the code coverage options and running FATE:

for file in `find . -name "*.c"`
do
  echo "*****" $file
  gcov -o `dirname $file` `basename $file`
done > ffmpeg-code-coverage.txt 2>&1
After that, I ran the ffmpeg-code-coverage.txt file through a custom Python script to print out the 3 CSV files that I later dumped into the Google Spreadsheet.
Further Work
I’m sure there are better ways to do this, and I’m sure you all will let me know what they are. But I have to get the ball rolling somehow.
There’s also TestCocoon. I’d like to try that program and see if it addresses some of gcov’s shortcomings (assuming they are indeed shortcomings rather than oversights).
Source for script: process-gcov-slop.py
#!/usr/bin/python

import re

lines = open("ffmpeg-code-coverage.txt").read().splitlines()

no_coverage = ""
coverage = "filename, % covered, total lines\n"
problems = ""

# matches gcov's summary line, e.g. "Lines executed:90.00% of 10"
stats_exp = re.compile('Lines executed:(\d+\.\d+)% of (\d+)')

# note: changes to 'i' inside the body don't persist across xrange
# iterations; the loop just re-scans some lines harmlessly
for i in xrange(len(lines)):
    line = lines[i]
    if line.startswith("***** "):
        filename = line[line.find('./')+2:]
        i += 1
        if lines[i].find(":cannot open graph file") != -1:
            # no .gcno file was generated for this source file
            no_coverage += filename + '\n'
        else:
            # skip ahead to the gcov output block for this file
            while lines[i].find(filename) == -1 and not lines[i].startswith("***** "):
                i += 1
            try:
                (percent, total_lines) = stats_exp.findall(lines[i+1])[0]
                coverage += filename + ', ' + percent + ', ' + total_lines + '\n'
            except IndexError:
                problems += filename + '\n'

open("no_coverage.csv", 'w').write(no_coverage)
open("coverage.csv", 'w').write(coverage)
open("problems.csv", 'w').write(problems)
-