
Recherche avancée
Médias (1)
-
Video d’abeille en portrait
14 mai 2011, par
Mis à jour : Février 2012
Langue : français
Type : Video
Autres articles (104)
-
Encoding and processing into web-friendly formats
13 avril 2011, parMediaSPIP automatically converts uploaded files to internet-compatible formats.
Video files are encoded in MP4, Ogv and WebM (supported by HTML5) and MP4 (supported by Flash).
Audio files are encoded in MP3 and Ogg (supported by HTML5) and MP3 (supported by Flash).
Where possible, text is analyzed in order to retrieve the data needed for search engine detection, and then exported as a series of image files.
All uploaded files are stored online in their original format, so you can (...) -
Emballe médias : à quoi cela sert ?
4 février 2011, parCe plugin vise à gérer des sites de mise en ligne de documents de tous types.
Il crée des "médias", à savoir : un "média" est un article au sens SPIP créé automatiquement lors du téléversement d’un document qu’il soit audio, vidéo, image ou textuel ; un seul document ne peut être lié à un article dit "média" ; -
Contribute to a better visual interface
13 avril 2011MediaSPIP is based on a system of themes and templates. Templates define the placement of information on the page, and can be adapted to a wide range of uses. Themes define the overall graphic appearance of the site.
Anyone can submit a new graphic theme or template and make it available to the MediaSPIP community.
Sur d’autres sites (6082)
-
WebM Cabal
I traveled to a secret clubhouse today to take part in a clandestine meeting to discuss exactly how WebM will rule over all that you see and hear on the web. I can’t really talk about it. But I can show you the cool hat I got :
Yeah, you’re jealous.
The back of the hat has an Easter egg for video codec nerds– the original Duck Corporation logo (On2′s original name) :
Former employees of On2 (now Googlers) were well-represented. It was an emotional day of closure as I met the person — the only person to date — who contacted me with a legal threat so many years ago. He still remembered me too.
I met a lot of people involved in creating various Duck and On2 codecs and learned a lot of history and lore behind then– history I hope to be able to document one day.
I’m glad I got that first rough draft of a toy VP8 encoder done in time for the meeting. It was the subject of much mirth.
-
Stop doing this in your encoder comparisons
14 juin 2010, par Dark Shikari — UncategorizedI’ll do a more detailed post later on how to properly compare encoders, but lately I’ve seen a lot of people doing something in particular that demonstrates they have no idea what they’re doing.
PSNR is not a very good metric. But it’s useful for one thing : if every encoder optimizes for it, you can effectively measure how good those encoders are at optimizing for PSNR. Certainly this doesn’t tell you everything you want to know, but it can give you a good approximation of “how good the encoder is at optimizing for SOMETHING“. The hope is that this is decently close to the visual results. This of course can fail to be the case if one encoder has psy optimizations and the other does not.
But it only works to begin with if both encoders are optimized for PSNR. If one optimizes for, say, SSIM, and one optimizes for PSNR, comparing PSNR numbers is completely meaningless. If anything, it’s worse than meaningless — it will bias enormously towards the encoder that is tuned towards PSNR, for obvious reasons.
And yet people keep doing this.
They keep comparing x264 against other encoders which are tuned against PSNR. But they don’t tell x264 to also tune for PSNR (–tune psnr, it’s not hard !), and surprise surprise, x264 loses. Of course, these people never bother to actually look at the output ; if they did, they’d notice that x264 usually looks quite a bit better despite having lower PSNR.
This happens so often that I suspect this is largely being done intentionally in order to cheat in encoder comparisons. Or perhaps it’s because tons of people who know absolutely nothing about video coding insist on doing comparisons without checking their methodology. Whatever it is, it clearly demonstrates that the person doing the test doesn’t understand what PSNR is or why it is used.
Another victim of this is Theora Ptalarbvorm, which optimizes for SSIM at the expense of PSNR — an absolutely great decision for visual quality. And of course if you just blindly compare Ptalarbvorm (1.2) and Thusnelda (1.1), you’ll notice Ptalarbvorm has much lower PSNR ! Clearly, it must be a worse encoder, right ?
Stop doing this. And call out the people who insist on cheating.
-
The Big VP8 Debug
20 novembre 2010, par Multimedia Mike — VP8I hope my previous walkthrough of the VP8 4x4 intra coding process was educational. Today, I’ll be walking through an example of what happens when my toy VP8 encoder encodes an intra 16x16 block. This may prove educational to those who have never been exposed to the deep details of this or related algorithms. Also, I wanted to illustrate where I think my VP8 encoder process is going bad and generating such grotesque results.
Before I start, let me give a shout-out to Google Docs’ Drawing tool which I used to generate these diagrams. It works quite well.
Results
(Always cut to the chase in a blog post ; results first.) I’m glad I composed this post. In the course of doing so, I found the problem, fixed it, and am now able to present this image that was decoded from the bitstream encoded by my
toyworking VP8 encoder :
Yeah, I know that image doesn’t look like anything you haven’t seen before. The difference is that it has made a successful trip through my VP8 encoder.
Follow along through the encoding process and learn of the mistake...
Original Block and Subblocks
Here is the 16x16 block to be encoded :
The block is broken down into 16 4x4 subblocks for further encoding :
Prediction
The first step is to pick a prediction mode, generate a prediction block, and subtract the predictors from the macroblock. In this case, we will use DC prediction which means the predictor will be the same for each element.In 4x4 VP8 DC intra prediction, samples outside of the frame are assumed to be 128. It’s a little different in 16x16 DC intra prediction— samples above the top row are assumed to be 127 while samples left of the leftmost column are assumed to be 129. For the top left macroblock, this still works out to 128.
Subtract 128 from each of the samples :
Forward Transform
Run each of the 16 prediction-removed subblocks through the forward transform. This example uses the forward transform from libvpx 0.9.5 :
I have highlighted the DC coefficients in each subblock. That’s because those receive special consideration in 16x16 intra coding.
Quantization
The Y plane AC quantizer is 4 in this example, the minimum allowed. (The Y plane DC quantizer is also 4 but doesn’t come into play for intra 16x16 coding since the DC coefficients follow a different process.) Thus, quantize (integer divide) each AC element in each subblock (we’ll ignore the DC coefficient for this part) :
The Y2 Round Trip
Those highlighted DC coefficients from each of the 16 subblocks comprise the Y2 block. This block is transformed with a slightly different algorithm called the Walsh-Hadamard Transform (WHT). The results of this transform are then quantized (using 8 for both Y2 DC and AC in this example, as those are the smallest Y2 quantizers that VP8 allows), then zigzagged and entropy-coded along with the rest of the macroblock coefficients.
On the decoder side, the Y2 coefficients are decoded, de-zigzagged, dequantized and run through the inverse WHT.
And this is where I suspect that most of the error is creeping into my VP8 encoder. Observe the round-trip through the Y2 process :
As intimated, this part causes me consternation due to the wide discrepancy between the original and the reconstructed Y2 blocks. Observe the absolute difference between the 2 vectors :
That’s really significant and leads me to believe that this is where the big problem is.
What’s Wrong ?
My first suspicion is that the quantization is throwing off the process. I was disabused of this idea when I removed quantization from the equation and immediately reversed the transform :
So perhaps there is a problem with the forward WHT. Just like with the usual subblock transform, the VP8 spec doesn’t define how to perform the forward WHT, only the inverse WHT. Do I need to audition different forward WHTs from various versions of libvpx, similar to what I did with the other transform ? That doesn’t make a lot of sense— libvpx doesn’t seem to have so much trouble with basic encoding.
The Punchline
I reviewed the forward WHT code, the stuff that I plagiarized from libvpx 0.9.0. The function takes, among other parameters, a pitch value. There are 2 loops in the code. The first iterates through the rows of the input matrix— which I assumed was a 4x4 matrix. I was puzzled that during each iteration of the row loop, the input pointer was only being advanced by
(pitch/2)
. I removed the division by 2 and the problem went away. I.e., the encoded image looks correct.What’s up with the
(pitch/2)
, anyway ? It seems that the encoder likes to pack 2 4x4 subblocks into an 8x4 block data structure. In fact, the forward DCTs in the libvpx encoder have the same artifact. Remember how I surveyed several variations of forward DCT from different versions of libvpx ? The one that proved most accurate in that test was the one I had already modified to advance the input pointer properly. Fixing the other 2 candidates yields similar results :input : 92 91 89 86 91 90 88 86 89 89 89 88 89 87 88 93 short 0.9.0 : -311 6 2 0 0 11 -6 1 2 -3 3 0 0 0 -2 1 inverse : 92 91 89 86 91 90 88 87 90 89 89 88 89 87 88 93 fast 0.9.0 : -313 5 1 0 1 11 -6 1 3 -3 4 0 0 0 -2 1 inverse : 91 91 89 86 90 90 88 86 89 89 89 88 89 87 88 93 short 0.9.5 : -312 7 1 0 1 12 -5 2 2 -3 3 -1 1 0 -2 1 inverse : 92 91 89 86 91 90 88 86 89 89 89 88 89 87 88 93
Code cribber beware !
Corrected Y2 Round Trip
Let’s look at that Y2 round trip one more time :
And another look at the error between the original and the reconstruction :
Better.
Dequantization, Prediction, Inverse Transforms, and Reconstruction
To be honest, now that I solved the major problem, I’m getting a little tired of making these pictures. Long story short, all elements of the original 16 subblocks are dequantized and their DC coefficients are filled in with the appropriate item from the reconstructed Y2 block. A base predictor block is generated (all 128 values in this case). And each Y block is run through the inverse transform and added to the predictor block. The following is the reconstruction :
And if you compare that against the original luma macroblock (I don’t feel like doing it right now), you’ll find that it’s pretty close.
I can’t believe how close I was all this time, and how long that pitch bug held me up.