On other sites (9318)

  • The first in-depth technical analysis of VP8

    19 May 2010, by Dark Shikari — VP8, google

    Back in my original post about Internet video, I made some initial comments on the hope that VP8 would solve the problems of web video by providing a supposedly patent-free video format with significantly better compression than the current options of Theora and Dirac. Fortunately, I was able to get access to the VP8 spec, software, and source a good few days before the official release, and so was able to perform a detailed technical analysis in time for it.

    The questions I will try to answer here are :

    1. How good is VP8 ? Is the file format actually better than H.264 in terms of compression, and could a good VP8 encoder beat x264 ? On2 claimed 50% better than H.264, but On2 has always made absurd claims that they were never able to back up with results, so such a number is almost surely wrong. VP7, for example, was claimed to be 15% better than H.264 while being much faster, but was in reality neither faster nor higher quality.

    2. How good is On2′s VP8 implementation ? Irrespective of how good the spec is, is the implementation good, or is this going to be just like VP3, where On2 releases an unusably bad implementation with the hope that the community will fix it for them ? Let’s hope not ; it took 6 years to fix Theora !

    3. How likely is VP8 to actually be free of patents ? Even if VP8 is worse than H.264, being patent-free is still a useful attribute for obvious reasons. But as noted in my previous post, merely being published by Google doesn’t guarantee that it is. Microsoft did something similar a few years ago with the release of VC-1, which was claimed to be patent-free — but within mere months of release, a whole bunch of companies claimed patents on it and soon enough a patent pool was formed.

    We’ll start by going through the core features of VP8. We’ll primarily analyze them by comparing to existing video formats. Keep in mind that an encoder and a spec are two different things : it’s possible for a good encoder to be written for a bad spec, or vice versa ! Hence why a really good MPEG-1 encoder can beat a horrific H.264 encoder.

    But first, a comment on the spec itself.

    AAAAAAAGGGGGGGGGGGGGHHHHHHHHHHHHHHHHHHHHHHHHHHHHHH !

    The spec consists largely of C code copy-pasted from the VP8 source code — up to and including TODOs, “optimizations”, and even C-specific hacks, such as workarounds for the undefined behavior of signed right shift on negative numbers. In many places it is simply outright opaque. Copy-pasted C code is not a spec. I may have complained about the H.264 spec being overly verbose, but at least it’s precise. The VP8 spec, by comparison, is imprecise, unclear, and overly short, leaving many portions of the format very vaguely explained. Some parts even explicitly refuse to fully explain a particular feature, pointing to highly-optimized, nigh-impossible-to-understand reference code for an explanation. There’s no way in hell anyone could write a decoder from this spec alone.

    Now that I’ve gotten that out of my system, let’s get back to VP8 itself. To begin with, to get a general sense for where all this fits in, basically all modern video formats work via some variation on the following chain of steps :

    Encode : Predict -> Transform + Quant -> Entropy Code -> Loopfilter
    Decode : Entropy Decode -> Predict -> Dequant + Inverse Transform -> Loopfilter

    If you’re looking to just get to the results and skip the gritty technical details, make sure to check out the “overall verdict” section and the “visual results” section. Or at least skip to the “summary for the lazy”.

    Prediction

    Prediction is any step which attempts to guess the content of an area of the frame. This could include functions based on already-known pixels in the same frame (e.g. inpainting) or motion compensation from a previous frame. Prediction usually involves side data, such as a signal telling the decoder a motion vector to use for said motion compensation.
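
    As a rough illustration of the motion-compensation half of that (my own sketch, not VP8-specific code), prediction amounts to copying a block of pixels out of a reference frame at the offset given by the motion vector; the encoder then only has to code the difference between that guess and the real pixels:

    #include <stdint.h>

    /* Hypothetical frame layout: an 8-bit luma plane plus its row stride. */
    typedef struct { const uint8_t *pixels; int stride; } Plane;

    /* Copy a 16x16 block from 'ref' at (x + mvx, y + mvy) into 'pred'.
     * A real codec would add sub-pixel interpolation and edge clamping here. */
    static void predict_block_16x16(const Plane *ref, int x, int y,
                                    int mvx, int mvy, uint8_t pred[16][16])
    {
        for (int row = 0; row < 16; row++)
            for (int col = 0; col < 16; col++)
                pred[row][col] =
                    ref->pixels[(y + mvy + row) * ref->stride + (x + mvx + col)];
    }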

    Intra Prediction

    Intra prediction is used to guess the content of a block without referring to other frames. VP8′s intra prediction is basically ripped off wholesale from H.264 : the “subblock” prediction modes are almost exactly identical (they even have the same names !) to H.264′s i4x4 mode, and the whole block prediction mode is basically identical to i16x16. Chroma prediction modes are practically identical as well. i8x8, from H.264 High Profile, is not present. An additional difference is that the planar prediction mode has been replaced with TM_PRED, a very vaguely similar analogue. The specific prediction modes are internally slightly different, but have the same names as in H.264.

    Honestly, I’m very disappointed here. While H.264′s intra prediction is good, it has certainly been improved on quite a bit over the past 7 years, and I thought that blatantly ripping it off was the domain of companies like Real (see RV40). I expected at least something slightly more creative out of On2. But more important than any of that : this is a patent time-bomb waiting to happen. H.264′s spatial intra prediction is covered in patents and I don’t think that On2 will be able to just get away with changing the rounding in the prediction modes. I’d like to see Google’s justification for this — they must have a good explanation for why they think there won’t be any patent issues.

    Update : spatial intra prediction apparently dates back to Nokia’s MVC H.26L proposal, from around 2000. It’s possible that Google believes that this is sufficient prior art to invalidate existing patents — which is not at all unreasonable !

    Verdict on Intra Prediction : Slightly modified ripoff of H.264. Somewhat worse than H.264 due to omission of i8x8.

    Inter Prediction

    Inter prediction is used to guess the content of a block by referring to past frames. There are two primary components to inter prediction : reference frames and motion vectors. The reference frame is a past frame from which to grab pixels, and the motion vectors index an offset into that frame. VP8 supports a total of 3 reference frames : the previous frame, the “alt ref” frame, and the “golden frame”. For motion vectors, VP8 supports variable-size partitions much like H.264. For subpixel precision, it supports quarter-pel motion vectors with a 6-tap interpolation filter. In short :

    VP8 reference frames : up to 3
    H.264 reference frames : up to 16
    VP8 partition types : 16×16, 16×8, 8×16, 8×8, 4×4
    H.264 partition types : 16×16, 16×8, 8×16, flexible subpartitions (each 8×8 can be 8×8, 8×4, 4×8, or 4×4).
    VP8 chroma MV derivation : each 4×4 chroma block uses the average of colocated luma MVs (same as MPEG-4 ASP)
    H.264 chroma MV derivation : chroma uses luma MVs directly
    VP8 interpolation filter : qpel, 6-tap luma, mixed 4/6-tap chroma
    H.264 interpolation filter : qpel, 6-tap luma (staged filter), bilinear chroma
    H.264 has but VP8 doesn’t : B-frames, weighted prediction

    H.264 has a significantly better and more flexible referencing structure. Sub-8×8 partitions are mostly unnecessary, so VP8′s omission of the H.264-style subpartitions has little consequence. The chroma MV derivation is more accurate in H.264 but slightly slower ; in practice the difference is probably near-zero both speed and compression-wise, since sub-8×8 luma partitions are rarely used (and I would suspect the same carries over to VP8).
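
    To make the chroma MV derivation difference concrete, here is a rough sketch of the MPEG-4 ASP/VP8-style approach described above: each chroma block's motion vector is the average of the four colocated luma motion vectors. The rounding is my own simplification, not the exact rule from either spec.

    typedef struct { int x, y; } MV;

    /* Derive one 4:2:0 chroma block's MV from the four colocated luma MVs.
     * H.264 instead reuses the luma MVs for chroma directly. */
    static MV derive_chroma_mv(const MV luma[4])
    {
        int sx = luma[0].x + luma[1].x + luma[2].x + luma[3].x;
        int sy = luma[0].y + luma[1].y + luma[2].y + luma[3].y;
        /* Illustrative round-to-nearest; real bitstreams pin down the exact rounding. */
        MV c = { (sx + (sx >= 0 ? 2 : -2)) / 4, (sy + (sy >= 0 ? 2 : -2)) / 4 };
        return c;
    }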

    The VP8 interpolation filter is likely slightly better, but will definitely be slower to implement, both encoder and decoder-side. A staged filter allows the encoder to precalculate all possible halfpel positions and then quickly calculate qpel positions when necessary : an unstaged filter does not, making subpel motion estimation much slower. Not that unstaged filters are bad — staged filters have basically been abandoned for all of the H.265 proposals — it’s just an inherent disadvantage performance-wise. Additionally, having as high as 6 taps on chroma is, IMO, completely unnecessary and wasteful.

    The lack of B-frames in VP8 is a killer. B-frames can give 10-20% (or more) compression benefit for minimal speed cost ; their omission in VP8 probably costs more compression than all other problems noted in this post combined. This was not unexpected, however ; On2 has never used B-frames in any of their video formats. They also likely present serious patent problems, which probably explains their omission. Lack of weighted prediction is also going to hurt a bit, especially in fades.

    Update : Alt-ref frames can apparently be used to partially replicate the lack of B-frames. It’s not nearly as good, but it can get at least some of the benefit without actual B-frames.

    Verdict on Inter Prediction : Similar partitioning structure to H.264. Much weaker referencing structure. More complex, slightly better interpolation filter. Mostly a wash — except for the lack of B-frames, which is seriously going to hurt compression.

    Transform and Quantization

    After prediction, the encoder takes the difference between the prediction and the actual source pixels (the residual), transforms it, and quantizes it. The transform step is designed to make the data more amenable to compression by decorrelating it. The quantization step is the actual information-losing step where compression occurs ; the output values of transform are rounded, mostly to zero, leaving only a few integer coefficients.
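
    In code, that residual path looks roughly like the following schematic sketch (not any particular codec's transform or quantizer; a 4x4 Walsh-Hadamard stands in for the real DCT approximation and the quantizer is a plain nearest-integer divide):

    #include <stdint.h>

    /* Stand-in 4x4 transform: rows then columns of a Walsh-Hadamard butterfly. */
    static void transform4x4(const int16_t in[16], int32_t out[16])
    {
        int32_t tmp[16];
        for (int r = 0; r < 4; r++) {
            int32_t a = in[r*4+0], b = in[r*4+1], c = in[r*4+2], d = in[r*4+3];
            tmp[r*4+0] = a + b + c + d;
            tmp[r*4+1] = a - b + c - d;
            tmp[r*4+2] = a + b - c - d;
            tmp[r*4+3] = a - b - c + d;
        }
        for (int col = 0; col < 4; col++) {
            int32_t a = tmp[0*4+col], b = tmp[1*4+col], c = tmp[2*4+col], d = tmp[3*4+col];
            out[0*4+col] = a + b + c + d;
            out[1*4+col] = a - b + c - d;
            out[2*4+col] = a + b - c - d;
            out[3*4+col] = a - b - c + d;
        }
    }

    /* Residual coding for one 4x4 block: subtract the prediction, transform, quantize. */
    static void code_block(const uint8_t src[4][4], const uint8_t pred[4][4],
                           int q, int32_t coeffs[16])
    {
        int16_t residual[16];
        for (int i = 0; i < 16; i++)               /* residual = source - prediction */
            residual[i] = (int16_t)(src[i / 4][i % 4] - pred[i / 4][i % 4]);

        transform4x4(residual, coeffs);            /* decorrelate the residual */

        for (int i = 0; i < 16; i++) {             /* quantization: the lossy step */
            int32_t sign = coeffs[i] < 0 ? -1 : 1;
            coeffs[i] = sign * ((sign * coeffs[i] + q / 2) / q);
        }
    }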

    Transform

    For the transform, VP8 again uses a very H.264-reminiscent scheme. Each 16×16 macroblock is divided into 16 4×4 DCT blocks, each of which is transformed by a bit-exact DCT approximation. Then, the DC coefficients of each block are collected into another 4×4 group, which is then Hadamard-transformed. OK, so this isn’t reminiscent of H.264, this is H.264. There are, however, 3 differences between VP8′s scheme and H.264′s.

    The first is that the 8×8 transform is omitted entirely (fitting with the omission of the i8x8 intra mode). The second is the specifics of the transform itself. H.264 uses an extremely simplified “DCT” which is so un-DCT-like that it is often referred to as the HCT (H.264 Cosine Transform) instead. This simplified transform results in roughly 1% worse compression, but greatly simplifies the transform itself, which can be implemented entirely with adds, subtracts, and right shifts by 1. VC-1 uses a more accurate version that relies on a few small multiplies (numbers like 17, 22, 10, etc). VP8 uses an extremely, needlessly accurate version that uses very large multiplies (20091 and 35468). This in retrospect is not surprising, as it is very similar to what VP3 used.

    The third difference is that the Hadamard hierarchical transform is applied for some inter blocks, not merely i16x16. In particular, it also runs for p16x16 blocks. While this is definitely a good idea, especially given the small transform size (and the need to decorrelate the DC value between the small transforms), I’m not quite sure I agree with the decision to limit it to p16x16 blocks ; it seems that perhaps with a small amount of modification this could also be useful for other motion partitions. Also, note that unlike H.264, the hierarchical transform is luma-only and not applied to chroma.
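
    To illustrate the arithmetic gap described in the second difference above: one 1-D stage of H.264's 4x4 core transform needs only adds, subtracts, and shifts by 1, whereas a VP8-style stage multiplies by large Q16 fixed-point constants (20091 and 35468 correspond to sqrt(2)*cos(pi/8)-1 and sqrt(2)*sin(pi/8)). Both snippets below are simplified illustrations of the two styles, not copies of either reference implementation.

    /* H.264-style 1-D butterfly over one row (a, b, c, d): adds/subtracts and shifts by 1. */
    static void h264_style_stage(int a, int b, int c, int d, int out[4])
    {
        int s0 = a + d, s1 = b + c, s2 = b - c, s3 = a - d;
        out[0] = s0 + s1;
        out[2] = s0 - s1;
        out[1] = (s3 << 1) + s2;
        out[3] = s3 - (s2 << 1);
    }

    /* VP8-style rotation of a coefficient pair using 16-bit fixed-point multiplies.
     * The constants are the ones mentioned above; the pairing is illustrative. */
    static void vp8_style_rotation(int x, int y, int *u, int *v)
    {
        const int cospi8sqrt2minus1 = 20091;  /* (sqrt(2)*cos(pi/8) - 1) * 65536 */
        const int sinpi8sqrt2       = 35468;  /* (sqrt(2)*sin(pi/8))     * 65536 */
        int xc = x + ((x * cospi8sqrt2minus1) >> 16);  /* ~ x * sqrt(2)*cos(pi/8) */
        int ys = (y * sinpi8sqrt2) >> 16;              /* ~ y * sqrt(2)*sin(pi/8) */
        int xs = (x * sinpi8sqrt2) >> 16;
        int yc = y + ((y * cospi8sqrt2minus1) >> 16);
        *u = xc + ys;
        *v = xs - yc;
    }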

    Overall, the transform scheme in VP8 is definitely weaker than in H.264. The lack of an 8×8 transform is going to have a significant impact on detail retention, especially at high resolutions. The transform is needlessly slower than necessary as well, though a shift-based transform might be out of the question due to patents. The one good new idea here is applying the hierarchical DC transform to inter blocks.

    Verdict on Transform : Similar to H.264. Slower, slightly more accurate 4×4 transform. Improved DC transform for luma (but not on chroma). No 8×8 transform. Overall, worse.

    Quantization

    For quantization, the core process is basically the same among all MPEG-like video formats, and VP8 is no exception. The primary ways that video formats tend to differentiate themselves here is by varying quantization scaling factors. There are two ways in which this is primarily done : frame-based offsets that apply to all coefficients or just some portion of them, and macroblock-level offsets. VP8 primarily uses the former ; in a scheme much less flexible than H.264′s custom quantization matrices, it allows for adjusting the quantizer of luma DC, luma AC, chroma DC, and so forth, separately. The latter (macroblock-level quantizer choice) can, in theory, be done using its “segmentation map” features, albeit very hackily and not very efficiently.

    The killer mistake that VP8 has made here is not making macroblock-level quantization a core feature of VP8. Algorithms that take advantage of macroblock-level quantization are known as “adaptive quantization” and are absolutely critical to competitive visual quality. My implementation of variance-based adaptive quantization (before, after) in x264 still stands to this day as the single largest visual quality gain in x264 history. Encoder comparisons have shown over and over that encoders without adaptive quantization simply cannot compete.

    Thus, while adaptive quantization is possible in VP8, the only way to implement it is to define one segment map for every single quantizer that one wants and to code the segment map index for every macroblock. This is inefficient and cumbersome ; even the relatively suboptimal MPEG-style delta quantizer system would be a better option. Furthermore, only 4 segment maps are allowed, for a maximum of 4 quantizers per frame.
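
    To make the limitation concrete, here is a rough sketch (my own illustration, not VP8 API code) of what variance-based adaptive quantization has to look like when squeezed through the segment-map mechanism: every macroblock's quantizer must be snapped to one of at most four values chosen for the whole frame.

    #define NUM_SEGMENTS 4   /* VP8's hard limit: at most 4 segments, hence 4 quantizers per frame */

    /* Per-segment quantizer indices, signalled once in the frame header (example values). */
    static const int segment_q[NUM_SEGMENTS] = { 20, 26, 32, 38 };

    /* Map a macroblock's luma variance onto one of the 4 allowed segments.
     * A full AQ algorithm would rather have a continuous per-MB QP offset;
     * here everything is forced into 4 buckets, coded per macroblock in the segment map. */
    static int pick_segment(double mb_variance, const double thresholds[NUM_SEGMENTS - 1])
    {
        for (int s = 0; s < NUM_SEGMENTS - 1; s++)
            if (mb_variance < thresholds[s])
                return s;            /* flat areas get a lower index, i.e. a finer quantizer */
        return NUM_SEGMENTS - 1;
    }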

    Verdict on Quantization : Lack of well-integrated adaptive quantization is going to be a killer when the time comes to implement psy optimizations. Overall, much worse.

    Entropy Coding

    Entropy coding is the process of taking all the information from all the other processes : DCT coefficients, prediction modes, motion vectors, and so forth — and compressing them losslessly into the final output file. VP8 uses an arithmetic coder somewhat similar to H.264′s, but with a few critical differences. First, it omits the range/probability table in favor of a multiplication. Second, it is entirely non-adaptive : unlike H.264′s, which adapts after every bit decoded, probability values are constant over the course of the frame. Accordingly, the encoder may periodically send updated probability values in frame headers for some syntax elements. Keyframes reset the probability values to the defaults.

    This approach isn’t surprising ; VP5 and VP6 (and probably VP7) also used non-adaptive arithmetic coders. How much of a penalty this actually incurs compression-wise is unknown ; it’s not easy to measure given the design of either H.264 or VP8. More importantly, I question the reason for this : making it adaptive would add just one single table lookup to the arithmetic decoding function — hardly a very large performance impact.
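
    For readers unfamiliar with this style of coder, here is a heavily simplified sketch of a VP8-style non-adaptive binary arithmetic ("bool") decoder. The split computation mirrors the scheme the released code uses; the bit-at-a-time refill and the rest of the scaffolding are my own simplification. The point to notice is that 'prob' comes from a table that stays constant across the frame instead of being updated after every decoded bit.

    #include <stdint.h>
    #include <stddef.h>

    typedef struct {
        const uint8_t *buf;   /* coded partition */
        size_t len, pos;
        int bitpos;           /* next bit within buf[pos], MSB first */
        uint32_t value;       /* current window of the coded value */
        uint32_t range;       /* current range, kept in [128, 255] */
    } BoolDec;

    static int next_bit(BoolDec *d)
    {
        int bit = 0;
        if (d->pos < d->len) {
            bit = (d->buf[d->pos] >> (7 - d->bitpos)) & 1;
            if (++d->bitpos == 8) { d->bitpos = 0; d->pos++; }
        }
        return bit;
    }

    static void bd_init(BoolDec *d, const uint8_t *buf, size_t len)
    {
        d->buf = buf; d->len = len; d->pos = 0; d->bitpos = 0;
        d->range = 255;
        d->value = 0;
        for (int i = 0; i < 8; i++)          /* preload the first 8 coded bits */
            d->value = (d->value << 1) | (uint32_t)next_bit(d);
    }

    /* Decode one bit whose probability of being 0 is prob/256.
     * 'prob' is never updated here: it is fixed for the whole frame (non-adaptive). */
    static int bd_read(BoolDec *d, uint8_t prob)
    {
        uint32_t split = 1 + (((d->range - 1) * prob) >> 8);
        int bit;
        if (d->value >= split) {
            bit = 1;
            d->range -= split;
            d->value -= split;
        } else {
            bit = 0;
            d->range = split;
        }
        while (d->range < 128) {             /* renormalize, one bit at a time */
            d->range <<= 1;
            d->value = (d->value << 1) | (uint32_t)next_bit(d);
        }
        return bit;
    }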

    Of course, the arithmetic coder is not the only part of entropy coding : an arithmetic coder merely turns 0s and 1s into an output bitstream. The process of creating those 0s and 1s and selecting the probabilities for the encoder to use is an equally interesting problem. Since this is a very complicated part of the video format, I’ll just comment on the parts that I found particularly notable.

    Motion vector coding consists of two parts : prediction based on neighboring motion vectors and the actual compression of the resulting delta between that and the actual motion vector. The prediction scheme in VP8 is a bit odd — worse, the section of the spec covering this contains no English explanation, just confusingly-written C code. As far as I can tell, it chooses an arithmetic coding context based on the neighboring MVs, then decides which of the predicted motion vectors to use, or whether to code a delta instead.

    The downside of this scheme is that, like in VP3/Theora (though not nearly as badly), it biases heavily towards the re-use of previous motion vectors. This is dangerous because, as the Theora devs have recently found (and fixed to some extent in Theora 1.2 aka Ptalabvorm), any situation in which the encoder picks a motion vector which isn’t the “real” motion vector in order to save bits can potentially have negative visual consequences. In terms of raw efficiency, I’m not sure whether VP8 or H.264′s prediction is better here.

    The compression of the resulting delta is similar to H.264, except for the coding of very large deltas, which is slightly better (similar to FFV1′s Golomb-like arithmetic codes).

    Intra prediction mode coding is done using arithmetic coding contexts based on the modes of the neighboring blocks. This is probably a good bit better than the hackneyed method that H.264 uses, which always struck me as being poorly designed.

    Residual coding is even more difficult to understand than motion vector coding, as the only full reference is a bunch of highly optimized, highly obfuscated C code. Like H.264′s CAVLC, it bases contexts on the number of nonzero coefficients in the top and left blocks relative to the current block. In addition, it also considers the magnitude of those coefficients and, like H.264′s CABAC, updates as coefficients are decoded.

    One more thing to note is the data partitioning scheme used by VP8. This scheme is much like VP3/Theora’s and involves putting each syntax element in its own component of the bitstream. The unfortunate problem with this is that it’s a nightmare for hardware implementations, greatly increasing memory bandwidth requirements. I have already received a complaint from a hardware developer about this specific feature with regard to VP8.

    Verdict on Entropy Coding : I’m not quite sure here. It’s better in some ways, worse in some ways, and just plain weird in others. My hunch is that it’s probably a very slight win for H.264 ; non-adaptive arithmetic coding has to have some serious penalties. The data partitioning may also be a problem for hardware implementations.

    Loop Filter

    The loop filter is run after decoding or encoding a frame and serves to perform extra processing on a frame, usually to remove blockiness in DCT-based video formats. Unlike postprocessing, this is not only for visual reasons, but also to improve prediction for future frames. Thus, it has to be done identically in both the encoder and decoder. VP8′s loop filter is vaguely similar to H.264′s, but with a few differences. First, it has two modes (which can be chosen by the encoder) : a fast mode and a normal mode. The fast mode is somewhat simpler than H.264′s, while the normal mode is somewhat more complex. Secondly, when filtering between macroblocks, VP8′s filter has a wider range than the in-macroblock filter — H.264 did this too, but only for intra edges.

    Third, VP8′s filter omits most of the adaptive strength mechanics inherent in H.264′s filter. Its only adaptation is that it skips filtering on p16x16 blocks with no coefficients. This may be responsible for the high blurriness of VP8′s loop filter : it will run over and over and over again on all parts of a macroblock even if they are unchanged between frames (as long as some other part of the macroblock is changed). H.264′s, by comparison, is strength-adaptive based on whether DCT coefficients exist on either side of a given edge and based on the motion vector delta and reference frame delta across said edge. Of course, skipping this strength calculation saves some decoding time as well.

    Update :
    05:28 < derf> Gumboot : You’ll be disappointed to know they got the loop filter ordering wrong again.
    05:29 < derf> Dark_Shikari : They ordered it such that you have to process each macroblock in full before processing the next one.

    Verdict on Loop Filter : Definitely worse compression-wise than H.264′s due to the lack of adaptive strength. Especially with the “fast” mode, might be significantly faster. I worry about it being too blurry.

    Overall verdict on the VP8 video format

    Overall, VP8 appears to be significantly weaker than H.264 compression-wise. The primary weaknesses mentioned above are the lack of proper adaptive quantization, lack of B-frames, lack of an 8×8 transform, and non-adaptive loop filter. With this in mind, I expect VP8 to be more comparable to VC-1 or H.264 Baseline Profile than to H.264. Of course, this is still significantly better than Theora, and in my tests it beats Dirac quite handily as well.

    Supposedly Google is open to improving the bitstream format — but this seems to conflict with the fact that they got so many different companies to announce VP8 support. The more software that supports a file format, the harder it is to change said format, so I’m dubious of any claim that we will be able to spend the next 6-12 months revising VP8. In short, it seems to have been released too early : it would have been better off to have an initial period during which revisions could be submitted and then a big announcement later when it’s completed.

    Update : it seems that Google is not open to changing the spec : it is apparently “final”, complete with all its flaws.

    In terms of decoding speed I’m not quite sure ; the current implementation appears to be about 16% slower than ffmpeg’s H.264 decoder (and thus probably about 25-35% slower than state-of-the-art decoders like CoreAVC). Of course, this doesn’t necessarily say too much about what a fully optimized implementation will reach, but the current one seems to be reasonably well-optimized and has SIMD assembly code for almost all major DSP functions, so I doubt it will get that much faster.

    I would expect, with equally optimized implementations, VP8 and H.264 to be relatively comparable in terms of decoding speed. This, of course, is not really a plus for VP8 : H.264 has a great deal of hardware support, while VP8 largely has to rely on software decoders, so being “just as fast” is in many ways not good enough. By comparison, Theora decodes almost 35% faster than H.264 using ffmpeg’s decoder.

    Finally, the problem of patents appears to be rearing its ugly head again. VP8 is simply way too similar to H.264 : a pithy, if slightly inaccurate, description of VP8 would be “H.264 Baseline Profile with a better entropy coder”. Even VC-1 differed more from H.264 than VP8 does, and even VC-1 didn’t manage to escape the clutches of software patents. It’s quite possible that VP8 has no patent issues, but until we get some hard evidence that VP8 is safe, I would be cautious. Since Google is not indemnifying users of VP8 from patent lawsuits, this is even more of a potential problem. Most importantly, Google has not released any justifications for why the various parts of VP8 do not violate patents, as Sun did with their OMS standard : such information would certainly cut down on speculation and make it more clear what their position actually is.

    But if luck is on Google’s side and VP8 does pass through the patent gauntlet unscathed, it will undoubtedly be a major upgrade as compared to Theora.

    Addendum A : On2′s VP8 Encoder and Decoder

    This post is primarily aimed at discussing issues relating to the VP8 video format. But from a practical perspective, while software can be rewritten and improved, to someone looking to use VP8 in the near future, the quality (code-wise, compression-wise, and speed-wise) of the official VP8 encoder and decoder is more important than anything I’ve said above. Thus, after reading through most of the code, here are my thoughts on the software.

    Initially I was intending to go easy on On2 here ; I assumed that this encoder was in fact new for VP8 and thus they wouldn’t necessarily have time to make the code high-quality and improve its algorithms. However, as I read through the encoder, it became clear that this was not at all true ; there were comments describing bugfixes dating as far back as early 2004. That’s right : this software is even older than x264 ! I’m guessing that the current VP8 software simply evolved from the original VP7 software. Anyways, this means that I’m not going to go easy on On2 ; they’ve had (at least) 6 years to work on VP8, and a much larger dev team than x264′s to boot.

    Before I tear the encoder apart, keep in mind that it isn’t bad. In fact, compression-wise, I don’t think they’re going to be able to get it that much better using standard methods. I would guess that the encoder, on slowest settings, is within 5-10% of the maximum PSNR that they’ll ever get out of it. There’s definitely a whole lot more to be had using unusual algorithms like MB-tree, not to mention the complete lack of psy optimizations — but at what it tries to do, it does pretty decently. This is in contrast to the VP3 encoder, which was a pile of garbage (just ask any Theora dev).

    Before I go into specific components, a general note on code quality. The code quality is much better than VP3, though there are still tons of typos in the comments. They also appear to be using comments as a form of version control system, which is a bit bizarre. The assembly code is much worse, with staggering levels of copy-paste coding, some completely useless instructions that do nothing at all, unaligned loads/stores to what-should-be aligned data structures, and a few functions that are simply written in unfathomably roundabout (and slower) ways. While the C code isn’t half bad, the assembly is clearly written by retarded monkeys. But I’m being unfair : this is way better than with VP3.

    Motion estimation : Diamond, hex, and exhaustive (full) searches available. All are pretty naively implemented : hexagon, for example, performs a staggering amount of redundant work (almost half of the locations it searches are repeated !). Full is even worse in terms of inefficiency, but it’s useless for all but placebo-level speeds, so I’m not really going to complain about that.

    Subpixel motion estimation : Straightforward iterative diamond and square searches. Nothing particularly interesting here.

    Quantization : Primary quantization has two modes : a fast mode and a slightly slower mode. The former is just straightforward deadzone quant, while the latter has a bias based on zero-run length (not quite sure how much this helps, but I like the idea). After this they have “coefficient optimization” with two modes. One mode simply tries moving each nonzero coefficient towards zero ; the slow mode tries all 2^16 possible DCT coefficient rounding permutations. Whoever wrote this needs to learn what trellis quantization (the dynamic programming solution to the problem) is and stop using exponential-time algorithms in encoders.
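
    For reference, "straightforward deadzone quant" amounts to something like the following (an illustrative sketch, not On2's code): the rounding offset is deliberately smaller than half a quantizer step, which biases small coefficients toward zero. Trellis quantization, by contrast, treats the rounding decisions of a block jointly and solves them with dynamic programming instead of enumerating all 2^16 combinations.

    #include <stdlib.h>

    /* Deadzone quantization of one transform coefficient.
     * A plain rounder would use offset = q / 2; a deadzone quantizer uses something
     * smaller (say q / 3 or q / 6), so marginal coefficients fall to zero. */
    static int quantize_deadzone(int coef, int q, int offset)
    {
        int level = (abs(coef) + offset) / q;
        return coef < 0 ? -level : level;
    }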

    Ratecontrol (frame type handling) : Relies on “boosting” the quality of golden frames and “alt-ref” frames — a concept I find extraordinarily dubious because it means that the video will periodically “jump” to a higher quality level, which looks utterly terrible in practice. You can see the effect in this graph of PSNR ; every dozen frames or so, the quality “jumps”. This cannot possibly look good in motion.

    Ratecontrol (overall) : Relies on a purely reactive ratecontrol algorithm, which probably will not do very well in difficult situations such as hard-CBR and tight buffer constraints. Furthermore, it does no adaptation of the quantizer within the frame (e.g. in the case that the frame overshot the size limitations ratecontrol put on it). Instead, it relies on re-encoding the frame repeatedly to reach the target size — which in practice is simply not a usable option for two reasons. In low-latency situations where one can’t have a large delay, re-encoding repeatedly may send the encoder way behind time-wise. In any other situation, one can afford to use frame-based threading, a much faster algorithm for multithreaded encoding than the typical slice-based threading — which makes re-encoding impossible.

    Loop filter : The encoder attempts to optimize the loop filter parameters for maximum PSNR. I’m not quite sure how good an idea this is ; every example I’ve seen of this with H.264 ends up creating very bad (often blurry) visual results.

    Overall performance : Even on the absolute fastest settings with multithreading, their encoder is slow. On my 1.6GHz Core i7 it gets barely 26fps encoding 1080p ; not even enough to reliably do real-time compression. x264, by comparison, gets 101fps at its fastest preset “ultrafast”. Now, sure, I don’t expect On2′s encoder to be anywhere near as fast as x264, but being unable to stream HD video on a modern quad-core system is simply not reasonable in 2010. Additionally, the speed options are extraordinarily confusing and counterintuitive and don’t always seem to work properly ; for example, fast encoding mode (--rt) seems to be ignored completely in 2-pass.

    Overall compression : As said before, compression-wise the encoder does a pretty good job with the spec that it’s given. The slower algorithms in the encoder are clearly horrifically unoptimized (see the comments on motion search and quantization in particular), but they still work.

    Decoder : Seems to be straightforward enough. Nothing jumped out at me as particularly bad, slow, or otherwise, besides the code quality issues mentioned above.

    Practical problems : The encoder and decoder share a staggering amount of code. This means that any bug in the common code will affect both, and thus won’t be spotted because it will affect them both in a matching fashion.  This is the inherent problem with any file format that doesn’t have independent implementations and is defined by a piece of software instead of a spec : there are always bugs. RV40 had a hilarious example of this, where a typo of “22″ instead of “33″ resulted in quarter-pixel motion compensation being broken. Accordingly, I am very dubious of any file format defined by software instead of a specification. Google should wait until independent implementations have been created before setting the spec in stone.

    Update : it seems that what I foresaw is already coming true :

    <derf> gmaxwell : It survives it with a patch that causes artifacts because their encoder doesn’t clamp MVs properly.
    <gmaxwell> ::cries::
    <derf> So they reverted my decoder patch, instead of fixing the encoder.
    <gmaxwell> “but we have many files encoded with this !”
    <gmaxwell> so great.. single implementation and it depends on its own bugs. :(

    This is just like Internet Explorer 6 all over again — bugs in the software become part of the “spec” !

    Hard PSNR numbers :
    (Source/target bitrate are the same as in my upcoming comparison.)
    x264, slowest mode, High Profile : 29.76103 dB (28% better than VP8)
    VP8, slowest mode : 28.37708 dB (8.5% better than x264 baseline)
    x264, slowest mode, Baseline Profile : 27.95594 dB

    Note that these numbers are a “best-case” situation : we’re testing all three optimized for PSNR, which is what the current VP8 encoder specializes in as well. This is not too different from my expectations above as estimated from the spec itself ; it’s relatively close to x264′s Baseline Profile.

    Keep in mind that this is not representative of what you can get out of VP8 now, but rather what could be gotten out of VP8. PSNR is meaningless for real-world encoding — what matters is visual quality — so hopefully if problems like the adaptive quantization issue mentioned previously can be overcome, the VP8 encoder could be improved to have x264-level psy optimizations. However, as things stand…

    Visual results : Unfortunately, since the current VP8 encoder optimizes entirely for PSNR, the visual results are less than impressive. Here’s a sampling of how it compares with some other encoders. Source and bitrate are the same as above ; all encoders are configured for the best visual quality wherever possible. And apparently given some of the responses to this part, many people cannot actually read ; the bitrate is (as close as possible to) the same on all of these files.

    Update : I got completely slashdotted and my few hundred gigs of bandwidth ran out in mere hours. The images below have been rehosted, so if you’ve pasted the link somewhere else, check below for the new one.

    VP8 (On2 VP8 rc8) (source) (Note : I recently realized that the official encoder doesn’t output MKV, so despite the name, this file is actually a VP8 bitstream wrapped in IVF, as generated by ivfenc. Decode it with ivfdec.)
    H.264 (Recent x264) (source)
    H.264 Baseline Profile (Recent x264) (source)
    Theora (Recent ptalabvorm nightly) (source)
    Dirac (Schroedinger 1.0.9) (source)
    VC-1 (Microsoft VC-1 SDK) (source)
    MPEG-4 ASP (Xvid 1.2.2) (source)

    The quality generated by On2′s VP8 encoder will probably not improve significantly without serious psy optimizations.

    One further note about the encoder : currently it will drop frames by default, which is incredibly aggravating and may cause serious problems. I strongly suggest that anyone using it turn the frame-dropping feature off in the options.

    Addendum B : Google’s choice of container and audio format for HTML5

    Google has chosen Matroska for their container format. This isn’t particularly surprising : Matroska is one of the most widely used “modern” container formats and is in many ways best-suited to the task. MP4 (aka ISOmedia) is probably a better-designed format, but is not very flexible ; while in theory it can stick anything in a private stream, a standardization process is technically necessary to “officially” support any new video or audio formats. Patents are probably a non-issue ; the MP4 patent pool was recently disbanded, largely because nobody used any of the features that were patented.

    Another advantage of Matroska is that it can be used for streaming video : while it isn’t typically used that way, the spec allows it. Note that I do not mean progressive download (à la YouTube), but rather actual streaming, where the encoder is working in real-time. The only way to do this with MP4 is by sending “segments” of video, a very hacky approach in which one is effectively sending a bunch of small MP4 files in sequence. This approach is used by Microsoft’s Silverlight “Smooth Streaming”. Not only is this an ugly hack, but it’s unsuitable for low-latency video. This kind of hack is unnecessary for Matroska. One possible problem is that since almost nobody currently uses Matroska for live streaming purposes, very few existing Matroska implementations support what is necessary to play streamed Matroska files.

    I’m not quite sure why Google chose to rebrand Matroska ; “WebM” is a silly name and Matroska is already pretty well-recognized as a brand.

    The choice of Vorbis for audio is practically a no-brainer. Even ignoring the issue of patents, libvorbis is still the best general-purpose open source audio encoder. While AAC is generally better at very low bitrates, there aren’t any good open source AAC encoders : faac is worse than LAME and ffmpeg’s AAC encoder is even worse. Furthermore, faac is not free software ; it contains code from the non-free reference encoder. Combined with the patent issue, nobody expected Google to pick anything else.

    Addendum C : Summary for the lazy

    VP8, as a spec, should be a bit better than H.264 Baseline Profile and VC-1. It’s not even close to competitive with H.264 Main or High Profile. If Google is willing to revise the spec, this can probably be improved.

    VP8, as an encoder, is somewhere between Xvid and Microsoft’s VC-1 in terms of visual quality. This can definitely be improved a lot.

    VP8, as a decoder, decodes even slower than ffmpeg’s H.264. This probably can’t be improved that much ; VP8 as a whole is similar in complexity to H.264.

    With regard to patents, VP8 copies too much from H.264 for comfort, no matter whose word is behind the claim of being patent-free. This doesn’t mean that it’s sure to be covered by patents, but until Google can give us evidence as to why it isn’t, I would be cautious.

    VP8 is definitely better compression-wise than Theora and Dirac, so if its claim to being patent-free does stand up, it’s a big upgrade with regard to patent-free video formats.

    VP8 is not ready for prime-time ; the spec is a pile of copy-pasted C code and the encoder’s interface is lacking in features and buggy. They aren’t even ready to finalize the bitstream format, let alone switch the world over to VP8.

    With the lack of a real spec, the VP8 software basically is the spec — and with the spec being “final”, any bugs are now set in stone. Such bugs have already been found and Google has rejected fixes.

    Google made the right decision to pick Matroska and Vorbis for its HTML5 video proposal.


  • How to cheat on video encoder comparisons

    21 June 2010, by Dark Shikari — H.264, benchmark, stupidity, test sequences

    Over the past few years, practically everyone and their dog has published some sort of encoder comparison. Sometimes they’re actually intended to be something for the world to rely on, like the old Doom9 comparisons and the MSU comparisons. Other times, they’re just to scratch an itch — someone wants to decide for themselves what is better. And sometimes they’re just there to outright lie in favor of whatever encoder the author likes best. The latter is practically an expected feature on the websites of commercial encoder vendors.

    One thing almost all these comparisons have in common — particularly (but not limited to !) the ones done without consulting experts — is that they are horribly done. They’re usually easy to spot : for example, two videos at totally different bitrates are being compared, or the author complains about one of the videos being “washed out” (i.e. he screwed up his colorspace conversion). Or the results are simply nonsensical. Many of these problems result from the person running the test not “sanity checking” the results to catch mistakes that he made in his test. Others are just outright intentional.

    The result of all these mistakes, both intentional and accidental, is that the results of encoder comparisons tend to be all over the map, to the point of absurdity. For any pair of encoders, it’s practically a given that a comparison exists somewhere that will “prove” any result you want to claim, even if the result would be beyond impossible in any sane situation. This often results in the appearance of a “controversy” even if there isn’t any.

    Keep in mind that every single mistake I mention in this article has actually been done, usually in more than one comparison. And before I offend anyone, keep in mind that when I say “cheating”, I don’t mean to imply that everyone that makes the mistake is doing it intentionally. Especially among amateur comparisons, most of the mistakes are probably honest.

    So, without further ado, we will investigate a wide variety of ways, from the blatant to the subtle, with which you too can cheat on your encoder comparisons.

    Blatant cheating

    1. Screw up your colorspace conversions. A common misconception is that converting from YUV to RGB and back is a simple process where nothing can go wrong. This is quite untrue. There are two primary attributes of YUV : PC range (0-255) vs TV range (16-235) and BT.709 vs BT.601 conversion coefficients. That sums up to a total of 4 possible different types of YUV. When people compare encoders, they often use different frontends, some of which make incorrect assumptions about these attributes.

    Incorrect assumptions are so common that it’s often a matter of luck whether the tool gets it right or not. It doesn’t help that most videos don’t even properly signal which they are to begin with ! Often even the tool that the person running the comparison is using to view the source material gets the conversion wrong.

    Subsampling YUV (aka what everyone uses) adds yet another dimension to the problem : the locations that the chroma data represents (“chroma siting”) aren’t constant. For example, JPEG and MPEG-2 define different positions. This is even worse because almost nobody actually handles this correctly — the best approach is to simply make sure none of your software is doing any conversion. A mistake in chroma siting is what created that infamous PSNR graph showing Theora beating x264, which has been cited for ages since, despite the developers themselves retracting it after realizing their mistake.

    Keep in mind that the video encoder is not responsible for colorspace conversion — almost all video encoders operate in the YUV domain (usually subsampled 4:2:0 YUV, aka YV12). Thus any problem in colorspace conversion is usually the fault of the tools used, not the actual encoder.

    How to spot it : “The color is a bit off” or “the contrast of the video is a bit duller”. There were a staggering number of “H.264 vs Theora” encoder comparisons which came out in favor of one or the other solely based on “how well the encoder kept the color” — making the results entirely bogus.
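
    To make the range and coefficient attributes from point 1 concrete, here is a sketch of a Y'CbCr-to-R'G'B' conversion where both the coefficient set (BT.601 vs BT.709) and the range (TV vs PC) are explicit parameters; decoding with the wrong combination is precisely what produces "washed out" or tinted comparisons. The constants are the standard ones; everything else is simplified.

    typedef struct { double kr, kb; int tv_range; } YuvParams;

    static const YuvParams bt601_tv = { 0.299,  0.114,  1 };   /* SD-era coefficients, 16-235 / 16-240 */
    static const YuvParams bt709_tv = { 0.2126, 0.0722, 1 };   /* HD coefficients,     16-235 / 16-240 */

    /* Convert one 8-bit Y'CbCr pixel to R'G'B' on a 0..255 scale. */
    static void yuv_to_rgb(int y, int cb, int cr, const YuvParams *p,
                           double *r, double *g, double *b)
    {
        double kg = 1.0 - p->kr - p->kb;
        double yn, pb, pr;
        if (p->tv_range) {
            yn = (y  -  16) / 219.0;     /* luma:   16..235 -> 0..1       */
            pb = (cb - 128) / 224.0;     /* chroma: 16..240 -> -0.5..0.5  */
            pr = (cr - 128) / 224.0;
        } else {
            yn = y / 255.0;              /* PC range: full 0..255         */
            pb = (cb - 128) / 255.0;
            pr = (cr - 128) / 255.0;
        }
        *r = 255.0 * (yn + 2.0 * (1.0 - p->kr) * pr);
        *g = 255.0 * (yn - 2.0 * p->kb * (1.0 - p->kb) / kg * pb
                         - 2.0 * p->kr * (1.0 - p->kr) / kg * pr);
        *b = 255.0 * (yn + 2.0 * (1.0 - p->kb) * pb);
    }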

    2. Don’t compare at the same (or nearly the same) bitrate. I saw a VP8 vs x264 comparison the other day that gave VP8 30% more bitrate and then proceeded to demonstrate that it got better PSNR. You would think this is blindingly obvious, but people still make this mistake ! The most common cause of this is assuming that encoders will successfully reach the target bitrate you ask of them — particularly with very broken encoders that don’t. Always check the output filesizes of your encodes.

    How to spot it : The comparison lists perfectly round bitrates for every single test, as opposed to the actual bitrates achieved by the encoders, which will never be exactly matching in any real test.
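
    A quick sanity check along these lines is to derive the bitrate each encoder actually achieved from the output file size and clip duration, instead of trusting the requested target (a trivial helper of my own, not from any particular tool):

    /* Achieved average bitrate in kbit/s from output size (bytes) and duration (seconds). */
    static double achieved_kbps(long long file_size_bytes, double duration_seconds)
    {
        return file_size_bytes * 8.0 / duration_seconds / 1000.0;
    }
    /* e.g. a 9,000,000-byte encode of a 60 s clip is ~1200 kbit/s, whatever was requested. */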

    3. Use unfair encoding settings. This is a bit of a wide topic : there are many ways to do this. We’ll cover the more blatant ones in this part. Here’s some common ones :

    a. Simply cheat. Intentionally pick awful settings for the encoder you don’t like.

    b. Don’t consider performance. Pick encoding settings without any regard for some particular performance goal. For example, it’s perfectly reasonable to say “use the best settings possible, regardless of speed”. It’s also reasonable to look for a particular encoding speed target. But what isn’t reasonable is to pick extremely fast settings for one encoder and extremely slow settings for another encoder.

    c. Don’t attempt to match compatibility options when it’s reasonable to do so. Keyframe interval is a classic one of these : shorter values reduce compression but improve seeking. An easy way to cheat is to simply not set them to the same value, biasing towards whichever encoder has the longer interval. This is most common as an accidental mistake with comparisons involving ffmpeg, where the default keyframe interval is an insanely low 12 frames.

    How to spot it : The comparison doesn’t document its approach regarding choice of encoding settings.

    4. Use ratecontrol methods unfairly. Constant bitrate is not the same as average bitrate — using one instead of the other is a great way to completely ruin a comparison. Another method is to use 1-pass bitrate mode for one encoder and 2-pass or constant quality for another. A good general approach is that, for any given encoder, one should use 2-pass if available and constant quality if not (it may take a few runs to get the bitrate you want, of course).

    Of course, it’s also fine to run a comparison with a particular mode in mind — for example, a comparison targeted at streaming applications might want to test using 1-pass CBR. Of course, in such a case, if CBR is not available in an encoder, you can’t compare to that encoder.

    How to spot it : It’s usually pretty obvious if the encoding settings are given.

    5. Use incredibly old versions of encoders. As it happens, Debian stable is not the best source for the most recent encoding software. Equally bad is using recent versions known to be buggy.

    6. Don’t distinguish between video formats and the software that encodes them. This is incredibly common : I’ve seen tests that claim to compare “H.264″ against something else while in fact actually comparing “Quicktime” against something else. It’s impossible to compare all H.264 encoders at once, so don’t even try — just call the comparison “Quicktime versus X” instead of “H.264 versus X”. Or better yet, use a good H.264 encoder, like x264 and don’t bother testing awful encoders to begin with.

    Less-obvious cheating

    1. Pick a bitrate that’s way too low. Low bitrate testing is very effective at making differences between encoders obvious, particularly if doing a visual comparison. But past a certain point, it becomes impossible for some encoders to keep up. This is usually an artifact of the video format itself — a scalability limitation. Practically all DCT-based formats have this kind of limitation (wavelets are mostly immune).

    In reality, this is rarely a problem, because one could merely downscale the video to resolve the problem — lower resolutions need fewer bits. But people rarely do this in comparisons (it’s hard to do it fairly), so the best approach is to simply not use absurdly low bitrates. What is “absurdly low” ? That’s a hard question — it ends up being a matter of using one’s best judgement.

    This tends to be less of a problem in larger-scale tests that use many different bitrates.

    How to spot it : At least one of the encoders being compared falls apart completely and utterly in the screenshots.

    Biases heavily towards : Video formats with completely scalable coding methods (Dirac, Snow, JPEG-2000, SVC).

    Biases slightly towards : Video formats with coding methods that improve scalability, such as arithmetic coding, B-frames, and run-length coding. For example, H.264 and Theora tend to be more scalable than MPEG-4.

    2. Pick a bitrate that’s way too high. This is a staggeringly common mistake : pick a bitrate so high that all of the resulting encodes look absolutely perfect. The claim is then made that “there’s no significant difference” between any of the encoders tested. This is surprisingly easy to do inadvertently on sources like Big Buck Bunny, which looks transparent at relatively low bitrates. An equally common but similar mistake is to test at a bitrate that isn’t so high that the videos look perfect, but high enough that they all look very good. The claim is then made that “the difference between these encoders is small”. Well, of course, if you give everything tons of bitrate, the difference between encoders is small.

    How to spot it : You can’t tell which image is the source and which is the encode.

    3. Making invalid comparisons using objective metrics. I explained this earlier in the linked blog post, but in short, if you’re going to measure PSNR, make sure all the encoders are optimized for PSNR. Equally, if you’re going to leave the encoder optimized for visual quality, don’t measure PSNR — post screenshots instead. Same with SSIM or any other objective metric. Furthermore, don’t blindly do metric comparisons — always at least look at the output as a sanity test. Finally, do not claim that PSNR is particularly representative of visual quality, because it isn’t.

    How to spot it : Encoders with psy optimizations, such as x264 or Theora 1.2, do considerably worse than expected in PSNR tests, but look much better in visual comparisons.

    4. Lying with graphs. Using misleading scales on graphs is a great way to make the differences between encoders seem larger or smaller than they actually are. A common mistake is to scale SSIM linearly : in fact, 0.99 is about twice as good as 0.98, not 1% better. One solution for this is to convert SSIM values to dB for comparison (see the sketch after this list).

    5. Using lossy screenshots. Posting screenshots as JPEG is a silly, pointless way to worsen an encoder comparison.
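
    As promised above, converting SSIM to a dB-like scale is a one-liner; it makes the "0.99 vs 0.98" example obvious, since halving the error (1 - SSIM) shows up as a ~3 dB gain rather than a misleading 1% bump on a linear axis:

    #include <math.h>

    /* SSIM on a dB scale: ssim_db(0.98) ~= 17.0, ssim_db(0.99) ~= 20.0. */
    static double ssim_db(double ssim)
    {
        return -10.0 * log10(1.0 - ssim);
    }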

    Subtle cheating

    1. Unfairly pick screenshots for comparison. Comparing based on stills is not ideal, but it’s often vastly easier than comparing videos in motion. But it also opens up the door to unfairness. One of the most common mistakes is to pick a frame immediately after (or on) a keyframe for one encoder, but which isn’t for the other encoder. Particularly in the case of encoders that massively boost keyframe quality, this will unfairly bias in favor of the one with the recent keyframe.

    How to spot it : It’s very difficult to tell, if not impossible, unless they provide the video files to inspect.

    2. Cherry-pick source videos. Good source videos are incredibly hard to come by — almost everything is already compressed and what’s left is usually a very poor example of real content. Here’s some common ways to bias unfairly using cherry-picking :

    a. Pick source videos that are already heavily compressed. Pre-compressed source isn’t much of an issue if your target quality level for testing is much lower than that of the source, since any compression artifacts in the source will be a lot smaller than those created by the encoders. But if the source is already very compressed, or you’re testing at a relatively high quality level, this becomes a significant issue.

    Biases towards : Anything that uses a similar transform to the source content. For MPEG-2 source material, this biases towards formats that use the 8×8 DCT or a very close approximation : MPEG-1/2/4, H.263, and Theora. For H.264 source material, this biases towards formats that use a 4×4 transform : H.264 and VP8.

    b. Pick standard test clips that were not intended for this purpose. There are a wide variety of uncompressed “standard test clips”. Some of these are not intended for general-purpose use, but rather exist to test specific encoder capabilities. For example, Mobile Calendar (“mobcal”) is extremely sharp and low motion, serving to test interpolation capabilities. It will bias incredibly heavily towards whatever encoder uses more B-frames and/or has higher-precision motion compensation. Other test clips are almost completely static, such as the classic “akiyo”. These are also not particularly representative of real content.

    c. Pick very noisy content. Noise is — by definition — not particularly compressible. Both in terms of PSNR and visual quality, a very noisy test clip will tend to reduce the differences between encoders dramatically.

    d. Pick a test clip to exercise a specific encoder feature. I’ve often used short clips from Touhou games to demonstrate the effectiveness of x264′s macroblock-tree algorithm. I’ve sometimes even used it to compare to other encoders as part of such a demonstration. I’ve also used the standard test clip “parkrun” as a demonstration of adaptive quantization. But claiming that either is representative of most real content — and thus can be used as a general determinant of how good encoders are — is of course insane.

    e. Simply encode a bunch of videos and pick the one your favorite encoder does best on.

    3. Preprocessing the source. An encoder test is a test of encoders, not preprocessing. Some encoding apps may add preprocessors to the source, such as noise reduction. This may make the video look better — possibly even better than the source — but it’s not a fair part of comparing the actual encoders.

    4. Screw up decoding. People often forget that in addition to encoding, a test also involves decoding — a step which is equally possible to do wrong. One common error caused by this is in tests of Theora on content whose resolution isn’t divisible by 16. Decoding is often done with ffmpeg — which doesn’t crop the edges properly in some cases. This isn’t really a big deal visually, but in a PSNR comparison, misaligning the entire frame by 4 or 8 pixels is a great way of completely invalidating the results.

    The greatest mistake of all

    Above all, the biggest and most common mistake — and the one that leads to many of the problems mentioned here — is the mistaken belief that one test, or even a few, can really represent all usage fairly. Any comparison has to have some specific goal — to compare something in some particular case, whether it be “maximum offline compression ignoring encoding speed” or “real-time high-speed video streaming” or whatnot. And even then, no comparison can represent all use-cases in that category alone. An encoder comparison can only be honest if it’s aware of its limitations.

  • No sounds on Apple devices after encoding videos [migrated]

    15 December 2013, by Ricardo

    I'm having a problem setting up a media server.
    Everything works just great except the sound on Apple devices ; I'm not sure whether that's something to do with "mute" on iOS or whether our codecs are just not compatible with iOS.
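
    One thing worth checking first, since this FFmpeg build has no dedicated AAC library enabled: if the encoded files carry an audio codec that iOS players don't handle (Vorbis or AC-3 inside an MP4, for instance), you get exactly this "video plays, no sound on Apple devices" symptom. A hedged guess at a fix with an FFmpeg of this vintage is to re-encode just the audio track to AAC-LC with the experimental built-in encoder (file names here are placeholders):

    ffmpeg -i input.mp4 -c:v copy -c:a aac -strict experimental -b:a 128k -ac 2 -ar 44100 output.mp4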

    OS :

    Ubuntu 12.04

    FFMPEG Config :

    ffmpeg version 0.10.8-7:0.10.8-1~lucid1 Copyright 2000-2013 the FFmpeg developers
     built on Sep  5 2013 19:50:14 with gcc 4.4.3
     configuration: --arch=amd64 --disable-stripping --enable-pthreads --enable-runtime-cpudetect --extra-version='7:0.10.8-1~lucid1' --libdir=/usr/lib --prefix=/usr --enable-bzlib --enable-libdc1394 --enable-libfreetype --enable-frei0r --enable-gnutls --enable-libgsm --enable-libmp3lame --enable-libopenjpeg --enable-libpulse --enable-libschroedinger --enable-libspeex --enable-libtheora --enable-vdpau --enable-libvorbis --enable-libvpx --enable-zlib --enable-gpl --enable-postproc --enable-libcdio --enable-x11grab --enable-libx264 --shlibdir=/usr/lib --enable-shared --disable-static
     avcodec     configuration: --arch=amd64 --disable-stripping --enable-pthreads --enable-runtime-cpudetect --extra-version='7:0.10.8-1~lucid1' --libdir=/usr/lib --prefix=/usr --enable-bzlib --enable-libdc1394 --enable-libfreetype --enable-frei0r --enable-gnutls --enable-libgsm --enable-libmp3lame --enable-libopenjpeg --enable-libpulse --enable-libschroedinger --enable-libspeex --enable-libtheora --enable-vdpau --enable-libvorbis --enable-libvpx --enable-zlib --enable-gpl --enable-postproc --enable-libcdio --enable-x11grab --enable-libx264 --shlibdir=/usr/lib --enable-shared --disable-static --enable-version3 --enable-libopencore-amrnb --enable-libopencore-amrwb
     libavutil      51. 35.100 / 51. 35.100
     libavcodec     53. 61.100 / 53. 61.100
     libavformat    53. 32.100 / 53. 32.100
     libavdevice    53.  4.100 / 53.  4.100
     libavfilter     2. 61.100 /  2. 61.100
     libswscale      2.  1.100 /  2.  1.100
     libswresample   0.  6.100 /  0.  6.100
     libpostproc    52.  0.100 / 52.  0.100
    Hyper fast Audio and Video encoder

    Codecs :

    D..... = Decoding supported
    .E.... = Encoding supported
    ..V... = Video codec
    ..A... = Audio codec
    ..S... = Subtitle codec
    ...S.. = Supports draw_horiz_band
    ....D. = Supports direct rendering method 1
    .....T = Supports weird frame truncation
    ------
    D V D  4xm             4X Movie
    D V D  8bps            QuickTime 8BPS video
    D A D  8svx_exp        8SVX exponential
    D A D  8svx_fib        8SVX fibonacci
     EV    a64multi        Multicolor charset for Commodore 64
     EV    a64multi5       Multicolor charset for Commodore 64, extended with 5th color (colram)
    DEA D  aac             Advanced Audio Coding
    D A D  aac_latm        AAC LATM (Advanced Audio Codec LATM syntax)
    D V D  aasc            Autodesk RLE
    DEA D  ac3             ATSC A/52A (AC-3)
     EA    ac3_fixed       ATSC A/52A (AC-3)
    D A D  adpcm_4xm       ADPCM 4X Movie
    DEA D  adpcm_adx       SEGA CRI ADX ADPCM
    D A D  adpcm_ct        ADPCM Creative Technology
    D A D  adpcm_ea        ADPCM Electronic Arts
    D A D  adpcm_ea_maxis_xa ADPCM Electronic Arts Maxis CDROM XA
    D A D  adpcm_ea_r1     ADPCM Electronic Arts R1
    D A D  adpcm_ea_r2     ADPCM Electronic Arts R2
    D A D  adpcm_ea_r3     ADPCM Electronic Arts R3
    D A D  adpcm_ea_xas    ADPCM Electronic Arts XAS
    D A D  adpcm_ima_amv   ADPCM IMA AMV
    D A D  adpcm_ima_apc   ADPCM IMA CRYO APC
    D A D  adpcm_ima_dk3   ADPCM IMA Duck DK3
    D A D  adpcm_ima_dk4   ADPCM IMA Duck DK4
    D A D  adpcm_ima_ea_eacs ADPCM IMA Electronic Arts EACS
    D A D  adpcm_ima_ea_sead ADPCM IMA Electronic Arts SEAD
    D A D  adpcm_ima_iss   ADPCM IMA Funcom ISS
    DEA D  adpcm_ima_qt    ADPCM IMA QuickTime
    D A D  adpcm_ima_smjpeg ADPCM IMA Loki SDL MJPEG
    DEA D  adpcm_ima_wav   ADPCM IMA WAV
    D A D  adpcm_ima_ws    ADPCM IMA Westwood
    DEA D  adpcm_ms        ADPCM Microsoft
    D A D  adpcm_sbpro_2   ADPCM Sound Blaster Pro 2-bit
    D A D  adpcm_sbpro_3   ADPCM Sound Blaster Pro 2.6-bit
    D A D  adpcm_sbpro_4   ADPCM Sound Blaster Pro 4-bit
    DEA D  adpcm_swf       ADPCM Shockwave Flash
    D A D  adpcm_thp       ADPCM Nintendo Gamecube THP
    D A D  adpcm_xa        ADPCM CDROM XA
    DEA D  adpcm_yamaha    ADPCM Yamaha
    DEA D  alac            ALAC (Apple Lossless Audio Codec)
    D A D  als             MPEG-4 Audio Lossless Coding (ALS)
    D A D  amrnb           Adaptive Multi-Rate NarrowBand
    D A D  amrwb           Adaptive Multi-Rate WideBand
    DEV    amv             AMV Video
    D V D  anm             Deluxe Paint Animation
    D V D  ansi            ASCII/ANSI art
    D A D  ape             Monkey's Audio
    DES    ass             Advanced SubStation Alpha subtitle
    DEV D  asv1            ASUS V1
    DEV D  asv2            ASUS V2
    D A D  atrac1          Atrac 1 (Adaptive TRansform Acoustic Coding)
    D A D  atrac3          Atrac 3 (Adaptive TRansform Acoustic Coding 3)
    D V D  aura            Auravision AURA
    D V D  aura2           Auravision Aura 2
    DEV D  avrp            Avid 1:1 10-bit RGB Packer
    D V D  avs             AVS (Audio Video Standard) video
    D V D  bethsoftvid     Bethesda VID video
    D V D  bfi             Brute Force & Ignorance
    D A D  binkaudio_dct   Bink Audio (DCT)
    D A D  binkaudio_rdft  Bink Audio (RDFT)
    D V    binkvideo       Bink video
    D V D  bintext         Binary text
    DEV D  bmp             BMP image
    D A D  bmv_audio       Discworld II BMV audio
    D V    bmv_video       Discworld II BMV video
    D V D  c93             Interplay C93
    D V D  camstudio       CamStudio
    D V D  camtasia        TechSmith Screen Capture Codec
    D V D  cavs            Chinese AVS video (AVS1-P2, JiZhun profile)
    D V D  cdgraphics      CD Graphics video
    D V D  cinepak         Cinepak
    DEV D  cljr            Cirrus Logic AccuPak
    D A D  cook            COOK
    D V D  cyuv            Creative YUV (CYUV)
    DEA D  dca             DCA (DTS Coherent Acoustics)
    D V D  dfa             Chronomaster DFA
    D V    dirac           BBC Dirac VC-2
    DEV D  dnxhd           VC3/DNxHD
    DEV    dpx             DPX image
    D A D  dsicinaudio     Delphine Software International CIN audio
    D V D  dsicinvideo     Delphine Software International CIN video
    DES    dvbsub          DVB subtitles
    DES    dvdsub          DVD subtitles
    DEV D  dvvideo         DV (Digital Video)
    D V D  dxa             Feeble Files/ScummVM DXA
    D V D  dxtory          Dxtory
    DEA D  eac3            ATSC A/52 E-AC-3
    D V D  eacmv           Electronic Arts CMV video
    D V D  eamad           Electronic Arts Madcow Video
    D V D  eatgq           Electronic Arts TGQ video
    D V    eatgv           Electronic Arts TGV video
    D V D  eatqi           Electronic Arts TQI Video
    D V D  escape124       Escape 124
    D V D  escape130       Escape 130
    DEV D  ffv1            FFmpeg video codec #1
    DEVSD  ffvhuff         Huffyuv FFmpeg variant
    DEA D  flac            FLAC (Free Lossless Audio Codec)
    DEV D  flashsv         Flash Screen Video
    DEV D  flashsv2        Flash Screen Video Version 2
    D V D  flic            Autodesk Animator Flic video
    DEVSD  flv             Flash Video (FLV) / Sorenson Spark / Sorenson H.263
    D V D  fraps           Fraps
    D V D  frwu            Forward Uncompressed
    DEA D  g722            G.722 ADPCM
    DEA    g723_1          G.723.1
    DEA D  g726            G.726 ADPCM
    D A D  g729            G.729
    DEV D  gif             GIF (Graphics Interchange Format)
    D A D  gsm             GSM
    D A D  gsm_ms          GSM Microsoft variant
    DEV D  h261            H.261
    DEVSDT h263            H.263 / H.263-1996
    D VSD  h263i           Intel H.263
     EV    h263p           H.263+ / H.263-1998 / H.263 version 2
    D V D  h264            H.264 / AVC / MPEG-4 AVC / MPEG-4 part 10
    D V D  h264_vdpau      H.264 / AVC / MPEG-4 AVC / MPEG-4 part 10 (VDPAU acceleration)
    DEVSD  huffyuv         Huffyuv / HuffYUV
    D V D  idcinvideo      id Quake II CIN video
    D V D  idf             iCEDraw text
    D V D  iff_byterun1    IFF ByteRun1
    D V D  iff_ilbm        IFF ILBM
    D A D  imc             IMC (Intel Music Coder)
    D V D  indeo2          Intel Indeo 2
    D V    indeo3          Intel Indeo 3
    D V    indeo4          Intel Indeo Video Interactive 4
    D V    indeo5          Intel Indeo Video Interactive 5
    D A D  interplay_dpcm  DPCM Interplay
    D V D  interplayvideo  Interplay MVE video
    DEV    j2k             JPEG 2000
    DEV D  jpegls          JPEG-LS
    D V D  jv              Bitmap Brothers JV video
    D V    kgv1            Kega Game Video
    D V D  kmvc            Karl Morton's video codec
    D V D  lagarith        Lagarith lossless
    DEA D  libgsm          libgsm GSM
    DEA D  libgsm_ms       libgsm GSM Microsoft variant
     EA    libmp3lame      libmp3lame MP3 (MPEG audio layer 3)
    DEA D  libopencore_amrnb OpenCORE Adaptive Multi-Rate (AMR) Narrow-Band
    D A D  libopencore_amrwb OpenCORE Adaptive Multi-Rate (AMR) Wide-Band
    DEV D  libopenjpeg     OpenJPEG based JPEG 2000 encoder
    DEV    libschroedinger libschroedinger Dirac 2.2
    DEA D  libspeex        libspeex Speex
     EV    libtheora       libtheora Theora
     EA    libvorbis       libvorbis Vorbis
    DEV    libvpx          libvpx VP8
     EV    libx264         libx264 H.264 / AVC / MPEG-4 AVC / MPEG-4 part 10
     EV    libx264rgb      libx264 H.264 / AVC / MPEG-4 AVC / MPEG-4 part 10 RGB
     EV    ljpeg           Lossless JPEG
    D V D  loco            LOCO
    D A D  mace3           MACE (Macintosh Audio Compression/Expansion) 3:1
    D A D  mace6           MACE (Macintosh Audio Compression/Expansion) 6:1
    D V D  mdec            Sony PlayStation MDEC (Motion DECoder)
    D V D  mimic           Mimic
    DEV D  mjpeg           MJPEG (Motion JPEG)
    D V D  mjpegb          Apple MJPEG-B
    D A D  mlp             MLP (Meridian Lossless Packing)
    D V D  mmvideo         American Laser Games MM Video
    D V D  motionpixels    Motion Pixels video
    D A D  mp1             MP1 (MPEG audio layer 1)
    D A D  mp1float        MP1 (MPEG audio layer 1)
    DEA D  mp2             MP2 (MPEG audio layer 2)
    D A D  mp2float        MP2 (MPEG audio layer 2)
    D A D  mp3             MP3 (MPEG audio layer 3)
    D A D  mp3adu          ADU (Application Data Unit) MP3 (MPEG audio layer 3)
    D A D  mp3adufloat     ADU (Application Data Unit) MP3 (MPEG audio layer 3)
    D A D  mp3float        MP3 (MPEG audio layer 3)
    D A D  mp3on4          MP3onMP4
    D A D  mp3on4float     MP3onMP4
    D A D  mpc7            Musepack SV7
    D A D  mpc8            Musepack SV8
    DEVSDT mpeg1video      MPEG-1 video
    D V DT mpeg1video_vdpau MPEG-1 video (VDPAU acceleration)
    DEVSDT mpeg2video      MPEG-2 video
    DEVSDT mpeg4           MPEG-4 part 2
    D V DT mpeg4_vdpau     MPEG-4 part 2 (VDPAU)
    D VSDT mpegvideo       MPEG-1 video
    D V DT mpegvideo_vdpau MPEG-1/2 video (VDPAU acceleration)
    D VSDT mpegvideo_xvmc  MPEG-1/2 video XvMC (X-Video Motion Compensation)
    DEVSD  msmpeg4         MPEG-4 part 2 Microsoft variant version 3
    D VSD  msmpeg4v1       MPEG-4 part 2 Microsoft variant version 1
    DEVSD  msmpeg4v2       MPEG-4 part 2 Microsoft variant version 2
    D V D  msrle           Microsoft RLE
    DEV D  msvideo1        Microsoft Video-1
    D V D  mszh            LCL (LossLess Codec Library) MSZH
    D V D  mxpeg           Mobotix MxPEG video
    DEA D  nellymoser      Nellymoser Asao
    D V D  nuv             NuppelVideo/RTJPEG
    DEV D  pam             PAM (Portable AnyMap) image
    DEV D  pbm             PBM (Portable BitMap) image
    DEA D  pcm_alaw        PCM A-law
    D A D  pcm_bluray      PCM signed 16|20|24-bit big-endian for Blu-ray media
    D A D  pcm_dvd         PCM signed 20|24-bit big-endian
    DEA D  pcm_f32be       PCM 32-bit floating point big-endian
    DEA D  pcm_f32le       PCM 32-bit floating point little-endian
    DEA D  pcm_f64be       PCM 64-bit floating point big-endian
    DEA D  pcm_f64le       PCM 64-bit floating point little-endian
    D A D  pcm_lxf         PCM signed 20-bit little-endian planar
    DEA D  pcm_mulaw       PCM mu-law
    DEA D  pcm_s16be       PCM signed 16-bit big-endian
    DEA D  pcm_s16le       PCM signed 16-bit little-endian
    D A D  pcm_s16le_planar PCM 16-bit little-endian planar
    DEA D  pcm_s24be       PCM signed 24-bit big-endian
    DEA D  pcm_s24daud     PCM D-Cinema audio signed 24-bit
    DEA D  pcm_s24le       PCM signed 24-bit little-endian
    DEA D  pcm_s32be       PCM signed 32-bit big-endian
    DEA D  pcm_s32le       PCM signed 32-bit little-endian
    DEA D  pcm_s8          PCM signed 8-bit
    D A D  pcm_s8_planar   PCM signed 8-bit planar
    DEA D  pcm_u16be       PCM unsigned 16-bit big-endian
    DEA D  pcm_u16le       PCM unsigned 16-bit little-endian
    DEA D  pcm_u24be       PCM unsigned 24-bit big-endian
    DEA D  pcm_u24le       PCM unsigned 24-bit little-endian
    DEA D  pcm_u32be       PCM unsigned 32-bit big-endian
    DEA D  pcm_u32le       PCM unsigned 32-bit little-endian
    DEA D  pcm_u8          PCM unsigned 8-bit
    D A D  pcm_zork        PCM Zork
    DEV D  pcx             PC Paintbrush PCX image
    DEV D  pgm             PGM (Portable GrayMap) image
    DEV D  pgmyuv          PGMYUV (Portable GrayMap YUV) image
    D S    pgssub          HDMV Presentation Graphic Stream subtitles
    D V D  pictor          Pictor/PC Paint
    DEV D  png             PNG image
    DEV D  ppm             PPM (Portable PixelMap) image
    DEV D  prores          Apple ProRes
    D V D  prores_lgpl     Apple ProRes (iCodec Pro)
    D V D  ptx             V.Flash PTX image
    D A D  qcelp           QCELP / PureVoice
    D A D  qdm2            QDesign Music Codec 2
    D V D  qdraw           Apple QuickDraw
    D V D  qpeg            Q-team QPEG
    DEV D  qtrle           QuickTime Animation (RLE) video
    DEV D  r10k            AJA Kona 10-bit RGB Codec
    DEV D  r210            Uncompressed RGB 10-bit
    DEV    rawvideo        raw video
    DEA D  real_144        RealAudio 1.0 (14.4K) encoder
    D A D  real_288        RealAudio 2.0 (28.8K)
    D V D  rl2             RL2 video
    DEA D  roq_dpcm        id RoQ DPCM
    DEV D  roqvideo        id RoQ video
    D V D  rpza            QuickTime video (RPZA)
    DEV D  rv10            RealVideo 1.0
    DEV D  rv20            RealVideo 2.0
    D V D  rv30            RealVideo 3.0
    D V D  rv40            RealVideo 4.0
    D A D  s302m           SMPTE 302M
    DEV    sgi             SGI image
    D A D  shorten         Shorten
    D A D  sipr            RealAudio SIPR / ACELP.NET
    D A D  smackaud        Smacker audio
    D V D  smackvid        Smacker video
    D V D  smc             QuickTime Graphics (SMC)
    DEV D  snow            Snow
    D A D  sol_dpcm        DPCM Sol
    DEA D  sonic           Sonic
     EA    sonicls         Sonic lossless
    D V D  sp5x            Sunplus JPEG (SP5X)
    DES    srt             SubRip subtitle
    D V D  sunrast         Sun Rasterfile image
    DEV D  svq1            Sorenson Vector Quantizer 1 / Sorenson Video 1 / SVQ1
    D VSD  svq3            Sorenson Vector Quantizer 3 / Sorenson Video 3 / SVQ3
    DEV D  targa           Truevision Targa image
    D VSD  theora          Theora
    D V D  thp             Nintendo Gamecube THP video
    D V D  tiertexseqvideo Tiertex Limited SEQ video
    DEV D  tiff            TIFF image
    D V D  tmv             8088flex TMV
    D A D  truehd          TrueHD
    D V D  truemotion1     Duck TrueMotion 1.0
    D V D  truemotion2     Duck TrueMotion 2.0
    D A D  truespeech      DSP Group TrueSpeech
    D A D  tta             True Audio (TTA)
    D A D  twinvq          VQF TwinVQ
    D V D  txd             Renderware TXD (TeXture Dictionary) image
    D V D  ultimotion      IBM UltiMotion
    D V D  utvideo         Ut Video
    DEV D  v210            Uncompressed 4:2:2 10-bit
    D V D  v210x           Uncompressed 4:2:2 10-bit
    DEV D  v308            Uncompressed packed 4:4:4
    DEV D  v410            Uncompressed 4:4:4 10-bit
    D V    vb              Beam Software VB
    D V D  vble            VBLE Lossless Codec
    D V D  vc1             SMPTE VC-1
    D V D  vc1_vdpau       SMPTE VC-1 VDPAU
    D V D  vc1image        Windows Media Video 9 Image v2
    D V D  vcr1            ATI VCR1
    D A D  vmdaudio        Sierra VMD audio
    D V D  vmdvideo        Sierra VMD video
    D V D  vmnc            VMware Screen Codec / VMware Video
    DEA D  vorbis          Vorbis
    D VSD  vp3             On2 VP3
    D V D  vp5             On2 VP5
    D V D  vp6             On2 VP6
    D V D  vp6a            On2 VP6 (Flash version, with alpha channel)
    D V D  vp6f            On2 VP6 (Flash version)
    D V D  vp8             On2 VP8
    D V D  vqavideo        Westwood Studios VQA (Vector Quantized Animation) video
    D A D  wavesynth       Wave synthesis pseudo-codec
    D A D  wavpack         WavPack
    D A    wmalossless     Windows Media Audio 9 Lossless
    D A D  wmapro          Windows Media Audio 9 Professional
    DEA D  wmav1           Windows Media Audio 1
    DEA D  wmav2           Windows Media Audio 2
    D A D  wmavoice        Windows Media Audio Voice
    DEVSD  wmv1            Windows Media Video 7
    DEVSD  wmv2            Windows Media Video 8
    D V D  wmv3            Windows Media Video 9
    D V D  wmv3_vdpau      Windows Media Video 9 VDPAU
    D V D  wmv3image       Windows Media Video 9 Image
    D V D  wnv1            Winnov WNV1
    D A D  ws_snd1         Westwood Audio (SND1)
    D A D  xan_dpcm        DPCM Xan
    D V D  xan_wc3         Wing Commander III / Xan
    D V D  xan_wc4         Wing Commander IV / Xxan
    D V D  xbin            eXtended BINary text
    D V D  xl              Miro VideoXL
    DES    xsub            DivX subtitles (XSUB)
    DEV D  xwd             XWD (X Window Dump) image
    DEV D  y41p            Uncompressed YUV 4:1:1 12-bit
    D V    yop             Psygnosis YOP Video
    DEV D  yuv4            Uncompressed packed 4:2:0
    DEV D  zlib            LCL (LossLess Codec Library) ZLIB
    DEV D  zmbv            Zip Motion Blocks Video

    The library we use for conversion selects its audio codec from this list :

    // Audio encoders our conversion library is allowed to pick from.
    public function getAvailableAudioCodecs()
    {
        return array('libvo_aacenc', 'libfaac', 'libmp3lame');
    }

    By default I now use 'libmp3lame' because 'libfaac' is not supported by this ffmpeg build; when I try to encode sound with libfaac I get a "codec not found" error.
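
    A possible safeguard (just a sketch of mine, not part of our library : the helper name pickAudioCodec and the call to shell_exec are assumptions, and the check only looks for the codec name in the output of "ffmpeg -codecs", not at the "E" encode flag) would be to pick the first preferred encoder that this ffmpeg build actually exposes, so unsupported entries such as libfaac are skipped automatically :

    // Hypothetical helper (not part of our library) : return the first preferred
    // encoder whose name appears in the output of "ffmpeg -codecs".
    // Assumes the ffmpeg binary is on PATH; does not check the encode flag.
    public function pickAudioCodec()
    {
        $listing = shell_exec('ffmpeg -codecs 2>/dev/null');
        foreach ($this->getAvailableAudioCodecs() as $codec) {
            // Match the codec name as a whole word in the codec listing.
            if ($listing !== null && preg_match('/\b' . preg_quote($codec, '/') . '\b/', $listing)) {
                return $codec;
            }
        }
        return 'libmp3lame'; // fallback: MP3 encoding is available in this build
    }

    With the codec listing above this would skip libvo_aacenc and libfaac and return 'libmp3lame', since neither of the first two is compiled in.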

    Thanks in advance !