
Other articles (70)
-
Updating from version 0.1 to 0.2
24 June 2013 — An explanation of the notable changes made in moving from MediaSPIP version 0.1 to version 0.2. What's new?
Regarding software dependencies: the latest versions of FFmpeg (>= v1.2.1) are now used; the dependencies for Smush are installed; MediaInfo and FFprobe are installed for metadata retrieval; ffmpeg2theora is no longer used; flvtool2 is no longer installed, in favor of flvtool++; ffmpeg-php, which is no longer maintained, is no longer installed (...) -
Customizing by adding your logo, banner, or background image
5 September 2013 — Some themes support three customization elements: adding a logo; adding a banner; adding a background image.
-
Writing a news item
21 June 2013 — Present changes to your MediaSPIP site, or news about your projects, using the news section.
In MediaSPIP's default theme, spipeo, news items are displayed at the bottom of the main page, below the editorials.
You can customize the form used to create a news item.
News item creation form: for a document of the news type, the default fields are: publication date (customize the publication date) (...)
On other sites (9012)
-
The 11th Hour RoQ Variation
12 April 2012, by Multimedia Mike — Game Hacking, dreamroq, Reverse Engineering, roq, Vector Quantization

I have been looking at the RoQ file format almost as long as I have been doing practical multimedia hacking. However, I have never figured out how the RoQ format works in The 11th Hour, the game for which the format was initially developed. When I procured the game years ago, I remember finding what appeared to be RoQ files and shoving them through the open source decoders, but not getting the right images out.
I decided to dust off that old copy of The 11th Hour and have another go at it.
Baseline
The game consists of 4 CD-ROMs. Each disc has a media/ directory containing a series of files bearing the extension .gjd, likely the initials of one Graeme J. Devine. These are resource files: merely headerless concatenations of other files. Thus, at first glance, one file might appear to be a single RoQ file. That is the source of some of the difficulty: sending an apparent RoQ .gjd file through a RoQ player will often cause the program to complain when it encounters the header of another RoQ file.

I have uploaded some samples to the usual place.
However, even the frames that a player can decode (before encountering a file boundary within the resource file) look wrong.
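One way to find the file boundaries inside a .gjd resource is to scan for the RoQ signature chunk that begins every RoQ file: chunk ID 0x1084, chunk size 0xFFFFFFFF, and argument 0x001E (the playback rate), which serializes to the byte sequence 84 10 FF FF FF FF 1E 00. Here is a minimal scanner sketch of my own (an illustration, not part of dreamroq or any game tool):

#include <stdio.h>

/* Scan a .gjd resource for embedded RoQ files by looking for the
 * standard RoQ signature chunk: ID 0x1084, size 0xFFFFFFFF, arg 0x001E. */
static const unsigned char ROQ_SIGNATURE[8] =
    { 0x84, 0x10, 0xFF, 0xFF, 0xFF, 0xFF, 0x1E, 0x00 };

int main(int argc, char *argv[])
{
    FILE *f;
    long offset = 0;
    int c;
    int matched = 0;

    if (argc < 2 || !(f = fopen(argv[1], "rb"))) {
        fprintf(stderr, "usage: gjd_scan <file.gjd>\n");
        return 1;
    }

    while ((c = fgetc(f)) != EOF) {
        if ((unsigned char)c == ROQ_SIGNATURE[matched]) {
            if (++matched == 8) {
                /* the signature started 7 bytes before the current position */
                printf("RoQ file boundary at offset 0x%lX\n", offset - 7);
                matched = 0;
            }
        } else {
            matched = ((unsigned char)c == ROQ_SIGNATURE[0]) ? 1 : 0;
        }
        offset++;
    }

    fclose(f);
    return 0;
}

Running this over a .gjd file should print one offset per concatenated RoQ file.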
Investigating Codebooks Using dreamroq
I wrote dreamroq last year, an independent RoQ playback library targeted at embedded systems. I aimed it at a .gjd file and quickly hit a codebook error.

RoQ is a vector quantizer video codec that maintains a codebook of 256 2×2-pixel vectors. In Quake III and later RoQ files, these are transported using a YUV 4:2:0 colorspace: 4 Y samples, a U sample, and a V sample represent 4 pixels, for a total of 6 bytes per vector. A RoQ codebook chunk contains a field that indicates the number of 2×2 vectors as well as the number of 4×4 vectors; the latter vectors are each composed of 4 2×2 vectors.
Thus, the total size of a codebook chunk ought to be (# of 2×2 vectors) * 6 + (# of 4×4 vectors) * 4.
However, this is not the case with The 11th Hour RoQ files.
Longer Codebooks And Mystery Colorspace
Juggling the numbers for a few of the codebook chunks, I empirically determined that the 2×2 vectors are represented by 10 bytes instead of 6. Now I need to determine what exactly those 10 bytes represent.

I should note that I suspect everything else about these files lines up with successive generations of the format. For example, if a file has 640×320 resolution, that amounts to 40×20 macroblocks; dreamroq iterates through 40×20 8×8 blocks and precisely exhausts the VQ bitstream. So that all looks valid. I'm just puzzled by the codebook format.
Here is an example codebook dump:

ID 0x1002, len = 0x0000014C, args = 0x1C0D
 0: 00 00 00 00 00 00 00 00 80 80
 1: 08 07 00 00 1F 5B 00 00 7E 81
 2: 00 00 15 0F 00 00 40 3B 7F 84
 3: 00 00 00 00 3A 5F 18 13 7E 84
 4: 00 00 00 00 3B 63 1B 17 7E 85
 5: 18 13 00 00 3C 63 00 00 7E 88
 6: 00 00 00 00 00 00 59 3B 7F 81
 7: 00 00 56 23 00 00 61 2B 80 80
 8: 00 00 2F 13 00 00 79 63 81 83
 9: 00 00 00 00 5E 3F AC 9B 7E 81
10: 1B 17 00 00 B6 EF 77 AB 7E 85
11: 2E 43 00 00 C1 F7 75 AF 7D 88
12: 6A AB 28 5F B6 B3 8C B3 80 8A
13: 86 BF 0A 03 D5 FF 3A 5F 7C 8C
14: 00 00 9E 6B AB 97 F5 EF 7F 80
15: 86 73 C8 CB B6 B7 B7 B7 85 8B
16: 31 17 84 6B E7 EF FF FF 7E 81
17: 79 AF 3B 5F FC FF E2 FF 7D 87
18: DC FF AE EF B3 B3 B8 B3 85 8B
19: EF FF F5 FF BA B7 B6 B7 88 8B
20: F8 FF F7 FF B3 B7 B7 B7 88 8B
21: FB FF FB FF B8 B3 B4 B3 85 88
22: F7 FF F7 FF B7 B7 B9 B7 87 8B
23: FD FF FE FF B9 B7 BB B7 85 8A
24: E4 FF B7 EF FF FF FF FF 7F 83
25: FF FF AC EB FF FF FC FF 7F 83
26: CC C7 F7 FF FF FF FF FF 7F 81
27: FF FF FE FF FF FF FF FF 80 80
Note that 0x14C (the chunk size) is 332, and the chunk arguments 0x1C and 0x0D (the counts of 2×2 and 4×4 vectors, respectively) are 28 and 13. 28 * 10 + 13 * 4 = 332, so the numbers check out.
Do you see any patterns in the codebook? Here are some things I tried:
- Treating the last 2 bytes as U & V and treating the first 4 as the 4 Y samples:
- Treating the last 2 bytes as U & V and treating the first 8 as 4 16-bit little-endian Y samples:
- Disregarding the final 2 bytes and treating the first 8 bytes as 4 RGB565 pixels (both little- and big-endian, respectively, shown here):
- Based on the type of data I'm seeing in these movies (which appears to be intended as overlays), I figured that some of these bits might indicate transparency; here is 15-bit big-endian RGB, which disregards the top bit of each pixel:
These images are taken from the uploaded sample bdpuz.gjd, apparently a component of the puzzle represented in this screenshot.
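To make the last hypothesis concrete, here is a sketch of how one of these 10-byte vectors would be unpacked under the 15-bit big-endian RGB interpretation. This is purely speculative; the layout and names are my own invention:

#include <stdint.h>

/* One hypothesis for the 10-byte vectors: the first 8 bytes are 4
 * big-endian 16-bit values, each holding a 15-bit RGB pixel (5 bits
 * per component) whose top bit might be a transparency flag; the
 * final 2 bytes are disregarded. Speculative, not a confirmed format. */
typedef struct {
    uint8_t r, g, b; /* 5-bit components, 0-31 */
    uint8_t alpha;   /* 1 if the top bit is set */
} Pixel;

static void decode_10byte_vector(const uint8_t v[10], Pixel out[4])
{
    for (int i = 0; i < 4; i++) {
        uint16_t p = (uint16_t)((v[i * 2] << 8) | v[i * 2 + 1]); /* big-endian */
        out[i].alpha = (p >> 15) & 1;
        out[i].r = (p >> 10) & 0x1F;
        out[i].g = (p >> 5) & 0x1F;
        out[i].b = p & 0x1F;
    }
}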
Unseen Types
It has long been rumored that early RoQ files could contain JPEG images. I finally found one such specimen. One of the files bundled early in the uploaded fhpuz.gjd sample contains a JPEG frame. It's a standard JFIF file and can easily be decoded after separating the bytes from the resource using 'dd'. JPEGs serve as intraframes in the coding scheme, with successive RoQ frames moving objects on top.

However, a new chunk type showed up as well, one identified by 0x1030. I have never encountered this type before. Where could I possibly find data about this? Fortunately, id Games recently posted all of their open sourced games at Github. Reading through the code for their official RoQ decoder, I see that this chunk is called a RoQ_PACKET. The name and the code behind it are both supremely unhelpful: the code is basically a no-op. The payloads of the various RoQ_PACKETs from one sample are observed to be either 8784, 14752, or 14760 bytes in length. It's very likely that this serves the same purpose as the JPEG intraframes.
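For separating out that JPEG without measuring offsets by hand for 'dd', one can also scan for the standard JPEG markers: SOI (FF D8) at the start and EOI (FF D9) at the end. A quick throwaway sketch of mine (not from the game or any official tool; a naive scan like this can be fooled by embedded thumbnails, but it suffices for a simple extraction):

#include <stdio.h>

/* Extract the first JPEG found in a resource file by scanning for the
 * SOI marker (FF D8) and copying through the EOI marker (FF D9). */
int main(int argc, char *argv[])
{
    FILE *in, *out;
    int prev = -1, c, copying = 0;

    if (argc < 3 || !(in = fopen(argv[1], "rb")) || !(out = fopen(argv[2], "wb"))) {
        fprintf(stderr, "usage: extract_jpeg <resource.gjd> <out.jpg>\n");
        return 1;
    }

    while ((c = fgetc(in)) != EOF) {
        if (!copying && prev == 0xFF && c == 0xD8) {
            /* found SOI: emit the marker bytes and start copying */
            fputc(0xFF, out);
            fputc(0xD8, out);
            copying = 1;
        } else if (copying) {
            fputc(c, out);
            if (prev == 0xFF && c == 0xD9) /* EOI: done */
                break;
        }
        prev = c;
    }

    fclose(in);
    fclose(out);
    return 0;
}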
Other Tidbits
I read through the readme.txt on the first game disc and found this nugget:

g) Animations displayed normally or in SPOOKY MODE

SPOOKY MODE is blue-tinted grayscale with color cursors, puzzle
and game pieces. It is the preferred display setting of the
developers at Trilobyte. Just for fun, try out the SPOOKY
MODE.

The MobyGames screenshot page has a number of screenshots labeled as being captured in spooky mode. Color tricks?
Meanwhile, another twist arose as I kept tweaking dreamroq to deal with more RoQ weirdness: after I modified my dreamroq code to handle these 10-byte vectors, it eventually choked on another codebook. These codebooks happen to have 6-byte vectors again! Fortunately, I was already working on a scheme to automatically detect which codebook format is in play: plug the numbers into a formula and see which vector size checks out, as sketched below.
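Here is a minimal sketch of that detection formula, using the chunk header fields shown in the dump above (the argument packs the 2×2 count in the high byte and the 4×4 count in the low byte, so args = 0x1C0D means 28 and 13):

#include <stdint.h>

/* Guess the 2x2 vector size of a RoQ codebook chunk by testing which
 * size makes the byte counts work out. 4x4 vectors are always 4 bytes
 * (4 indices into the 2x2 codebook). A count of 0 conventionally means
 * 256, though the real rule for the 4x4 count is slightly more
 * conditional; this sketch keeps it simple. */
static int guess_vector_size(uint32_t chunk_size, uint16_t chunk_arg)
{
    uint32_t n2x2 = (chunk_arg >> 8) & 0xFF;
    uint32_t n4x4 = chunk_arg & 0xFF;
    if (!n2x2) n2x2 = 256;
    if (!n4x4) n4x4 = 256;

    if (n2x2 * 6 + n4x4 * 4 == chunk_size)
        return 6;  /* standard YUV 4:2:0 vectors */
    if (n2x2 * 10 + n4x4 * 4 == chunk_size)
        return 10; /* The 11th Hour variant */
    return -1;     /* unknown layout */
}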
-
Playing Video on a Sega Dreamcast
9 March 2011, by Multimedia Mike — Sega Dreamcast

Here's an honest engineering question: if you were tasked with making compressed video play back on a Sega Dreamcast video game console, what video format would you choose? Personally, I would choose RoQ, the format invented for The 11th Hour computer game and later used in Quake III and other games derived from the same engine. This post explains my reasoning.
Video Background
One of the things I wanted to do when I procured a used Sega Dreamcast back in 2001 was turn it into a set-top video playback unit. This is something that a lot of people tried to do, apparently, to varying degrees of success. Interest would wane in a few years as it became easier and easier to crack an Xbox and install XBMC; the Xbox was much better suited to playing the codecs that were getting big at the time, most notably MPEG-4 part 2 video (DivX/XviD).

The Dreamcast, while quite capable when it was released in 1999, was not well equipped to deal with an MPEG-type codec. I have recently learned that there are other hackers out there on the internet who are still trying to get the most out of this system; I was contacted for advice about how to make Theora perform better on the Dreamcast.
An interesting thing about consoles and codecs: since you are necessarily distributing code along with your data, you have far more freedom to use whatever codecs you want for your audio and video data. This is why Vorbis and even Theora have seen quite a bit of use in video games, "internet standards" be darned. Thus, when I realized this application had no hard and fast requirement to use Theora, and that it could use any codec that fit the platform, my mind started churning. When I was programming the DC 10 years ago, I didn't have access to the same wealth of multimedia knowledge that is available today.

Requirements Gathering
What do we need here?
- The codec needs to run on the Sega Dreamcast; this eliminates codecs for which only binary decoder implementations are available
- Must decode 320x240 video at 30 fps; higher resolutions, up to 640x480, would be desirable
- Must deliver decent quality at 12X optical read speeds (the DC drive's rated speed, roughly 1.8 MB/s at standard CD data rates)
- There must be some decent, preferably free, encoder readily available; encoding speed, however, is not important, i.e., "take as long as you need, encoder"
Theora was the go-to codec because it's commonly known as "the free, open source video codec". But clearly it's not suitable for, well... any purpose, really (sorry, easy target; OW! stop throwing things!). VP8/WebM — Theora's heir apparent — would not qualify either, as my prior experiments have already demonstrated.
Candidates
What did the big boys use for video on the Dreamcast? A lot of games relied on CRI's Sofdec middleware, which combined MPEG-1 video with a custom ADPCM format. I don't know if I have ever seen DC games that used MPEG-1 video at a resolution higher than 320x240 (though I have not searched exhaustively). The fact that CRI used a custom ADPCM format for this application may indicate that there wasn't enough CPU power left over to decode a perceptual, transform-based audio codec alongside the 320x240 video.

A few other DC games used 4X Technologies' 4XM format. The most notable licensee was Alone in the Dark: The New Nightmare (DC version only; the PC version used Bink). This codec was DCT-based but incorporated a 16-bit RGB colorspace into its design, presumably to optimize for applications like game consoles that couldn't directly handle planar YUV. AITD:TNN's videos were 640x360, a marked improvement over the typical Sofdec fare. I was about to write off 4XM as a contender due to the lack of an encoder, but the encoding tools are preserved on our samples site. A few other issues, though: the FFmpeg decoder doesn't seem to work correctly as of this writing (and nobody has noticed yet, even though it's tested via FATE).
What ideas do I have? Right off the bat, I'm thinking vector quantizer (VQ). Vector quantizers are notoriously slow to compress but blazingly fast to decompress, which is why they were popular in the early days of video compression. First, there's Cinepak. I fear that might be too simple for this application. Plus, I don't know if the existing (binary-only) compressors are very decent. It seems they only ever had to handle small videos, and I've heard that they can really fall over if anything more is demanded of them.
Sorenson Video 1 is another contender. FFmpeg has an encoder (which some allege is better than Sorenson’s original compressor). However, I fear that the wonky algorithm and colorspace might not mesh well with the Dreamcast.
My thinking quickly converged on RoQ. This was designed to run fullscreen (640x480) video on i486-class hardware. While RoQ fundamentally operates in a YUV colorspace, it's trivial to convert to any other colorspace during decoding, and the image will be rendered in that colorspace. Plus, there are open source encoders available for the format (namely, several versions of Eric Lasota's Switchblade encoder, one of which lives natively in FFmpeg), as well as the original proprietary encoder.
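As an illustration of how cheap that per-pixel conversion is, here is a common integer approximation of the YUV-to-RGB transform packed straight into RGB565. This is a generic BT.601-style sketch, not code from any particular RoQ decoder:

#include <stdint.h>

static inline uint8_t clamp255(int x)
{
    return x < 0 ? 0 : (x > 255 ? 255 : (uint8_t)x);
}

/* Convert one YUV sample (U/V centered at 128) directly to RGB565,
 * using a widely used integer approximation of the BT.601 transform. */
static uint16_t yuv_to_rgb565(int y, int u, int v)
{
    int c = y - 16, d = u - 128, e = v - 128;
    uint8_t r = clamp255((298 * c + 409 * e + 128) >> 8);
    uint8_t g = clamp255((298 * c - 100 * d - 208 * e + 128) >> 8);
    uint8_t b = clamp255((298 * c + 516 * d + 128) >> 8);
    return (uint16_t)(((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3));
}

In practice the conversion would be applied once per codebook vector rather than once per output pixel, which is part of what makes VQ codecs so cheap to decode.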
Which Library?
There are several code choices here: FFmpeg (LGPL), Switchblade (GPL), and the original Quake 3 source code (GPL). There is one more option that I think might be easiest: the decoder Dr. Tim created when he reverse engineered the format in the first place. That has a very liberal "do whatever you like, but be nice and give me credit" license (it probably qualifies as BSD).

This code is no longer at its original home, but the Wayback Machine still had a copy, which I have now mirrored (idroq.tar.gz).
Adaptation
Dr. Tim's code still compiles and runs great on Linux (64-bit!) with SDL output. I would like to port it to the Dreamcast using the same SDL output, which KallistiOS supports. Then there is the matter of fixing the longstanding chroma bug in the original sample decoder (described here). The decoder also needs to be modified to natively render RGB565 data, as that will work best with the DC's graphics hardware.

After making the code work, I want to profile it and test whether it can handle full-frame 640x480 playback at 30 frames/second. I will need to contrive a sample to achieve this.
Unfortunately, things went off the rails pretty quickly when I tried to port the RoQ decoder to DC/KOS. It looks like there's a bug in KallistiOS's minimalistic standard C library, or at least a discrepancy with my desktop Linux system: when you read to the end of a file and then seek backwards to someplace that isn't the end, is the file still in EOF state?
According to my Linux desktop:

open file; feof() = 0
seek to end; feof() = 0
read one more byte; feof() = 1
seek back to start; feof() = 0
According to KallistiOS:

open file; feof() = 0
seek to end; feof() = 0
read one more byte; feof() = 1
seek back to start; feof() = 1
Here's the seek_test.c program I used to test this issue:
#include <stdio.h>

int main()
{
    FILE *f;
    unsigned char byte;

    /* print the feof() state after each step, matching the output quoted above */
    f = fopen("seek_test.c", "r");
    printf("open file; feof() = %d\n", feof(f));

    fseek(f, 0, SEEK_END);
    printf("seek to end; feof() = %d\n", feof(f));

    fread(&byte, 1, 1, f);
    printf("read one more byte; feof() = %d\n", feof(f));

    fseek(f, 0, SEEK_SET);
    printf("seek back to start; feof() = %d\n", feof(f));

    fclose(f);

    return 0;
}
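Incidentally, if KallistiOS's fseek() really does leave the end-of-file indicator set (C99 requires a successful fseek() to clear it), a hypothetical workaround in the program above would be to clear the indicator explicitly:

    fseek(f, 0, SEEK_SET);
    clearerr(f); /* clears a stuck EOF flag; harmless on conforming libcs */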
Speaking of EOF, I'm about done for this evening.

What codec would you select for this task, given the requirements involved?
-
Anatomy of an optimization: H.264 deblocking
As mentioned in the previous post, H.264 has an adaptive deblocking filter. But what exactly does that mean — and more importantly, what does it mean for performance ? And how can we make it as fast as possible ? In this post I’ll try to answer these questions, particularly in relation to my recent deblocking optimizations in x264.
H.264's deblocking filter has two steps: strength calculation and the actual filter. The first step calculates the parameters for the second step. The filter runs on all the edges in each macroblock. That's 4 vertical edges of length 16 pixels and 4 horizontal edges of length 16 pixels. The vertical edges are filtered first, from left to right, then the horizontal edges, from top to bottom (order matters!). The leftmost edge is the one between the current macroblock and the left macroblock, while the topmost edge is the one between the current macroblock and the top macroblock.
Here’s the formula for the strength calculation in progressive mode. The highest strength that applies is always selected.
- If we're on the edge between an intra macroblock and any other macroblock: Strength 4
- If we're on an internal edge of an intra macroblock: Strength 3
- If either side of a 4-pixel-long edge has residual data: Strength 2
- If the motion vectors on opposite sides of a 4-pixel-long edge are at least a pixel apart (in either the x or y direction), or if the reference frames aren't the same: Strength 1
- Otherwise: Strength 0 (no deblocking)

These values are then thrown into a lookup table depending on the quantizer: higher quantizers have stronger deblocking. Then the actual filter is run with the appropriate parameters. Note that Strength 4 is actually a special deblocking mode that performs a much stronger filter and affects more pixels.
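Here is a rough C sketch of that decision ladder (my own illustration of the rules above, not x264's code; the struct fields are invented, and motion vectors are assumed to be in quarter-pel units, so "a pixel apart" is a difference of 4):

#include <stdlib.h>

/* Per-edge boundary strength, following the progressive-mode rules
 * above. Inputs describe the two 4x4 blocks adjacent to one 4-pixel
 * edge segment. Illustrative only. */
typedef struct {
    int intra;        /* block belongs to an intra macroblock */
    int has_residual; /* block has nonzero coefficients */
    int ref;          /* reference frame index */
    int mvx, mvy;     /* motion vector, quarter-pel units */
} Block4x4;

static int edge_strength(const Block4x4 *a, const Block4x4 *b,
                         int mb_edge /* edge lies between two macroblocks */)
{
    if (a->intra || b->intra)
        return mb_edge ? 4 : 3;
    if (a->has_residual || b->has_residual)
        return 2;
    if (a->ref != b->ref ||
        abs(a->mvx - b->mvx) >= 4 || abs(a->mvy - b->mvy) >= 4)
        return 1;
    return 0; /* no deblocking on this edge */
}

Note how the residual check naturally short-circuits the motion vector check; this coupling comes up again below.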
One can see somewhat intuitively why these strengths are chosen. The deblocker exists to get rid of sharp edges caused by the block-based nature of H.264, and so the strength depends on what exists that might cause such sharp edges. The strength calculation is a way to use existing data from the video stream to make better decisions during the deblocking process, improving compression and quality.
Both the strength calculation and the actual filter (not described here) are quite complex if implemented naively. The latter can be SIMD'd without too much difficulty; no H.264 decoder can achieve reasonable performance without doing so. But what about optimizing the strength calculation? A quick analysis shows that this can be beneficial as well.
Since we have to check both horizontal and vertical edges, we have to check up to 32 pairs of coefficient counts (for residual), 16 pairs of reference frame indices, and 128 motion vector values (counting x and y as separate values). This is a lot of calculation; a naive implementation can take 500-1000 clock cycles on a modern CPU. Of course, there are a lot of shortcuts we can take. Here are some examples:
- If the macroblock uses the 8×8 transform, we only need to check 2 edges in each direction instead of 4, because we don’t deblock inside of the 8×8 blocks.
- If the macroblock is a P-skip, we only have to check the first edge in each direction, since there’s guaranteed to be no motion vector differences, reference frame differences, or residual inside of the macroblock.
- If the macroblock has no residual at all, we can skip that check.
- If we know the partition type of the macroblock, we can do motion vector checks only along the edges of the partitions.
- If the effective quantizer is so low that no deblocking would be performed no matter what, don’t bother calculating the strength.
But even all of this doesn't save us from ourselves. We still have to iterate over a ton of edges, checking each one. Things like the partition-checking logic greatly complicate the code and add overhead even as they reduce the number of checks. And in many cases, decoupling the checks to add such logic will make the code slower: if the checks are coupled, we can avoid doing a motion vector check when there's residual, since Strength 2 overrides Strength 1.
But wait. What if we could do this in SIMD, just like the actual loopfilter itself? Sure, it seems like more of a problem for C code than for assembly, but there aren't any obvious obstacles. Many years ago, Loren Merritt (pengvado) wrote the first SIMD implementation that I know of (for ffmpeg's decoder); it is quite fast, so I decided to port the idea to x264 to see if we could eke out a bit more speed here as well.
Before I go over what I had to do to make this change, let me first describe how deblocking is implemented in x264. Since the filter is a loopfilter, it acts “in loop” and must be done in both the encoder and decoder — hence why x264 has it too, not just decoders. At the end of encoding one row of macroblocks, x264 goes back and deblocks the row, then performs half-pixel interpolation for use in encoding the next frame.
We do it per-row for reasons of cache coherency : deblocking accesses a lot of pixels and a lot of code that wouldn’t otherwise be used, so it’s more efficient to do it in a single pass as opposed to deblocking each macroblock immediately after encoding. Then half-pixel interpolation can immediately re-use the resulting data.
Now to the change. First, I modified deblocking to implement a subset of the macroblock_cache_load function : spend an extra bit of effort loading the necessary data into a data structure which is much simpler to address — as an assembly implementation would need (x264_macroblock_cache_load_deblock). Then I massively cleaned up deblocking to move all of the core strength-calculation logic into a single, small function that could be converted to assembly (deblock_strength_c). Finally, I wrote the assembly functions and worked with Loren to optimize them. Here’s the result.
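To give a flavor of the approach, here is a simplified SSE2 sketch of the Strength 1/2 tests over 16 edges at once. The data layout is invented for illustration and this is not x264's actual deblock_strength code; the motion vector test (a 16-bit subtract plus compares against ±3) is omitted for brevity:

#include <emmintrin.h>
#include <stdint.h>

/* Compute boundary strengths for 16 4-pixel edges at once. nnz holds
 * nonzero-coefficient flags and ref holds reference indices, one byte
 * per side of each edge. Invented layout; illustrative only. */
void deblock_strength_sketch(const uint8_t nnz_a[16], const uint8_t nnz_b[16],
                             const uint8_t ref_a[16], const uint8_t ref_b[16],
                             uint8_t bs[16])
{
    __m128i zero = _mm_setzero_si128();
    __m128i ones = _mm_set1_epi8(-1);

    /* Strength 2: residual on either side of the edge */
    __m128i nnz = _mm_or_si128(_mm_loadu_si128((const __m128i *)nnz_a),
                               _mm_loadu_si128((const __m128i *)nnz_b));
    __m128i bs2 = _mm_xor_si128(_mm_cmpeq_epi8(nnz, zero), ones);

    /* Strength 1: reference frames differ (the MV test would be ORed in here) */
    __m128i bs1 = _mm_xor_si128(
        _mm_cmpeq_epi8(_mm_loadu_si128((const __m128i *)ref_a),
                       _mm_loadu_si128((const __m128i *)ref_b)), ones);

    /* No branches: every edge is checked, priorities merged with masks */
    __m128i s = _mm_or_si128(
        _mm_and_si128(bs2, _mm_set1_epi8(2)),
        _mm_andnot_si128(bs2, _mm_and_si128(bs1, _mm_set1_epi8(1))));
    _mm_storeu_si128((__m128i *)bs, s);
}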
And the timings for the resulting assembly function on my Core i7, in cycles:

deblock_strength_c: 309
deblock_strength_mmx: 79
deblock_strength_sse2: 37
deblock_strength_ssse3: 33

Now that is a seriously nice improvement. 33 cycles on average to perform that many comparisons – that's absurdly low, especially considering the SIMD takes no branchy shortcuts: it always checks every single edge! I walked over to my performance chart and happily crossed off a box.
But I had a hunch that I could do better. Remember, as mentioned earlier, we're reloading all that data back into our data structures in order to address it. This isn't that slow, but it takes enough time to significantly cut into the gains from the assembly code. And worse, less than a row ago, all this data was in the correct place to be used (when we had just finished encoding the macroblock)! But if we did the deblocking right after encoding each macroblock, the cache issues would make it too slow to be worth it (yes, I tested this). So I went back to other things, a bit annoyed that I couldn't get the full benefit of the changes.
Then, yesterday, I was talking with Pascal, a former Xvid dev and current video hacker over at Google, about various possible x264 optimizations. He had seen my deblocking changes, and we discussed them a bit as well. Then two lines hit me like a pile of bricks:
<_skal_> tried computing the strength at least?
<_skal_> while it's fresh

Why hadn't I thought of that? Do the strength calculation immediately after encoding each macroblock, save the result, and then go pick it up later for the main deblocking filter. That way we can use the data right there and then for the strength calculation, but we don't have to do the whole deblock process until later.
I went and implemented it and, after working my way through a horde of bugs, eventually got a working implementation. A big catch was slices: deblocking normally acts across slice boundaries even though normal encoding does not, so I had to perform extra munging to get that to work. By midday today I was able to cross yet another box off on the performance chart. And now it's committed.
Sometimes chatting for 10 minutes with another developer is enough to spot the idea that your brain somehow managed to miss for nearly a straight week.
NB: the performance chart is based on a specific test clip at a specific set of settings (super fast settings) relevant to the company I work at, so it isn't accurate or complete for, say, default settings.
Update : Here’s a higher resolution version of the current chart, as requested in the comments.