Breaking Eggs And Making Omelettes
A blog dealing with technical multimedia matters, binary reverse engineering, and occasional video game hacking.
Articles published on the site
-
Gallery of VP8 Encoding Naivete
15 October 2010, by Multimedia Mike — VP8
I’ve been toiling away as a multimedia technology generalist for so long that it’s easy for me to forget that not everyone is as versed in the minutiae of the domain as I am. But I recently experienced what it’s like to be such an outsider when I posted about my toy VP8 encoder and expressed that it’s one of the hardest things I have ever tried to do. I heard from a number of people who do have extensive experience in video encoding, particularly with the H.264 and VP8 codecs. Their reactions were predictable: “What’s so hard?” Look, you might be a little too immersed in the area to really understand a relative beginner’s perspective.
And to all the people who suggested that I should get the encoder into FFmpeg ASAP: Are you crazy?! Did you see what the first pass of the encoder produced? Do you have lower standards than even I do?
Not Giving Up
I worked a little more on the toy encoder. Remember that the above image is what I’m hoping to encode somewhat faithfully for this experiment. In my first pass, I attempted vertical prediction for all planes. For my next pass, I forced the chroma planes to mid-level (which results in a greyscale image) and played with the 16×16 luma prediction modes. After implementing an extremely naive algorithm to decide which 16×16 prediction mode would be best for a particular block, this is what the program produced:
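To make “extremely naive” concrete, the decision logic amounts to something like the following sketch: generate every candidate prediction and keep the mode with the smallest sum of absolute differences (SAD) against the source macroblock. predict_16x16() and the mode names here are hypothetical stand-ins, not my actual code.

#include <limits.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical: fills pred[] with the prediction for the given mode,
 * using the reconstructed pixels above and to the left of the block. */
void predict_16x16(uint8_t pred[16 * 16], int mode,
                   const uint8_t *above, const uint8_t *left);

enum { PRED_DC, PRED_V, PRED_H, PRED_TM, NUM_PRED_MODES };

/* Try every 16x16 mode and keep the one with the lowest SAD
 * versus the source macroblock. */
static int pick_16x16_mode(const uint8_t *src, int stride,
                           const uint8_t *above, const uint8_t *left)
{
    int best_mode = PRED_DC;
    int best_sad = INT_MAX;

    for (int mode = 0; mode < NUM_PRED_MODES; mode++) {
        uint8_t pred[16 * 16];
        int sad = 0;

        predict_16x16(pred, mode, above, left);
        for (int y = 0; y < 16; y++)
            for (int x = 0; x < 16; x++)
                sad += abs(src[y * stride + x] - pred[y * 16 + x]);

        if (sad < best_sad) {
            best_sad = sad;
            best_mode = mode;
        }
    }
    return best_mode;
}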
For fun, here is what the image encodes to when forcing various prediction modes:
I think the DC-only prediction mode actually looks a little better than the image that the naive algorithm produced:
Vertical 16×16 prediction, similar to the image from the last post (just in black and white):
Horizontal 16×16 prediction:
This is the 16×16 prediction mode unique to VP8, the TrueMotion mode (based on On2/Duck’s very first video codec):
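TrueMotion is at least easy to state: each predicted pixel is the pixel above, plus the pixel to the left, minus the corner pixel diagonally above-left, clamped to the 8-bit range. A sketch:

#include <stdint.h>

/* Clamp a value into the valid 8-bit pixel range. */
static uint8_t clamp255(int v)
{
    return v < 0 ? 0 : (v > 255 ? 255 : v);
}

/* TrueMotion (TM) 16x16 prediction: above[] is the row of pixels above
 * the macroblock, left[] the column to its left, and above_left the
 * corner pixel shared by both. */
static void predict_tm_16x16(uint8_t pred[16][16], const uint8_t *above,
                             const uint8_t *left, uint8_t above_left)
{
    for (int y = 0; y < 16; y++)
        for (int x = 0; x < 16; x++)
            pred[y][x] = clamp255(above[x] + left[y] - above_left);
}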
Wow, these encodings really bring down the cheerful tone of the original image.
Next Steps
I have little reason to believe that I am encoding and subsequently reconstructing the image correctly (i.e., error is likely propagating through the entire encoding). If I have time, the next step is to validate my reconstruction against the decoder. Then I need to get the entropy considerations correct so that I actually get some compression out of this format.
-
WebM Cabal
I traveled to a secret clubhouse today to take part in a clandestine meeting to discuss exactly how WebM will rule over all that you see and hear on the web. I can’t really talk about it. But I can show you the cool hat I got:
Yeah, you’re jealous.
The back of the hat has an Easter egg for video codec nerds: the original Duck Corporation logo (On2’s original name):
Former employees of On2 (now Googlers) were well-represented. It was an emotional day of closure as I met the person — the only person to date — who contacted me with a legal threat so many years ago. He still remembered me too.
I met a lot of people involved in creating various Duck and On2 codecs and learned a lot of history and lore behind them: history I hope to be able to document one day.
I’m glad I got that first rough draft of a toy VP8 encoder done in time for the meeting. It was the subject of much mirth.
-
Announcing the World’s Worst VP8 Encoder
I wanted to see if I could write an extremely basic VP8 encoder. It turned out to be one of the hardest endeavors I have ever attempted (and arguably one of the least successful).
Results
I started with the Big Buck Bunny title image:
And this is the best encoding that this experiment could yield:
Squint hard enough and you can totally make out the logo. Pretty silly effort, I know. It should also be noted that the resultant .webm file holding that single 400×225 image was 191324 bytes, while the PNG that FFmpeg decoded it to was only 187200 bytes. In other words, my lossy encoding is larger than a lossless copy of its own output.
The Story
Remember my post about a naive SVQ1 encoder? Long story short, I set out to do the same thing with VP8. (I have wanted to do the same thing with VP3/Theora for years. But take a good look at what it would entail to create even the most basic bitstream. As involved as VP8 may be, its bitstream is absolutely trivial compared to VP3/Theora.)
With the naive SVQ1 encoder, the goal was to create a minimally compliant SVQ1 encoded bitstream. For this exercise, I similarly hypothesized what it would take to create the most basic, syntactically correct VP8 bitstream with the least amount of effort. These are the overall steps I came up with:
- Intra-only
- Create a basic bitstream header that disables any extra features (no modification of default tables); the fixed, uncompressed part of such a header is sketched after this list
- Use a static quantizer
- Use intra 16×16 coding for each macroblock
- Use vertical prediction for the 16×16 intra coding
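The uncompressed portion of a VP8 key frame header mentioned in the second item is small enough to sketch outright. This is a minimal sketch based on my reading of the format; the feature-disabling flags and default-table choices actually live in the compressed header bits that follow these 10 bytes, which a real encoder emits through the boolean encoder.

#include <stdint.h>

/* Minimal sketch: write the fixed, uncompressed 10 bytes that open a
 * VP8 key frame. first_part_size is the byte length of the first
 * compressed partition, so it has to be patched in after encoding. */
static void write_keyframe_header(uint8_t buf[10], unsigned first_part_size,
                                  unsigned width, unsigned height)
{
    /* 3-byte frame tag, little-endian: key frame (bit 0 = 0),
     * version 0, show_frame = 1, 19-bit first partition size */
    uint32_t tag = (1 << 4) | (first_part_size << 5);
    buf[0] = tag & 0xff;
    buf[1] = (tag >> 8) & 0xff;
    buf[2] = (tag >> 16) & 0xff;

    /* key frame start code */
    buf[3] = 0x9d; buf[4] = 0x01; buf[5] = 0x2a;

    /* 14-bit dimensions with 2-bit scaling codes (0 = no scaling),
     * each packed little-endian into 16 bits */
    buf[6] = width & 0xff;
    buf[7] = (width >> 8) & 0x3f;
    buf[8] = height & 0xff;
    buf[9] = (height >> 8) & 0x3f;
}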
For coding each macroblock (a code sketch follows this list):
- Subtract vertical predictor from each row
- Perform forward transform on each 4×4 sub block
- Perform forward WHT on luma plane DCT coefficients
- Pack the coefficients into the bitstream via the Boolean encoder
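Strung together, the macroblock loop looks roughly like the sketch below. BoolEncoder, fdct4x4(), fwht4x4(), and bool_encode_coeffs() are hypothetical stand-ins for the real VP8 transforms and boolean encoder, and the static quantizer is hand-waved into the final call:

#include <stdint.h>

typedef struct BoolEncoder BoolEncoder;                       /* hypothetical */
void fdct4x4(const int16_t *in, int stride, int16_t out[16]); /* hypothetical */
void fwht4x4(const int16_t in[16], int16_t out[16]);          /* hypothetical */
void bool_encode_coeffs(BoolEncoder *be, int16_t coeffs[16][16],
                        const int16_t dc[16]);                /* hypothetical */

/* Encode one 16x16 luma macroblock with vertical prediction. */
static void encode_mb_luma_v(BoolEncoder *be, const uint8_t *src,
                             int stride, const uint8_t *above)
{
    int16_t residual[16][16];
    int16_t coeffs[16][16];  /* 16 transformed 4x4 sub-blocks */
    int16_t dc[16];          /* their DC terms, for the 2nd-order WHT */

    /* subtract the vertical predictor (the row above the macroblock) */
    for (int y = 0; y < 16; y++)
        for (int x = 0; x < 16; x++)
            residual[y][x] = src[y * stride + x] - above[x];

    /* forward transform each 4x4 sub-block, collecting the DC terms */
    for (int b = 0; b < 16; b++) {
        fdct4x4(&residual[(b / 4) * 4][(b % 4) * 4], 16, coeffs[b]);
        dc[b] = coeffs[b][0];
    }

    /* second-order Walsh-Hadamard transform over the 16 luma DC terms */
    fwht4x4(dc, dc);

    /* quantize and pack everything through the boolean encoder */
    bool_encode_coeffs(be, coeffs, dc);
}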
It all sounds so simple. But, like I said in the SVQ1 post, it’s all very much like carefully bootstrapping a program to run on a particular CPU, and the VP8 decoder serves as the CPU. I’m confident that I have the bitstream encoding correct because, at the very least, the decoder agrees precisely with the encoder about the numbers represented by those 0s and 1s.
What’s Wrong?
Compromises were made for the sake of getting some vaguely recognizable image encoded in a minimally valid manner. One big stumbling block is that I couldn’t seem to encode an end-of-block (EOB) condition correctly. I then realized that it’s perfectly valid to just encode a lot of zero coefficients rather than signaling EOB. An encoding travesty, I know, and likely one reason that the resulting file size is so huge.

More drama occurred when I hit my first block that had all zeros. There were complications in that situation that I couldn’t seem to avoid, so I forced the first AC coefficient to be 1 in that case. Hey, the decoder liked it.
As for the generally weird look of the decoded image, I’m thinking that could be either: A) an artifact of forcing 16×16 vertical prediction; or B) a mistake in the way that I transformed and predicted the data before sending it to the decoder. The smart money is on a combination of both A and B.
Then again, as the SVQ1 experiment demonstrated, I shouldn’t expect extraordinary visual quality when setting the bar this low (i.e., just getting some bag of bits that doesn’t make the decoder barf).
-
On WebP and Academic Exercises
2 October 2010, by Multimedia Mike — General
Yesterday, Google released a new still image format called WebP. To those skilled in the art, this new format will be recognizable as a single VP8 golden frame with a 20-byte header slapped on the front (and maybe a little metadata thrown in for good measure). We have a MultimediaWiki page and a sample ready to go.
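For the curious, that 20-byte header is just a RIFF wrapper: ‘RIFF’, a little-endian overall size, the ‘WEBP’ form type, a ‘VP8 ’ chunk tag (note the trailing space), and the chunk size. A quick sketch of sniffing it:

#include <stdint.h>
#include <string.h>

/* Sketch of validating the 20-byte WebP wrapper; returns the size of
 * the VP8 payload that follows, or -1 if this isn't a WebP file. */
static int32_t webp_payload_size(const uint8_t *buf, size_t len)
{
    if (len < 20 ||
        memcmp(buf,      "RIFF", 4) ||
        memcmp(buf + 8,  "WEBP", 4) ||
        memcmp(buf + 12, "VP8 ", 4))
        return -1;

    /* little-endian 32-bit chunk size at offset 16 */
    return buf[16] | (buf[17] << 8) | (buf[18] << 16) |
           ((int32_t)buf[19] << 24);
}

Everything past those 20 bytes feeds straight into a VP8 decoder as a key frame.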
Further, I submitted a patch to ffmpeg-devel for FFmpeg’s img2 handling system to decode these files. FFmpeg should support processing these files soon… if anyone cares. This leads into…
The Point, or Lack Thereof
Since yesterday’s release, I have read a whirlwind of commentary about this format, much of it critical and of the “what’s the point?” variety. For my part, I can respect academic exercises, a.k.a. just trying random stuff to see if you can make it work. That’s pretty much this blog’s entire raison d’être. But WebP transcends mere academic exercise; Google seems to be trying to push it as a new web standard. I don’t see how the format can go anywhere based on criticisms raised elsewhere — e.g., see Dark Shikari’s thoughtful write-up — which basically boil down to WebP not solving any real problems, technical, legal, or otherwise.

How did WebP come to be? I strongly suspect some engineers noticed that JPEG is roughly the same as an MPEG-1 intraframe, so why not create a new still frame format based on VP8 intraframes? Again, I can respect that thinking; I have pondered how a still image format would perform if based on VP3/Theora or Sorenson Video 1.
Technically
Google claims a significant size savings for WebP vs. standard JPEG. Assuming that’s true (and there will be no shortage of blog posts to the contrary), it will still be some time before WebP support finds its way into the majority of the web browser population.

But this got me thinking about possible interim solutions. A website could store images compressed in both formats if it so chose. Then it could serve up a WebP image if the browser supports it, as indicated by the ‘Accept’ header in the HTTP request. It seems that a website might have to reference a generic image name such as
<img src="some-picture.image">
; the web server would have to recognize the .image extension and map it to either a .jpg or a .webp image depending on what the browser claims it is capable of displaying.
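One crude way to realize that mapping is sketched below as a CGI program. Note that the image/webp Accept token is my assumption about what a WebP-capable browser might someday advertise, and the picture names are placeholders.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical CGI handler for a .image request: if the browser's
 * Accept header claims WebP support, redirect to the .webp variant,
 * otherwise to the .jpg (a relative Location triggers an internal
 * redirect in the server). */
int main(void)
{
    const char *accept = getenv("HTTP_ACCEPT");

    if (accept && strstr(accept, "image/webp"))
        printf("Location: /some-picture.webp\n\n");
    else
        printf("Location: /some-picture.jpg\n\n");

    return 0;
}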
Leftovers
I appreciate that Dark Shikari has once again stuck his neck out and made a valiant — though often futile — effort to educate the internet’s masses. I long ago resigned myself to the fact that many people aren’t going to understand many of the most basic issues surrounding multimedia technology (i.e., moving pictures synchronized with audio). But apparently, this extends to still image formats as well. It was simultaneously humorous and disheartening to see commenters who don’t even understand the application of, e.g., PNG vs. JPEG. Ahem: “We already have a great replacement for jpg: .PNG”. Coupled with the typical accusations of MPEG tribalism, I remain impressed that D. Shikari finds the will to bother.

Still, I appreciate that the discussion has introduced me to some new image formats of which I was previously unaware, such as PGF and JPEG XR.
-
Dreamcast Operating Systems
16 September 2010, by Multimedia Mike — Sega Dreamcast
The Sega Dreamcast was famously emblazoned with a logo proudly announcing that it was compatible with Windows CE:
It’s quite confusing. The console certainly doesn’t boot into some version of Windows to launch games. Apparently, there was a special version of CE developed for the DC and game companies had the option to leverage it. I do recall that some game startup screens would similarly advertise Windows CE.
Once the homebrew community got ahold of the device, the sky was the limit. I think NetBSD was the first alternative OS to support the Dreamcast. Meanwhile, I have recollections of DC Linux and LinuxDC projects along with more generic Linux-SH and SH-Linux projects.
DC Evolution hosts a downloadable disc image with an unofficial version of DC Linux, assembled by one Adrian O’Grady. I figured out how to burn the disc (burning DC discs is a blog post of its own) and got it working in the console.
It’s possible to log in directly via the physical keyboard or through a serial terminal, provided that you have a coder’s cable. That reminds me: my local Fry’s had a selection of USB-to-serial cables. I think this is another area that is sufficiently commoditized that just about any cable ought to work with Linux out of the box. Or maybe I’m just extrapolating from the experience of having the cheapest cable in the selection (made by io connect) work plug-and-play with Linux.
Look! No messy converter box in the middle as in the Belkin case. The reason I went with this cable is that the packaging claimed it was capable of up to 500 Kbits/sec. Most of the cables advertised a max of 115200 bps. I distinctly recall being able to use the DC coder’s cable at 230400 bps a long time ago. Alas, 115200 seems to be the speed limit, even with this new USB cable.
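For reference, requesting the higher rate from Linux userspace looks something like this termios sketch; the /dev/ttyUSB0 device name is an assumption, and whether the adapter actually honors B230400 is exactly the problem described above.

#include <fcntl.h>
#include <termios.h>
#include <unistd.h>

/* Sketch: open the USB-serial device and request 230400 bps, raw mode. */
int open_serial_230400(void)
{
    int fd = open("/dev/ttyUSB0", O_RDWR | O_NOCTTY);
    if (fd < 0)
        return -1;

    struct termios tio;
    tcgetattr(fd, &tio);
    cfmakeraw(&tio);             /* raw 8-bit mode, no line discipline */
    cfsetispeed(&tio, B230400);
    cfsetospeed(&tio, B230400);
    tcsetattr(fd, TCSANOW, &tio);
    return fd;
}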
Anyway, the distribution is based on a 2.4.5 kernel circa 2001. I tried to make PPP work over the serial cable, but the kernel doesn’t have support for it. If you’re interested, here is some basic information about the machine from Linux’s perspective, gleaned from some simple commands. This helps remind us of a simpler time when Linux was able to run comfortably on a computer with 16 MB of RAM.
Debian GNU/Linux testing/unstable dreamcast ttsc/1

dreamcast login: root
Linux dreamcast 2.4.5 #27 Thu May 31 07:06:51 JST 2001 sh4 unknown

Most of the programs included with the Debian GNU/Linux system are
freely redistributable; the exact distribution terms for each program
are described in the individual files in /usr/share/doc/*/copyright

Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
permitted by applicable law.

dreamcast:~# uname -a
Linux dreamcast 2.4.5 #27 Thu May 31 07:06:51 JST 2001 sh4 unknown

dreamcast:~# cat /proc/cpuinfo
cpu family      : SH-4
cache size      : 8K-byte/16K-byte
bogomips        : 199.47
Machine: dreamcast
CPU clock: 200.00MHz
Bus clock: 100.00MHz
Peripheral module clock: 50.00MHz

dreamcast:~# top -b
09:14:54 up 14 min, 1 user, load average: 0.04, 0.03, 0.03
15 processes: 14 sleeping, 1 running, 0 zombie, 0 stopped
CPU states: 1.1% user, 5.8% system, 0.0% nice, 93.1% idle
Mem: 14616K total, 11316K used, 3300K free, 2296K buffers
Swap: 0K total, 0K used, 0K free, 5556K cached

  PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME COMMAND
  219 root      18   0  1072 1068   868 R     5.6  7.3   0:00 top
    1 root       9   0   596  596   512 S     0.0  4.0   0:01 init
    2 root       9   0     0    0     0 SW    0.0  0.0   0:00 keventd
    3 root       9   0     0    0     0 SW    0.0  0.0   0:00 kswapd
    4 root       9   0     0    0     0 SW    0.0  0.0   0:00 kreclaimd
    5 root       9   0     0    0     0 SW    0.0  0.0   0:00 bdflush
    6 root       9   0     0    0     0 SW    0.0  0.0   0:00 kupdated
    7 root       9   0     0    0     0 SW    0.0  0.0   0:00 kmapled
   39 root       9   0   900  900   668 S     0.0  6.1   0:00 devfsd
   91 root       8   0   652  652   556 S     0.0  4.4   0:00 pump
   96 daemon     9   0   524  524   420 S     0.0  3.5   0:00 portmap
  149 root       9   0   944  944   796 S     0.0  6.4   0:00 syslogd
  152 root       9   0   604  604   456 S     0.0  4.1   0:00 klogd
  187 root       9   0   540  540   456 S     0.0  3.6   0:00 getty
  201 root       9   0  1380 1376  1112 S     0.0  9.4   0:01 bash
Note that at this point I had shut down both gpm and inetd. The rest of the processes, save for bash, are default. The above stats only report about 14 MB of RAM; where are the other 2 MB?
dreamcast:~# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/rd/1             2.0M  560k  1.4M  28% /