Breaking Eggs And Making Omelettes

A blog dealing with technical multimedia matters, binary reverse engineering, and the occasional video game hacking.

http://multimedia.cx/eggs/

Articles published on the site

  • Parlez-Vous Binutils ?

    July 18, 2010, by Multimedia Mike (General)

    I found myself in need of some binutils today. What do you suppose it is about this basic Apache file listing page that makes Google Chrome think it's in French?



    Opting to translate doesn't seem to have any effect, aside from ruining the alignment of the columns.

    That quirk aside, the page translation facility is actually quite nifty.

  • Brute Force Dimensional Analysis

    July 15, 2010, by Multimedia Mike (Game Hacking, Python)

    I was poking at the data files of a really bad (is there any other kind?) interactive movie video game known simply by one letter: D. The Sega Saturn version of the game consists primarily of Sega FILM/CPK files, about which I wrote the book. The second most prolific file type bears the extension '.dg2'. Cursory examination of sample files revealed an apparently headerless format. Many of the video files are 288x144 in resolution. Multiplying that width by that height and then doubling it (i.e., 2 bytes/pixel) yields 82944, which happens to be the size of a number of these DG2 files. Now, if only I had a tool that could take a suspected raw RGB file and convert it to a more standard image format.

    Here's the FFmpeg conversion recipe I used:

     ffmpeg -f rawvideo -pix_fmt rgb555 -s 288x144 -i raw_file -y output.png
    

    So that covers the files that are suspected to be 288x144 in dimension. But what about other file sizes? My brute force approach was to try all possible dimensions that would yield a particular file size. The Python code for performing this operation is listed at the end of this post.
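    The heart of the brute-force search is simply factoring the pixel count. Here is a minimal sketch of that step on its own, assuming 2 bytes per pixel as in the post; the function name candidate_dimensions is hypothetical.

```python
def candidate_dimensions(file_size, bytes_per_pixel=2):
    """Return every (width, height) whose raw frame is exactly file_size bytes."""
    if file_size % bytes_per_pixel:
        return []  # not a whole number of pixels
    pixels = file_size // bytes_per_pixel
    dims = []
    w = 1
    while w * w <= pixels:
        if pixels % w == 0:
            dims.append((w, pixels // w))
            if w != pixels // w:          # avoid listing a square shape twice
                dims.append((pixels // w, w))
        w += 1
    return sorted(dims)
```

    For an 82944-byte file this factors 41472 pixels (2^9 x 3^4), giving 50 candidate shapes, one of which is (288, 144).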

    It's interesting to view the progression as the script converts the data to different dimensions:



    That 'D' is supposed to be red. So right away, we see that rgb555(le) is not the correct input format. Annoyingly, FFmpeg cannot handle rgb5[5|6]5be as a raw input format. But this little project worked well enough as a proof of concept.
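    To see why the wrong byte order wrecks the colors, here is a small sketch that decodes 15-bit RGB555 pixels (one unused bit, then 5 bits each of red, green, and blue) in either byte order. The function names are hypothetical, and treating the DG2 data as big-endian RGB555 is an assumption the post hints at but does not confirm. swap16 shows one possible workaround: byte-swapping a file turns big-endian words into little-endian ones that the rgb555 recipe above can read.

```python
def rgb555_to_rgb888(word):
    """Expand one RGB555 pixel (x RRRRR GGGGG BBBBB) to 8-bit (R, G, B)."""
    r = (word >> 10) & 0x1F
    g = (word >> 5) & 0x1F
    b = word & 0x1F
    # Scale each 5-bit channel to 0..255 by replicating its top bits
    return tuple((c << 3) | (c >> 2) for c in (r, g, b))


def decode_pixels(data, big_endian):
    """Decode a run of 2-byte pixels using the chosen byte order."""
    order = "big" if big_endian else "little"
    return [rgb555_to_rgb888(int.from_bytes(data[i:i + 2], order))
            for i in range(0, len(data) - 1, 2)]


def swap16(data):
    """Swap each byte pair, converting big-endian 16-bit words to little-endian."""
    if len(data) % 2:
        raise ValueError("expected an even number of bytes")
    out = bytearray(data)
    out[0::2], out[1::2] = data[1::2], data[0::2]
    return bytes(out)
```

    Pure red (0x7C00) stored big-endian as the bytes 7C 00 decodes to (255, 0, 0) when read as big-endian, but as (0, 24, 231), a strong blue, when read as little-endian.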

    If you want to toy around with these files (and I know you do), I have uploaded a selection at: http://multimedia.cx/dg2/.

    Here is my quick Python script for converting one of these files to every acceptable resolution.

    work-out-resolution.py:

    #!/usr/bin/env python3

    import math
    import os
    import subprocess
    import sys

    FFMPEG = "/path/to/ffmpeg"

    def convert_file(width, height, filename):
        # Interpret the file as raw RGB555 at width x height and write a PNG
        outfile = "%s-%dx%d.png" % (filename, width, height)
        command = [FFMPEG, "-f", "rawvideo", "-pix_fmt", "rgb555",
                   "-s", "%dx%d" % (width, height), "-i", filename, "-y", outfile]
        subprocess.run(command, capture_output=True)

    if len(sys.argv) < 2:
        print("USAGE: work-out-resolution.py <file>")
        sys.exit(1)

    filename = sys.argv[1]
    if not os.path.exists(filename):
        print(filename + " does not exist")
        sys.exit(1)

    filesize = os.path.getsize(filename)
    if filesize % 2 != 0:
        print(filename + " has an odd size, so it cannot be 2 bytes/pixel")
        sys.exit(1)
    pixels = filesize // 2

    # Try every factor pair of the pixel count, in both orientations
    for i in range(1, math.isqrt(pixels) + 1):
        if pixels % i == 0:
            convert_file(i, pixels // i, filename)
            if i != pixels // i:
                convert_file(pixels // i, i, filename)

  • Lego Mindstorms RSO Format

    July 14, 2010, by Multimedia Mike (General)

    I recently read a magazine article about Lego Mindstorms. Naturally, the item that caught my eye was the mention of a bit of Lego software that converts various audio file formats to a custom format called RSO, which can be downloaded into a Mindstorms project to make the creation output audio. Depending on which sources you read, you might be left with the impression that there is something super-duper top secret proprietary about the format. Such impressions do not hold up under casual analysis of a sample file.

    A Google search for "filetype:rso" yielded a few pre-made samples that I have mirrored into the samples archive. The format appears to be an 8-byte header followed by unsigned 8-bit PCM. More on the wiki. If FFmpeg gained an RSO file muxer, the Lego hacking community would presumably hail it as a heroic feat.
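    A minimal sketch of an RSO-to-WAV conversion using only the Python standard library. The post only establishes an 8-byte header followed by unsigned 8-bit PCM; reading that header as four big-endian 16-bit fields (format ID, payload size, sample rate, play mode) is an assumption here, as are the function names parse_rso and rso_to_wav. Conveniently, 8-bit WAV samples are also unsigned, so the payload can be copied through unchanged.

```python
import io
import struct
import wave

RSO_HEADER_SIZE = 8  # per the post: 8-byte header, then unsigned 8-bit PCM


def parse_rso(data):
    """Split an RSO blob into (header fields, PCM payload).

    ASSUMPTION: the header is four big-endian 16-bit fields; the post
    only says that it is 8 bytes long.
    """
    if len(data) < RSO_HEADER_SIZE:
        raise ValueError("too short to be an RSO file")
    fmt_id, size, rate, mode = struct.unpack(">HHHH", data[:RSO_HEADER_SIZE])
    pcm = data[RSO_HEADER_SIZE:RSO_HEADER_SIZE + size]
    return {"id": fmt_id, "size": size, "rate": rate, "mode": mode}, pcm


def rso_to_wav(data, out):
    """Write the RSO payload as mono 8-bit WAV to a path or file-like object."""
    header, pcm = parse_rso(data)
    with wave.open(out, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(1)  # 8-bit WAV samples are unsigned, like the RSO payload
        w.setframerate(header["rate"])
        w.writeframes(pcm)
```

    Passing an io.BytesIO as out keeps everything in memory; passing a filename such as "out.wav" writes a file instead.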

  • Multiprocess FATE Revisited

    June 26, 2010, by Multimedia Mike (FATE Server, Python)

    I thought I had brainstormed a simple, elegant, multithreaded, deadlock-free refactoring for FATE in a previous post. However, I sort of glossed over the test ordering logic which I had not yet prototyped. The grim, possibly deadlock-afflicted reality is that the main thread needs to be notified as tests are completed. So, the main thread sends test specs through a queue to be executed by n tester threads and those threads send results to a results aggregator thread. Additionally, the results aggregator will need to send completed test IDs back to the main thread.



    But when I step back and look at the graph, I can't rationalize why there should be a separate results aggregator thread. That was added to cut down on deadlock possibilities since the main thread and the tester threads would not be waiting for data from each other. Now that I've come to terms with the fact that the main and the testers need to exchange data in realtime, I think I can safely eliminate the result thread. Adding more threads is not the best way to guard against race conditions and deadlocks. Ask xine.



    I'm still hung up on the deadlock issue. I have these queues through which the threads communicate. At issue is the fact that they can cause a thread to block when inserting an item if the queue is "full". How full is full? Immaterial; seeking to answer such a question is not how you guard against race conditions. Rather, it seems to me that one side should be doing non-blocking queue operations.
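    To make "non-blocking" concrete, here is a small sketch of what the non-blocking and timed queue operations do on a one-slot queue. It uses the standard queue.Queue so it runs in a single process; multiprocessing.Queue offers the same put_nowait, get_nowait, and timeout arguments and raises the same queue.Full and queue.Empty exceptions. The function name probe_bounded_queue is hypothetical.

```python
import queue


def probe_bounded_queue():
    """Exercise non-blocking and timed operations on a one-slot queue."""
    events = []
    q = queue.Queue(maxsize=1)  # "full" here simply means one item is waiting
    q.put("first")

    try:
        q.put_nowait("second")  # returns at once or raises queue.Full
    except queue.Full:
        events.append("put_nowait: full")

    try:
        q.put("second", timeout=0.1)  # blocks, but only for up to 0.1 s
    except queue.Full:
        events.append("put(timeout): still full")

    events.append("got " + q.get_nowait())  # an item is ready, so this succeeds

    try:
        q.get_nowait()  # nothing left: raises queue.Empty instead of waiting
    except queue.Empty:
        events.append("get_nowait: empty")
    return events
```

    Neither side ever waits indefinitely here: a full queue or an empty one is reported immediately (or after the timeout), and the caller decides what to do next.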

    This is how I'm planning to revise the logic in the main thread:

    test_set = set of all tests to execute
    tests_pending = copy of test_set
    tests_blocked = empty set
    tests_queue = multi-consumer queue to send test specs to tester threads
    results_queue = multi-producer queue through which tester threads send results
    repeat:
      while there are tests in test_set:
        pop a test from test_set
        if test depends on any tests that appear in tests_pending:
          add test to tests_blocked
        else:
          add test to tests_queue in a non-blocking manner
          if tests_queue is full, add test to tests_blocked

      while there are results in results_queue:
        get a result from results_queue in a non-blocking manner
        remove the corresponding test from tests_pending

      if tests_blocked is non-empty:
        sleep for 1 second
        test_set = tests_blocked
        tests_blocked = empty set
      else:
        insert n shutdown signals, one for each thread
        leave the loop; every test has been dispatched

    while any tester thread is still running:
      get a result from results_queue in a blocking manner
    

    Not mentioned in the pseudocode (so it doesn't get too verbose) is the logic to check whether a retrieved test result is actually an end-of-thread signal. These signals are counted, and the whole test process is done when one has been received from each thread.

    On the tester thread side, it's safe to do blocking test queue retrievals and blocking result queue insertions. The 1-second delay before resetting tests_blocked and looping again guards against the following situation: tests A and B are to be run, A depends on B running first, and while B is running (and happens to be a long encoding test), the main thread spins about, obsessively testing whether it's time to insert A into the tests queue.

    It all sounds just crazy enough to work. In fact, I coded it up and it does work, sort of. The queue gets blocked pretty quickly. Instead of sleeping, I decided it's better to perform the put operation using a 1-second timeout.

    Still, I'm paranoid about the precise operation of the IPC queue mechanism at work here. What happens if I try to stuff in a test spec that's a bit too large? Will the module take whatever I give it and serialize it through the queue as soon as it can? I think an impromptu science project is in order.

    big-queue.py:

    #!/usr/bin/env python3

    import multiprocessing
    import queue

    def f(q):
        # Reader process: block until the big string arrives, then report its size
        s = q.get()
        print("reader function got a string of %d characters" % len(s))

    if __name__ == "__main__":
        q = multiprocessing.Queue()
        p = multiprocessing.Process(target=f, args=(q,))
        p.start()
        try:
            # Non-blocking put of a 100 MB string; a background feeder thread
            # pickles it and pushes it through the underlying pipe
            q.put_nowait('a' * 100000000)
        except queue.Full:
            print("queue full")
        p.join()

    $ ./big-queue.py
    reader function got a string of 100000000 characters
    

    Since 100 MB doesn't even make it choke, FATE's little test specs shouldn't pose any difficulty.

  • FFmpeg Has A Native VP8 Decoder

    June 24, 2010, by Multimedia Mike (VP8)

    Thanks to David Conrad and Ronald Bultje, who committed their native VP8 video decoder to the FFmpeg codebase yesterday. At this point, it can decode 14 of the 17 VP8 test vectors that Google released during the initial open sourcing event. Work is ongoing on the 3 non-passing samples (the bilinear filter is still missing). Meanwhile, FFmpeg's optimization-obsessive personalities are hard at work optimizing the native decoder. Profiling already shows the current decoder to be faster than Google/On2's official libvpx.

    Testing
    So it falls to FATE to test this on the ridiculous diversity of platforms that FFmpeg supports. I staged individual test specs for each of the 17 test vectors: vp8-test-vector-001 ... vp8-test-vector-017. After the samples have propagated through to the various FATE installations, I'll activate the 14 test specs that are currently passing.

    Initial Testing Methodology
    Inspired by Ronald Bultje's idea, I built the latest FFmpeg-SVN with libvpx enabled. Then I selected between the reference and native decoders as follows:

    $ for i in 001 002 003 004 005 006 007 008 009 \
     010 011 012 013 014 015 016 017
    do
      echo vp80-00-comprehensive-${i}.ivf
      ffmpeg -vcodec libvpx -i \
        /path/to/vp8-test-vectors-r1/vp80-00-comprehensive-${i}.ivf \
        -f framemd5 - 2> /dev/null
    done > refs.txt
    
    $ for i in 001 002 003 004 005 006 007 008 009 \
     010 011 012 013 014 015 016 017
    do
      echo vp80-00-comprehensive-${i}.ivf
      ffmpeg -vcodec vp8 -i \
        /path/to/vp8-test-vectors-r1/vp80-00-comprehensive-${i}.ivf \
        -f framemd5 - 2> /dev/null
    done > native.txt
    
    $ diff -u refs.txt native.txt
    

    That reveals precisely which files differ.
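    For a per-file verdict rather than a raw diff, a small sketch can group each output by the .ivf name that the loops echo before each decode and compare the checksum lines beneath it. The helper names split_by_file and differing_files are hypothetical, and the sketch assumes the only lines ending in ".ivf" are those echoed names; lines starting with "#" are skipped in case the framemd5 output carries header comments.

```python
def split_by_file(text):
    """Map each echoed '*.ivf' name to the checksum lines printed after it."""
    sections = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if line.endswith(".ivf"):
            current = line
            sections[current] = []
        elif current is not None and line and not line.startswith("#"):
            sections[current].append(line)  # skip blanks and '#' header lines
    return sections


def differing_files(refs_text, native_text):
    """Names whose per-frame checksums differ (or appear in only one output)."""
    refs = split_by_file(refs_text)
    native = split_by_file(native_text)
    return sorted(name for name in refs.keys() | native.keys()
                  if refs.get(name) != native.get(name))
```

    Feeding it the contents of refs.txt and native.txt returns the list of test vectors whose native output does not match libvpx.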