Other articles (54)

  • Websites made with MediaSPIP

    2 May 2011

    This page lists some websites based on MediaSPIP.

  • Automated installation script of MediaSPIP

    25 April 2011

    To overcome the difficulties of installing the server-side software dependencies, an "all-in-one" installation script written in bash was created to make this step easier on a server running a compatible Linux distribution.
    You need SSH access to your server and a root account to use the script, since it installs the dependencies. Contact your provider if you do not have them.
    Documentation for this installation script is available here.
    The code of this (...)

  • Channel creation request

    12 March 2010

    Depending on how the platform is configured, the user may have two different ways to request the creation of a channel. The first is at registration time; the second is after registration, by filling out a request form.
    Both methods ask for the same information and work in roughly the same way: the future user must fill in a series of form fields that, first of all, give the administrators information about (...)

On other sites (5846)

  • Greed is Good; Greed Works

    25 November 2010, by Multimedia Mike (VP8)

    Greed, for lack of a better word, is good; greed works. Well, most of the time. Maybe.

    Picking Prediction Modes
    VP8 uses one of 4 prediction modes to predict a 16x16 luma block or 8x8 chroma block before processing it (for luma, a block can also be broken into 16 4x4 blocks for individual prediction using even more modes).

    So, how to pick the best predictor mode? I had no idea when I started writing my VP8 encoder. I did not read any literature on the matter; I just sat down and thought of a brute-force approach. According to the comments in my code:

    // naive, greedy algorithm:
    //   residual = source - predictor
    //   mean = mean(residual)
    //   residual -= mean
    //   find the max diff between the mean and the residual
    // the thinking is that, post-prediction, the best block will
    // be comprised of similar samples
    

    After removing the predictor from the macroblock, individual 4x4 subblocks are put through a forward DCT and quantized. Optimal compression in this scenario results when all samples are the same, since only the DC coefficient will be non-zero. Failing that, when the input samples are at least similar to each other, few of the AC coefficients will be non-zero, which helps compression. When the samples are all over the scale, there aren’t a whole lot of zero coefficients unless you crank up the quantizer, which results in poor quality in the reconstructed subblocks.
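The effect described above can be seen on a toy example. The following is a minimal sketch, not VP8's actual transform: it assumes a plain floating-point 1-D DCT-II on 4 samples and a single uniform quantizer step, and the names `dct_1d` and `quantize` are made up for illustration.

```python
import math

def dct_1d(samples):
    """Orthonormal DCT-II of a short list of samples (illustrative, not VP8's integer DCT)."""
    n = len(samples)
    out = []
    for k in range(n):
        scale = math.sqrt((1 if k == 0 else 2) / n)
        total = sum(x * math.cos(math.pi * (i + 0.5) * k / n)
                    for i, x in enumerate(samples))
        out.append(scale * total)
    return out

def quantize(coeffs, step):
    """Divide each coefficient by a uniform step and round to the nearest integer."""
    return [round(c / step) for c in coeffs]

def nonzero_count(coeffs):
    return sum(1 for c in coeffs if c != 0)

flat = quantize(dct_1d([7, 7, 7, 7]), 1)       # identical samples: only DC survives
similar = quantize(dct_1d([7, 8, 7, 8]), 4)    # similar samples: small AC terms quantize away
noisy = quantize(dct_1d([0, 90, 5, 70]), 4)    # scattered samples: AC terms remain

print(flat, similar, noisy)
```

Counting the non-zero entries of each result shows the trend: the flatter the residual, the fewer coefficients are left to encode.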

    Thus, my goal was to pick a prediction mode that, when applied to the input block, resulted in a residual in which each element would feature the least deviation from the mean of the residual (relative to other prediction choices).
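The selection rule in the code comments above can be sketched roughly as follows. This is only an illustration under stated assumptions: blocks are flat lists of samples, the two candidate predictors are invented toy data, and `max_deviation` and `pick_mode` are hypothetical names rather than functions from the encoder.

```python
def max_deviation(source, predictor):
    """Largest absolute deviation of the residual from the residual's own mean."""
    residual = [s - p for s, p in zip(source, predictor)]
    mean = sum(residual) / len(residual)
    return max(abs(x - mean) for x in residual)

def pick_mode(source, predictors):
    """Return the name of the predictor whose residual deviates least from its mean."""
    return min(predictors, key=lambda name: max_deviation(source, predictors[name]))

# toy 4-sample "block" and two hypothetical candidate predictors
source = [10, 12, 14, 16]
predictors = {
    "DC": [13, 13, 13, 13],       # residual is -3, -1, 1, 3
    "gradient": [9, 11, 13, 15],  # residual is 1, 1, 1, 1
}
best = pick_mode(source, predictors)
print(best)
```

The choice is greedy in the sense that each block is scored on its own, with no look at how the decision affects neighbouring blocks or the final bit count.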

    Greedy Approach
    I realized that this algorithm falls into the broad general category of "greedy" algorithms: ones that make locally optimal decisions at each stage. There are most likely smarter algorithms. But this one was good enough for making an encoder that just barely works.

    Compression Results
    I checked the total file compression size on my usual 640x360 Big Buck Bunny logo image while forcing prediction modes vs. using my greedy prediction picking algorithm. In this very simple test, DC-only actually resulted in slightly better compression than the greedy algorithm (which says nothing about overall quality).

    prediction mode   quantizer index = 0 (minimum)   quantizer index = 10
    greedy            286260                          98028
    DC                280593                          95378
    vertical          297206                          105316
    horizontal        295357                          104185
    TrueMotion        311660                          113480

    As another data point, in both quantizer cases, my greedy algorithm selected a healthy mix of prediction modes:

    • quantizer index 0: DC = 521, VERT = 151, HORIZ = 183, TM = 65
    • quantizer index 10: DC = 486, VERT = 167, HORIZ = 190, TM = 77

    Size vs. Quality
    Again, note that this ad-hoc test only measures one property (a highly objective one): compression size. It did not account for quality, which is a far more controversial topic that I have yet to wade into.

  • Basic Video Palette Conversion

    20 August 2011, by Multimedia Mike (General, Python)

    How do you take a 24-bit RGB image and convert it to an 8-bit paletted image for the purpose of compression using a codec that requires 8-bit input images? Seems simple enough and that’s what I’m tackling in this post.

    Ask FFmpeg/Libav To Do It
    Ideally, FFmpeg / Libav should be able to handle this automatically. Indeed, FFmpeg used to be able to, at least at the time I wrote this post about ZMBV and was unhappy with FFmpeg’s default results. Somewhere along the line, FFmpeg and Libav lost the ability to do this. I suspect it got removed during some swscale refactoring.

    Still, there’s no telling if the old system would have computed palettes correctly for QuickTime files.

    Distance Approach
    When I started writing my SMC video encoder, I needed to convert RGB (from PNG files) to PAL8 colorspace. The path of least resistance was to match the pixels in the input image to the default 256-color palette that QuickTime assumes (and is hardcoded into FFmpeg/Libav).

    How to perform the matching? Find the palette entry that is closest to a given input pixel, where "closest" is the minimum distance as computed by the usual distance formula (square root of the sum of the squares of the diffs of all the components).



    That means for each pixel in an image, check the pixel against 256 palette entries (early termination is possible if an acceptable threshold is met). As you might imagine, this can be a bit time-consuming. I wondered about a faster approach...
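A minimal sketch of this brute-force matching is shown below. It assumes pixels and palette entries are (r, g, b) tuples; the 4-entry palette and the name `nearest_palette_index` are invented for illustration. Because the square root is monotonic, comparing squared distances picks the same entry and avoids the root entirely.

```python
def nearest_palette_index(pixel, palette):
    """Return the index of the palette entry closest to pixel (squared Euclidean distance)."""
    best_index = 0
    best_dsqrd = None
    for i, entry in enumerate(palette):
        dsqrd = sum((p - e) ** 2 for p, e in zip(pixel, entry))
        if best_dsqrd is None or dsqrd < best_dsqrd:
            best_index, best_dsqrd = i, dsqrd
            if dsqrd == 0:
                break  # exact match: early termination
    return best_index

# tiny hypothetical palette: black, white, red, blue
palette = [(0, 0, 0), (255, 255, 255), (255, 0, 0), (0, 0, 255)]
idx = nearest_palette_index((250, 10, 20), palette)
print(idx, palette[idx])
```

With a 256-entry palette this inner loop runs up to 256 times per pixel, which is the cost the lookup-table approach tries to avoid.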

    Lookup Table
    I think this is the approach that FFmpeg used to use, but I went and derived it for myself after studying the default QuickTime palette table. There’s a pattern there: all of the RGB entries are composed of combinations of 6 values: 0x00, 0x33, 0x66, 0x99, 0xCC, and 0xFF. If you mix and match these for red, green, and blue values, you come up with 6 * 6 * 6 = 216 different colors. This happens to be identical to the web-safe color palette.

    The first (0th) entry in the table is (FF, FF, FF), followed by (FF, FF, CC), (FF, FF, 99), and on down to (FF, FF, 00), at which point the green component gets knocked down a step and the next color is (FF, CC, FF). The first 36 palette entries in the table all have a red component of 0xFF. Thus, if an input RGB pixel has a red color closest to 0xFF, it must map to one of those first 36 entries.

    I created a table which maps component values 0..255 to values from 5..0. Each of the R, G, and B components of an input pixel is used to index into this table and derive 3 indices ri, gi, and bi. Finally, the index into the palette table is given by:

      index = ri * 36 + gi * 6 + bi
    

    For example, the pixel (0xFE, 0xFE, 0x01) would yield ri, gi, and bi values of 0, 0, and 5. Therefore:

      index = 0 * 36 + 0 * 6 + 5
    

    The palette index is 5, which maps to color (0xFF, 0xFF, 0x00).
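The worked example can be reproduced with a short sketch. The range boundaries in the table are the ones used by the verification script further down the page; the names `PAL8_TABLE`, `table_index`, and `index_to_color` are chosen here for illustration only.

```python
# the six component levels, in palette order (index 0 is 0xFF, index 5 is 0x00)
COLOR_STEPS = [0xFF, 0xCC, 0x99, 0x66, 0x33, 0x00]

# maps a component value 0..255 to the index of the nearest level
PAL8_TABLE = [5] * 26 + [4] * 51 + [3] * 51 + [2] * 51 + [1] * 51 + [0] * 26

def table_index(r, g, b):
    """Palette index for an RGB pixel via three table lookups."""
    return PAL8_TABLE[r] * 36 + PAL8_TABLE[g] * 6 + PAL8_TABLE[b]

def index_to_color(index):
    """Inverse mapping: palette index back to its (r, g, b) color in the 216-entry cube."""
    return (COLOR_STEPS[index // 36],
            COLOR_STEPS[(index // 6) % 6],
            COLOR_STEPS[index % 6])

idx = table_index(0xFE, 0xFE, 0x01)
print(idx, ["0x%02X" % c for c in index_to_color(idx)])
```

Each pixel costs three lookups and two multiply-adds, regardless of palette size, which is the appeal over the per-entry distance search.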

    Validation
    So I was pretty pleased with myself for coming up with that. Now, ideally, swapping out one algorithm for another in my SMC encoder should yield identical results. That wasn’t the case, initially.

    One problem is that the regulation QuickTime palette actually has 40 more entries above and beyond the typical 216-entry color cube (rounding out the grand total of 256 colors). Thus, using the distance approach with the full default table provides for a little more accuracy.

    However, there still seems to be a problem. Let’s check our old standby, the Big Buck Bunny logo image:

    [Image 1: distance approach using the full 256-color QuickTime default palette]
    [Image 2: distance approach using the 216-color palette]
    [Image 3: table lookup approach using the 216-color palette]

    I can’t quite account for that big red splotch there. That’s the most notable difference between images 1 and 2 and the only visible difference between images 2 and 3.

    To prove to myself that the distance approach is equivalent to the table approach, I wrote a Python script to iterate through all possible RGB combinations and verify the equivalence. If you’re not up on your base 2 math, that’s 2^24 or 16,777,216 colors to run through. I used Python’s multiprocessing module to great effect and really maxed out a Core i7 CPU with 8 hardware threads.

    So I’m confident that the palette conversion techniques are sound. The red spot is probably attributable to a bug in my WIP SMC encoder.

    Source Code
    Update August 23, 2011: Here’s the Python code I used for proving equivalence between the 2 approaches. In terms of leveraging multiple CPUs, it’s possibly the best program I have written to date.

    PYTHON:
    #!/usr/bin/env python3

    from multiprocessing import Pool

    # reference palette: the 216-entry color cube in QuickTime table order
    COLOR_STEPS = [0xFF, 0xCC, 0x99, 0x66, 0x33, 0x00]
    palette = [(r, g, b) for r in COLOR_STEPS for g in COLOR_STEPS for b in COLOR_STEPS]

    # palette conversion table: component value 0..255 -> step index 5..0
    pal8_table = [5] * 26 + [4] * 51 + [3] * 51 + [2] * 51 + [1] * 51 + [0] * 26

    def process_r(r):
        """Check all 65536 (g, b) pairs for one red value; return per-entry hit counts."""
        counts = [0] * len(palette)

        print("r = %d" % r)
        for g in range(256):
            for b in range(256):
                # distance approach: closest palette entry by squared distance
                min_dsqrd = 0xFFFFFFFF
                best_index = 0
                for i, (pr, pg, pb) in enumerate(palette):
                    dr = pr - r
                    dg = pg - g
                    db = pb - b
                    dsqrd = dr * dr + dg * dg + db * db
                    if dsqrd < min_dsqrd:
                        min_dsqrd = dsqrd
                        best_index = i
                counts[best_index] += 1

                # check if the distance approach deviates from the table-based approach
                # (look up the input pixel itself, without overwriting r, g, b)
                table_index = pal8_table[r] * 36 + pal8_table[g] * 6 + pal8_table[b]
                if table_index != best_index:
                    print("(0x%02X 0x%02X 0x%02X): distance index = %d, table index = %d"
                          % (r, g, b, best_index, table_index))

        return counts

    if __name__ == '__main__':
        counts = [0] * len(palette)

        # create a pool of worker processes and break up the overall job by red value
        with Pool() as pool:
            for partial_counts in pool.imap_unordered(process_r, range(256)):
                for i, n in enumerate(partial_counts):
                    counts[i] += n

        print("index, count, red, green, blue")
        for i, count in enumerate(counts):
            print("%d, %d, %d, %d, %d" % ((i, count) + palette[i]))

  • FFMPEG - Drawtext or drawbox or overlay on single frame

    2 November 2011, by waxical

    I'm using the avfilters on FFMPEG to drawtext and drawbox. Two of the most poorly documented functions known to man.

    I'm struggling to work out how, and whether, I can use this on a single frame, i.e. make drawtext appear only on frame 22.

    Current command:

    ffmpeg -i /home/vtest/test.wmv -y -b 800000 -f flv -vcodec libx264 -vpre default -s 768x432 -g 250 -vf drawtext="fontfile=/home/Cyberbit.ttf:fontsize=24:text=testical:fontcolor=green:x=100:y=200" -qscale 8 -acodec libfaac -sn -vstats  /home/testout.flv

    Two elements mentioned in the documentation are n and t; however, I only seem to be able to use them in x and y, not in text or as other parameters.

    Any help or ffmpeg guidance would be gratefully received.