Recherche avancée

Médias (91)

Autres articles (43)

  • MediaSPIP : Modification des droits de création d’objets et de publication définitive

    11 novembre 2010, par

    Par défaut, MediaSPIP permet de créer 5 types d’objets.
    Toujours par défaut les droits de création et de publication définitive de ces objets sont réservés aux administrateurs, mais ils sont bien entendu configurables par les webmestres.
    Ces droits sont ainsi bloqués pour plusieurs raisons : parce que le fait d’autoriser à publier doit être la volonté du webmestre pas de l’ensemble de la plateforme et donc ne pas être un choix par défaut ; parce qu’avoir un compte peut servir à autre choses également, (...)

  • Personnaliser les catégories

    21 juin 2013, par

    Formulaire de création d’une catégorie
    Pour ceux qui connaissent bien SPIP, une catégorie peut être assimilée à une rubrique.
    Dans le cas d’un document de type catégorie, les champs proposés par défaut sont : Texte
    On peut modifier ce formulaire dans la partie :
    Administration > Configuration des masques de formulaire.
    Dans le cas d’un document de type média, les champs non affichés par défaut sont : Descriptif rapide
    Par ailleurs, c’est dans cette partie configuration qu’on peut indiquer le (...)

  • HTML5 audio and video support

    13 avril 2011, par

    MediaSPIP uses HTML5 video and audio tags to play multimedia files, taking advantage of the latest W3C innovations supported by modern browsers.
    The MediaSPIP player used has been created specifically for MediaSPIP and can be easily adapted to fit in with a specific theme.
    For older browsers the Flowplayer flash fallback is used.
    MediaSPIP allows for media playback on major mobile platforms with the above (...)

Sur d’autres sites (7939)

  • Basic Video Palette Conversion

    20 août 2011, par Multimedia Mike — General, Python

    How do you take a 24-bit RGB image and convert it to an 8-bit paletted image for the purpose of compression using a codec that requires 8-bit input images ? Seems simple enough and that’s what I’m tackling in this post.

    Ask FFmpeg/Libav To Do It
    Ideally, FFmpeg / Libav should be able to handle this automatically. Indeed, FFmpeg used to be able to, at least at the time I wrote this post about ZMBV and was unhappy with FFmpeg’s default results. Somewhere along the line, FFmpeg and Libav lost the ability to do this. I suspect it got removed during some swscale refactoring.

    Still, there’s no telling if the old system would have computed palettes correctly for QuickTime files.

    Distance Approach
    When I started writing my SMC video encoder, I needed to convert RGB (from PNG files) to PAL8 colorspace. The path of least resistance was to match the pixels in the input image to the default 256-color palette that QuickTime assumes (and is hardcoded into FFmpeg/Libav).

    How to perform the matching ? Find the palette entry that is closest to a given input pixel, where "closest" is the minimum distance as computed by the usual distance formula (square root of the sum of the squares of the diffs of all the components).



    That means for each pixel in an image, check the pixel against 256 palette entries (early termination is possible if an acceptable threshold is met). As you might imagine, this can be a bit time-consuming. I wondered about a faster approach...

    Lookup Table
    I think this is the approach that FFmpeg used to use, but I went and derived it for myself after studying the default QuickTime palette table. There’s a pattern there— all of the RGB entries are comprised of combinations of 6 values — 0x00, 0x33, 0x66, 0x99, 0xCC, and 0xFF. If you mix and match these for red, green, and blue values, you come up with 6 * 6 * 6 = 216 different colors. This happens to be identical to the web-safe color palette.

    The first (0th) entry in the table is (FF, FF, FF), followed by (FF, FF, CC), (FF, FF, 99), and on down to (FF, FF, 00) when the green component gets knocked down and step and the next color is (FF, CC, FF). The first 36 palette entries in the table all have a red component of 0xFF. Thus, if an input RGB pixel has a red color closest to 0xFF, it must map to one of those first 36 entries.

    I created a table which maps indices 0..215 to values from 5..0. Each of the R, G, and B components of an input pixel are used to index into this table and derive 3 indices ri, gi, and bi. Finally, the index into the palette table is given by :

      index = ri * 36 + gi * 6 + bi
    

    For example, the pixel (0xFE, 0xFE, 0x01) would yield ri, gi, and bi values of 0, 0, and 5. Therefore :

      index = 0 * 36 + 0 * 6 + 5
    

    The palette index is 5, which maps to color (0xFF, 0xFF, 0x00).

    Validation
    So I was pretty pleased with myself for coming up with that. Now, ideally, swapping out one algorithm for another in my SMC encoder should yield identical results. That wasn’t the case, initially.

    One problem is that the regulation QuickTime palette actually has 40 more entries above and beyond the typical 216-entry color cube (rounding out the grand total of 256 colors). Thus, using the distance approach with the full default table provides for a little more accuracy.

    However, there still seems to be a problem. Let’s check our old standby, the Big Buck Bunny logo image :



    Distance approach using the full 256-color QuickTime default palette


    Distance approach using the 216-color palette


    Table lookup approach using the 216-color palette

    I can’t quite account for that big red splotch there. That’s the most notable difference between images 1 and 2 and the only visible difference between images 2 and 3.

    To prove to myself that the distance approach is equivalent to the table approach, I wrote a Python script to iterate through all possible RGB combinations and verify the equivalence. If you’re not up on your base 2 math, that’s 224 or 16,777,216 colors to run through. I used Python’s multiprocessing module to great effect and really maximized a Core i7 CPU with 8 hardware threads.

    So I’m confident that the palette conversion techniques are sound. The red spot is probably attributable to a bug in my WIP SMC encoder.

    Source Code
    Update August 23, 2011 : Here’s the Python code I used for proving equivalence between the 2 approaches. In terms of leveraging multiple CPUs, it’s possibly the best program I have written to date.

    PYTHON :
    1. # !/usr/bin/python
    2.  
    3. from multiprocessing import Pool
    4.  
    5. palette = []
    6. pal8_table = []
    7.  
    8. def process_r(r) :
    9.  counts = []
    10.  
    11.  for i in xrange(216) :
    12.   counts.append(0)
    13.  
    14.  print "r = %d" % (r)
    15.  for g in xrange(256) :
    16.   for b in xrange(256) :
    17.    min_dsqrd = 0xFFFFFFFF
    18.    best_index = 0
    19.    for i in xrange(len(palette)) :
    20.     dr = palette[i][0] - r
    21.     dg = palette[i][1] - g
    22.     db = palette[i][2] - b
    23.     dsqrd = dr * dr + dg * dg + db * db
    24.     if dsqrd <min_dsqrd :
    25.      min_dsqrd = dsqrd
    26.      best_index = i
    27.    counts[best_index] += 1
    28.  
    29.    # check if the distance approach deviates from the table-based approach
    30.    i = best_index
    31.    r = palette[i][0]
    32.    g = palette[i][1]
    33.    b = palette[i][2]
    34.    ri = pal8_table[r]
    35.    gi = pal8_table[g]
    36.    bi = pal8_table[b]
    37.    table_index = ri * 36 + gi * 6 + bi ;
    38.    if table_index != best_index :
    39.     print "(0x%02X 0x%02X 0x%02X) : distance index = %d, table index = %d" % (r, g, b, best_index, table_index)
    40.  
    41.  return counts
    42.  
    43. if __name__ == ’__main__’ :
    44.  counts = []
    45.  for i in xrange(216) :
    46.   counts.append(0)
    47.  
    48.  # initialize reference palette
    49.  color_steps = [ 0xFF, 0xCC, 0x99, 0x66, 0x33, 0x00 ]
    50.  for r in color_steps :
    51.   for g in color_steps :
    52.    for b in color_steps :
    53.     palette.append([r, g, b])
    54.  
    55.  # initialize palette conversion table
    56.  for i in range(0, 26) :
    57.   pal8_table.append(5)
    58.  for i in range(26, 77) :
    59.   pal8_table.append(4)
    60.  for i in range(77, 128) :
    61.   pal8_table.append(3)
    62.  for i in range(128, 179) :
    63.   pal8_table.append(2)
    64.  for i in range(179, 230) :
    65.   pal8_table.append(1)
    66.  for i in range(230, 256) :
    67.   pal8_table.append(0)
    68.  
    69.  # create a pool of worker threads and break up the overall job
    70.  pool = Pool()
    71.  it = pool.imap_unordered(process_r, range(256))
    72.  try :
    73.   while 1 :
    74.    partial_counts = it.next()
    75.    for i in xrange(216) :
    76.     counts[i] += partial_counts[i]
    77.  except StopIteration :
    78.   pass
    79.  
    80.  print "index, count, red, green, blue"
    81.  for i in xrange(len(counts)) :
    82.   print "%d, %d, %d, %d, %d" % (i, counts[i], palette[i][0], palette[i][1], palette[i][2])
  • H.264 and VP8 for still image coding : WebP ?

    http://x264.nl/developers/Dark_Shikari/imagecoding/output.ogv
    1er octobre 2010, par Dark Shikari — google, H.264, psychovisual optimizations, VP8

    Update : post now contains a Theora comparison as well ; see below.

    JPEG is a very old lossy image format. By today’s standards, it’s awful compression-wise : practically every video format since the days of MPEG-2 has been able to tie or beat JPEG at its own game. The reasons people haven’t switched to something more modern practically always boil down to a simple one — it’s just not worth the hassle. Even if JPEG can be beaten by a factor of 2, convincing the entire world to change image formats after 20 years is nigh impossible. Furthermore, JPEG is fast, simple, and practically guaranteed to be free of any intellectual property worries. It’s been tried before : JPEG-2000 first, then Microsoft’s JPEG XR, both tried to unseat JPEG. Neither got much of anywhere.

    Now Google is trying to dump yet another image format on us, “WebP”. But really, it’s just a VP8 intra frame. There are some obvious practical problems with this new image format in comparison to JPEG ; it doesn’t even support all of JPEG’s features, let alone many of the much-wanted features JPEG was missing (alpha channel support, lossless support). It only supports 4:2:0 chroma subsampling, while JPEG can handle 4:2:2 and 4:4:4. Google doesn’t seem interested in adding any of these features either.

    But let’s get to the meat and see how these encoders stack up on compressing still images. As I explained in my original analysis, VP8 has the advantage of H.264′s intra prediction, which is one of the primary reasons why H.264 has such an advantage in intra compression. It only has i4x4 and i16x16 modes, not i8x8, so it’s not quite as fancy as H.264′s, but it comes close.

    The test files are all around 155KB ; download them for the exact filesizes. For all three, I did a binary search of quality levels to get the file sizes close. For x264, I encoded with --tune stillimage --preset placebo. For libvpx, I encoded with --best. For JPEG, I encoded with ffmpeg, then applied jpgcrush, a lossless jpeg compressor. I suspect there are better JPEG encoders out there than ffmpeg ; if you have one, feel free to test it and post the results. The source image is the 200th frame of Parkjoy, from derf’s page (fun fact : this video was shot here ! More info on the video here.).

    Files : (x264 [154KB], vp8 [155KB], jpg [156KB])

    Results (decoded to PNG) : (x264, vp8, jpg)

    This seems rather embarrassing for libvpx. Personally I think VP8 looks by far the worst of the bunch, despite JPEG’s blocking. What’s going on here ? VP8 certainly has better entropy coding than JPEG does (by far !). It has better intra prediction (JPEG has just DC prediction). How could VP8 look worse ? Let’s investigate.

    VP8 uses a 4×4 transform, which tends to blur and lose more detail than JPEG’s 8×8 transform. But that alone certainly isn’t enough to create such a dramatic difference. Let’s investigate a hypothesis — that the problem is that libvpx is optimizing for PSNR and ignoring psychovisual considerations when encoding the image… I’ll encode with --tune psnr --preset placebo in x264, turning off all psy optimizations. 

    Files : (x264, optimized for PSNR [154KB]) [Note for the technical people : because adaptive quantization is off, to get the filesize on target I had to use a CQM here.]

    Results (decoded to PNG) : (x264, optimized for PSNR)

    What a blur ! Only somewhat better than VP8, and still worse than JPEG. And that’s using the same encoder and the same level of analysis — the only thing done differently is dropping the psy optimizations. Thus we come back to the conclusion I’ve made over and over on this blog — the encoder matters more than the video format, and good psy optimizations are more important than anything else for compression. libvpx, a much more powerful encoder than ffmpeg’s jpeg encoder, loses because it tries too hard to optimize for PSNR.

    These results raise an obvious question — is Google nuts ? I could understand the push for “WebP” if it was better than JPEG. And sure, technically as a file format it is, and an encoder could be made for it that’s better than JPEG. But note the word “could”. Why announce it now when libvpx is still such an awful encoder ? You’d have to be nuts to try to replace JPEG with this blurry mess as-is. Now, I don’t expect libvpx to be able to compete with x264, the best encoder in the world — but surely it should be able to beat an image format released in 1992 ?

    Earth to Google : make the encoder good first, then promote it as better than the alternatives. The reverse doesn’t work quite as well.

    Addendum (added Oct. 2, 03:51) :

    maikmerten gave me a Theora-encoded image to compare as well. Here’s the PNG and the source (155KB). And yes, that’s Theora 1.2 (Ptalarbvorm) beating VP8 handily. Now that is embarassing. Guess what the main new feature of Ptalarbvorm is ? Psy optimizations…

    Addendum (added Apr. 20, 23:33) :

    There’s a new webp encoder out, written from scratch by skal (available in libwebp). It’s significantly better than libvpx — not like that says much — but it should probably beat JPEG much more readily now. The encoder design is rather unique — it basically uses K-means for a large part of the encoding process. It still loses to x264, but that was expected.

    [155KB]
  • Greed is Good ; Greed Works

    25 novembre 2010, par Multimedia Mike — VP8

    Greed, for lack of a better word, is good ; Greed works. Well, most of the time. Maybe.

    Picking Prediction Modes
    VP8 uses one of 4 prediction modes to predict a 16x16 luma block or 8x8 chroma block before processing it (for luma, a block can also be broken into 16 4x4 blocks for individual prediction using even more modes).

    So, how to pick the best predictor mode ? I had no idea when I started writing my VP8 encoder. I did not read any literature on the matter ; I just sat down and thought of a brute-force approach. According to the comments in my code :

    // naive, greedy algorithm :
    //   residual = source - predictor
    //   mean = mean(residual)
    //   residual -= mean
    //   find the max diff between the mean and the residual
    // the thinking is that, post-prediction, the best block will
    // be comprised of similar samples
    

    After removing the predictor from the macroblock, individual 4x4 subblocks are put through a forward DCT and quantized. Optimal compression in this scenario results when all samples are the same since only the DC coefficient will be non-zero. Failing that, when the input samples are at least similar to each other, few of the AC coefficients will be non-zero, which helps compression. When the samples are all over the scale, there aren’t a whole lot of non-zero coefficients unless you crank up the quantizer, which results in poor quality in the reconstructed subblocks.

    Thus, my goal was to pick a prediction mode that, when applied to the input block, resulted in a residual in which each element would feature the least deviation from the mean of the residual (relative to other prediction choices).

    Greedy Approach
    I realized that this algorithm falls into the broad general category of "greedy" algorithms— one that makes locally optimal decisions at each stage. There are most likely smarter algorithms. But this one was good enough for making an encoder that just barely works.

    Compression Results
    I checked the total file compression size on my usual 640x360 Big Buck Bunny logo image while forcing prediction modes vs. using my greedy prediction picking algorithm. In this very simple test, DC-only actually resulted in slightly better compression than the greedy algorithm (which says nothing about overall quality).

    prediction mode quantizer index = 0 (minimum) quantizer index = 10
    greedy 286260 98028
    DC 280593 95378
    vertical 297206 105316
    horizontal 295357 104185
    TrueMotion 311660 113480

    As another data point, in both quantizer cases, my greedy algorithm selected a healthy mix of prediction modes :

    • quantizer index 0 : DC = 521, VERT = 151, HORIZ = 183, TM = 65
    • quantizer index 10 : DC = 486, VERT = 167, HORIZ = 190, TM = 77

    Size vs. Quality
    Again, note that this ad-hoc test only measures one property (a highly objective one)— compression size. It did not account for quality which is a far more controversial topic that I have yet to wade into.