Recherche avancée

Médias (1)

Mot : - Tags -/belgique

Autres articles (7)

  • Emballe médias : à quoi cela sert ?

    4 février 2011, par

    Ce plugin vise à gérer des sites de mise en ligne de documents de tous types.
    Il crée des "médias", à savoir : un "média" est un article au sens SPIP créé automatiquement lors du téléversement d’un document qu’il soit audio, vidéo, image ou textuel ; un seul document ne peut être lié à un article dit "média" ;

  • Déploiements possibles

    31 janvier 2010, par

    Deux types de déploiements sont envisageable dépendant de deux aspects : La méthode d’installation envisagée (en standalone ou en ferme) ; Le nombre d’encodages journaliers et la fréquentation envisagés ;
    L’encodage de vidéos est un processus lourd consommant énormément de ressources système (CPU et RAM), il est nécessaire de prendre tout cela en considération. Ce système n’est donc possible que sur un ou plusieurs serveurs dédiés.
    Version mono serveur
    La version mono serveur consiste à n’utiliser qu’une (...)

  • Menus personnalisés

    14 novembre 2010, par

    MediaSPIP utilise le plugin Menus pour gérer plusieurs menus configurables pour la navigation.
    Cela permet de laisser aux administrateurs de canaux la possibilité de configurer finement ces menus.
    Menus créés à l’initialisation du site
    Par défaut trois menus sont créés automatiquement à l’initialisation du site : Le menu principal ; Identifiant : barrenav ; Ce menu s’insère en général en haut de la page après le bloc d’entête, son identifiant le rend compatible avec les squelettes basés sur Zpip ; (...)

Sur d’autres sites (3768)

  • Anatomy of an optimization : H.264 deblocking

    26 mai 2010, par Dark Shikari — H.264, assembly, development, speed, x264

    As mentioned in the previous post, H.264 has an adaptive deblocking filter. But what exactly does that mean — and more importantly, what does it mean for performance ? And how can we make it as fast as possible ? In this post I’ll try to answer these questions, particularly in relation to my recent deblocking optimizations in x264.

    H.264′s deblocking filter has two steps : strength calculation and the actual filter. The first step calculates the parameters for the second step. The filter runs on all the edges in each macroblock. That’s 4 vertical edges of length 16 pixels and 4 horizontal edges of length 16 pixels. The vertical edges are filtered first, from left to right, then the horizontal edges, from top to bottom (order matters !). The leftmost edge is the one between the current macroblock and the left macroblock, while the topmost edge is the one between the current macroblock and the top macroblock.

    Here’s the formula for the strength calculation in progressive mode. The highest strength that applies is always selected.

    If we’re on the edge between an intra macroblock and any other macroblock : Strength 4
    If we’re on an internal edge of an intra macroblock : Strength 3
    If either side of a 4-pixel-long edge has residual data : Strength 2
    If the motion vectors on opposite sides of a 4-pixel-long edge are at least a pixel apart (in either x or y direction) or the reference frames aren’t the same : Strength 1
    Otherwise : Strength 0 (no deblocking)

    These values are then thrown into a lookup table depending on the quantizer : higher quantizers have stronger deblocking. Then the actual filter is run with the appropriate parameters. Note that Strength 4 is actually a special deblocking mode that performs a much stronger filter and affects more pixels.

    One can see somewhat intuitively why these strengths are chosen. The deblocker exists to get rid of sharp edges caused by the block-based nature of H.264, and so the strength depends on what exists that might cause such sharp edges. The strength calculation is a way to use existing data from the video stream to make better decisions during the deblocking process, improving compression and quality.

    Both the strength calculation and the actual filter (not described here) are very complex if naively implemented. The latter can be SIMD’d with not too much difficulty ; no H.264 decoder can get away with reasonable performance without such a thing. But what about optimizing the strength calculation ? A quick analysis shows that this can be beneficial as well.

    Since we have to check both horizontal and vertical edges, we have to check up to 32 pairs of coefficient counts (for residual), 16 pairs of reference frame indices, and 128 motion vector values (counting x and y as separate values). This is a lot of calculation ; a naive implementation can take 500-1000 clock cycles on a modern CPU. Of course, there’s a lot of shortcuts we can take. Here’s some examples :

    • If the macroblock uses the 8×8 transform, we only need to check 2 edges in each direction instead of 4, because we don’t deblock inside of the 8×8 blocks.
    • If the macroblock is a P-skip, we only have to check the first edge in each direction, since there’s guaranteed to be no motion vector differences, reference frame differences, or residual inside of the macroblock.
    • If the macroblock has no residual at all, we can skip that check.
    • If we know the partition type of the macroblock, we can do motion vector checks only along the edges of the partitions.
    • If the effective quantizer is so low that no deblocking would be performed no matter what, don’t bother calculating the strength.

    But even all of this doesn’t save us from ourselves. We still have to iterate over a ton of edges, checking each one. Stuff like the partition-checking logic greatly complicates the code and adds overhead even as it reduces the number of checks. And in many cases decoupling the checks to add such logic will make it slower : if the checks are coupled, we can avoid doing a motion vector check if there’s residual, since Strength 2 overrides Strength 1.

    But wait. What if we could do this in SIMD, just like the actual loopfilter itself ? Sure, it seems more of a problem for C code than assembly, but there aren’t any obvious things in the way. Many years ago, Loren Merritt (pengvado) wrote the first SIMD implementation that I know of (for ffmpeg’s decoder) ; it is quite fast, so I decided to work on porting the idea to x264 to see if we could eke out a bit more speed here as well.

    Before I go over what I had to do to make this change, let me first describe how deblocking is implemented in x264. Since the filter is a loopfilter, it acts “in loop” and must be done in both the encoder and decoder — hence why x264 has it too, not just decoders. At the end of encoding one row of macroblocks, x264 goes back and deblocks the row, then performs half-pixel interpolation for use in encoding the next frame.

    We do it per-row for reasons of cache coherency : deblocking accesses a lot of pixels and a lot of code that wouldn’t otherwise be used, so it’s more efficient to do it in a single pass as opposed to deblocking each macroblock immediately after encoding. Then half-pixel interpolation can immediately re-use the resulting data.

    Now to the change. First, I modified deblocking to implement a subset of the macroblock_cache_load function : spend an extra bit of effort loading the necessary data into a data structure which is much simpler to address — as an assembly implementation would need (x264_macroblock_cache_load_deblock). Then I massively cleaned up deblocking to move all of the core strength-calculation logic into a single, small function that could be converted to assembly (deblock_strength_c). Finally, I wrote the assembly functions and worked with Loren to optimize them. Here’s the result.

    And the timings for the resulting assembly function on my Core i7, in cycles :

    deblock_strength_c : 309
    deblock_strength_mmx : 79
    deblock_strength_sse2 : 37
    deblock_strength_ssse3 : 33

    Now that is a seriously nice improvement. 33 cycles on average to perform that many comparisons–that’s absurdly low, especially considering the SIMD takes no branchy shortcuts : it always checks every single edge ! I walked over to my performance chart and happily crossed off a box.

    But I had a hunch that I could do better. Remember, as mentioned earlier, we’re reloading all that data back into our data structures in order to address it. This isn’t that slow, but takes enough time to significantly cut down on the gain of the assembly code. And worse, less than a row ago, all this data was in the correct place to be used (when we just finished encoding the macroblock) ! But if we did the deblocking right after encoding each macroblock, the cache issues would make it too slow to be worth it (yes, I tested this). So I went back to other things, a bit annoyed that I couldn’t get the full benefit of the changes.

    Then, yesterday, I was talking with Pascal, a former Xvid dev and current video hacker over at Google, about various possible x264 optimizations. He had seen my deblocking changes and we discussed that a bit as well. Then two lines hit me like a pile of bricks :

    <_skal_> tried computing the strength at least ?
    <_skal_> while it’s fresh

    Why hadn’t I thought of that ? Do the strength calculation immediately after encoding each macroblock, save the result, and then go pick it up later for the main deblocking filter. Then we can use the data right there and then for strength calculation, but we don’t have to do the whole deblock process until later.

    I went and implemented it and, after working my way through a horde of bugs, eventually got a working implementation. A big catch was that of slices : deblocking normally acts between slices even though normal encoding does not, so I had to perform extra munging to get that to work. By midday today I was able to go cross yet another box off on the performance chart. And now it’s committed.

    Sometimes chatting for 10 minutes with another developer is enough to spot the idea that your brain somehow managed to miss for nearly a straight week.

    NB : the performance chart is on a specific test clip at a specific set of settings (super fast settings) relevant to the company I work at, so it isn’t accurate nor complete for, say, default settings.

    Update : Here’s a higher resolution version of the current chart, as requested in the comments.

  • Multiprocess FATE Revisited

    26 juin 2010, par Multimedia Mike — FATE Server, Python

    I thought I had brainstormed a simple, elegant, multithreaded, deadlock-free refactoring for FATE in a previous post. However, I sort of glossed over the test ordering logic which I had not yet prototyped. The grim, possibly deadlock-afflicted reality is that the main thread needs to be notified as tests are completed. So, the main thread sends test specs through a queue to be executed by n tester threads and those threads send results to a results aggregator thread. Additionally, the results aggregator will need to send completed test IDs back to the main thread.



    But when I step back and look at the graph, I can’t rationalize why there should be a separate results aggregator thread. That was added to cut down on deadlock possibilities since the main thread and the tester threads would not be waiting for data from each other. Now that I’ve come to terms with the fact that the main and the testers need to exchange data in realtime, I think I can safely eliminate the result thread. Adding more threads is not the best way to guard against race conditions and deadlocks. Ask xine.



    I’m still hung up on the deadlock issue. I have these queues through which the threads communicate. At issue is the fact that they can cause a thread to block when inserting an item if the queue is "full". How full is full ? Immaterial ; seeking to answer such a question is not how you guard against race conditions. Rather, it seems to me that one side should be doing non-blocking queue operations.

    This is how I’m planning to revise the logic in the main thread :

    test_set = set of all tests to execute
    tests_pending = test_set
    tests_blocked = empty set
    tests_queue = multi-consumer queue to send test specs to tester threads
    results_queue = multi-producer queue through which tester threads send results
    while there are tests in tests_pending :
      pop a test from test_set
      if test depends on any tests that appear in tests_pending :
        add test to tests_blocked
      else :
        add test to tests_queue in a non-blocking manner
        if tests_queue is full, add test to tests_blocked
    

    while there are results in the results_queue :
    get a result from result_queue in non-blocking manner
    remove the corresponding test from tests_pending

    if tests_blocked is non-empty :
    sleep for 1 second
    test_set = tests_blocked
    tests_blocked = empty set
    else :
    insert n shutdown signals, one from each thread

    go to the top of the loop and repeat until there are no more tests

    while there are results in the results_queue :
    get a result from result_queue in a blocking manner

    Not mentioned in the pseudocode (so it doesn’t get too verbose) is logic to check whether the retrieved test result is actually an end-of-thread signal. These are accounted and the whole test process is done when one is received for each thread.

    On the tester thread side, it’s safe for them to do blocking test queue retrievals and blocking result queue insertions. The reason for the 1-second delay before resetting tests_blocked and looping again is because I want to guard against the situation where tests A and B are to be run, A depends of B running first, and while B is running (and happens to be a long encoding test), the main thread is spinning about, obsessively testing whether it’s time to insert A into the tests queue.

    It all sounds just crazy enough to work. In fact, I coded it up and it does work, sort of. The queue gets blocked pretty quickly. Instead of sleeping, I decided it’s better to perform the put operation using a 1-second timeout.

    Still, I’m paranoid about the precise operation of the IPC queue mechanism at work here. What happens if I try to stuff in a test spec that’s a bit too large ? Will the module take whatever I give it and serialize it through the queue as soon as it can ? I think an impromptu science project is in order.

    big-queue.py :

    PYTHON :
    1. # !/usr/bin/python
    2.  
    3. import multiprocessing
    4. import Queue
    5.  
    6. def f(q) :
    7.   str = q.get()
    8.   print "reader function got a string of %d characters" % (len(str))
    9.  
    10. q = multiprocessing.Queue()
    11. p = multiprocessing.Process(target=f, args=(q,))
    12. p.start()
    13. try :
    14.   q.put_nowait(’a’ * 100000000)
    15. except Queue.Full :
    16.   print "queue full"
    $ ./big-queue.py
    reader function got a string of 100000000 characters
    

    Since 100 MB doesn’t even make it choke, FATE’s little test specs shouldn’t pose any difficulty.

  • Summary Video Accessibility Talk

    23 avril 2013, par silvia

    I’ve just got off a call to the UK Digital TV Group, for which I gave a talk on HTML5 video accessibility (slides best viewed in Google Chrome).

    The slide provide a high-level summary of the accessibility features that we’ve developed in the W3C for HTML5, including :

    • Subtitles & Captions with WebVTT and the track element
    • Video Descriptions with WebVTT, the track element and speech synthesis
    • Chapters with WebVTT for semantic navigation
    • Audio Descriptions through synchronising an audio track with a video
    • Sign Language video synchronized with a main video

    I received some excellent questions.

    The obvious one was about why WebVTT and not TTML. While for anyone who has tried to implement TTML support, the advantages of WebVTT should be clear, for some the decision of the browsers to go with WebVTT still seems to be bothersome. The advantages of CSS over XSL-FO in a browser-context are obvious, but not as much outside browsers. So, the simplicity of WebVTT and the clear integration with HTML have to speak for themselves. Conversion between TTML and WebVTT was a feature that was being asked for.

    I received a question about how to support ducking (reduce the volume of the main audio track) when using video descriptions. My reply was to either use video descriptions with WebVTT and do ducking during the times that a cue is active, or when using audio descriptions (i.e. actual audio tracks) to add an additional WebVTT file of kind=metadata to mark the intervals in which to do ducking. In both cases some JavaScript will be necessary.

    I received another question about how to do clean audio, which I had almost forgotten was a requirement from our earlier media accessibility document. “Clean audio” consists of isolating the audio channel containing the spoken dialog and important non-speech information that can then be amplified or otherwise modified, while other channels containing music or ambient sounds are attenuated. I suggested using the mediagroup attribute to provide a main video element (without an audio track) and then the other channels as parallel audio tracks that can be turned on and off and attenuated individually. There is some JavaScript coding involved on top of the APIs that we have defined in HTML, but it can be implemented in browsers that support the mediagroup attribute.

    Another question was about the possibilities to extend the list of @kind attribute values. I explained that right now we have a proposal for a new text track kind=”forced” so as to provide forced subtitles for sections of video with foreign language. These would be on when no other subtitle or caption tracks are activated. I also explained that if there is a need for application-specific text tracks, the kind=”metadata” would be the correct choice.

    I received some further questions, in particular about how to apply styling to captions (e.g. color changes to text) and about how closely the browser are able to keep synchronization across multiple media elements. The earlier was easily answered with the ::cue pseudo-element, but the latter is a quality of implementation feature, so I had to defer to individual browsers.

    Overall it was a good exercise to summarize the current state of HTML5 video accessibility and I was excited to show off support in Chrome for all the features that we designed into the standard.