Recherche avancée

Médias (3)

Mot : - Tags -/spip

Autres articles (69)

  • La sauvegarde automatique de canaux SPIP

    1er avril 2010, par

    Dans le cadre de la mise en place d’une plateforme ouverte, il est important pour les hébergeurs de pouvoir disposer de sauvegardes assez régulières pour parer à tout problème éventuel.
    Pour réaliser cette tâche on se base sur deux plugins SPIP : Saveauto qui permet une sauvegarde régulière de la base de donnée sous la forme d’un dump mysql (utilisable dans phpmyadmin) mes_fichiers_2 qui permet de réaliser une archive au format zip des données importantes du site (les documents, les éléments (...)

  • Websites made ​​with MediaSPIP

    2 mai 2011, par

    This page lists some websites based on MediaSPIP.

  • Creating farms of unique websites

    13 avril 2011, par

    MediaSPIP platforms can be installed as a farm, with a single "core" hosted on a dedicated server and used by multiple websites.
    This allows (among other things) : implementation costs to be shared between several different projects / individuals rapid deployment of multiple unique sites creation of groups of like-minded sites, making it possible to browse media in a more controlled and selective environment than the major "open" (...)

Sur d’autres sites (5470)

  • Developing A Shader-Based Video Codec

    22 juin 2013, par Multimedia Mike — Outlandish Brainstorms

    Early last month, this thing called ORBX.js was in the news. It ostensibly has something to do with streaming video and codec technology, which naturally catches my interest. The hype was kicked off by Mozilla honcho Brendan Eich when he posted an article asserting that HD video decoding could be entirely performed in JavaScript. We’ve seen this kind of thing before using Broadway– an H.264 decoder implemented entirely in JS. But that exposes some very obvious limitations (notably CPU usage).

    But this new video codec promises 1080p HD playback directly in JavaScript which is a lofty claim. How could it possibly do this ? I got the impression that performance was achieved using WebGL, an extension which allows JavaScript access to accelerated 3D graphics hardware. Browsing through the conversations surrounding the ORBX.js announcement, I found this confirmation from Eich himself :

    You’re right that WebGL does heavy lifting.

    As of this writing, ORBX.js remains some kind of private tech demo. If there were a public demo available, it would necessarily be easy to reverse engineer the downloadable JavaScript decoder.

    But the announcement was enough to make me wonder how it could be possible to create a video codec which effectively leverages 3D hardware.

    Prior Art
    In theorizing about this, it continually occurs to me that I can’t possibly be the first person to attempt to do this (or the ORBX.js people, for that matter). In googling on the matter, I found various forums and Q&A posts where people asked if it were possible to, e.g., accelerate JPEG decoding and presentation using 3D hardware, with no answers. I also found a blog post which describes a plan to use 3D hardware to accelerate VP8 video decoding. It was a project done under the banner of Google’s Summer of Code in 2011, though I’m not sure which open source group mentored the effort. The project did not end up producing the shader-based VP8 codec originally chartered but mentions that “The ‘client side’ of the VP8 VDPAU implementation is working and is currently being reviewed by the libvdpau maintainers.” I’m not sure what that means. Perhaps it includes modifications to the public API that supports VP8, but is waiting for the underlying hardware to actually implement VP8 decoding blocks in hardware.

    What’s So Hard About This ?
    Video decoding is a computationally intensive task. GPUs are known to be really awesome at chewing through computationally intensive tasks. So why aren’t GPUs a natural fit for decoding video codecs ?

    Generally, it boils down to parallelism, or lack of opportunities thereof. GPUs are really good at doing the exact same operations over lots of data at once. The problem is that decoding compressed video usually requires multiple phases that cannot be parallelized, and the individual phases often cannot be parallelized. In strictly mathematical terms, a compressed data stream will need to be decoded by applying a function f(x) over each data element, x0 .. xn. However, the function relies on having applied the function to the previous data element, i.e. :

    f(xn) = f(f(xn-1))
    

    What happens when you try to parallelize such an algorithm ? Temporal rifts in the space/time continuum, if you’re in a Star Trek episode. If you’re in the real world, you’ll get incorrect, unusuable data as the parallel computation is seeded with a bunch of invalid data at multiple points (which is illustrated in some of the pictures in the aforementioned blog post about accelerated VP8).

    Example : JPEG
    Let’s take a very general look at the various stages involved in decoding the ubiquitous JPEG format :


    High level JPEG decoding flow

    What are the opportunities to parallelize these various phases ?

    • Huffman decoding (run length decoding and zig-zag reordering is assumed to be rolled into this phase) : not many opportunities for parallelizing the various Huffman formats out there, including this one. Decoding most Huffman streams is necessarily a sequential operation. I once hypothesized that it would be possible to engineer a codec to achieve some parallelism during the entropy decoding phase, and later found that On2′s VP8 codec employs the scheme. However, such a scheme is unlikely to break down to such a fine level that WebGL would require.
    • Reverse DC prediction : JPEG — and many other codecs — doesn’t store full DC coefficients. It stores differences in successive DC coefficients. Reversing this process can’t be parallelized. See the discussion in the previous section.
    • Dequantize coefficients : This could be very parallelized. It should be noted that software decoders often don’t dequantize all coefficients. Many coefficients are 0 and it’s a waste of a multiplication operation to dequantize. Thus, this phase is sometimes rolled into the Huffman decoding phase.
    • Invert discrete cosine transform : This seems like it could be highly parallelizable. I will be exploring this further in this post.
    • Convert YUV -> RGB for final display : This is a well-established use case for 3D acceleration.

    Crash Course in 3D Shaders and Humility
    So I wanted to see if I could accelerate some parts of JPEG decoding using something called shaders. I made an effort to understand 3D programming and its associated math throughout the 1990s but 3D technology left me behind a very long time ago while I got mixed up in this multimedia stuff. So I plowed through a few books concerning WebGL (thanks to my new Safari Books Online subscription). After I learned enough about WebGL/JS to be dangerous and just enough about shader programming to be absolutely lethal, I set out to try my hand at optimizing IDCT using shaders.

    Here’s my extremely high level (and probably hopelessly naive) view of the modern GPU shader programming model :


    Basic WebGL rendering pipeline

    The WebGL program written in JavaScript drives the show. It sends a set of vertices into the WebGL system and each vertex is processed through a vertex shader. Then, each pixel that falls within a set of vertices is sent through a fragment shader to compute the final pixel attributes (R, G, B, and alpha value). Another consideration is textures : This is data that the program uploads to GPU memory which can be accessed programmatically by the shaders).

    These shaders (vertex and fragment) are key to the GPU’s programmability. How are they programmed ? Using a special C-like shading language. Thought I : “C-like language ? I know C ! I should be able to master this in short order !” So I charged forward with my assumptions and proceeded to get smacked down repeatedly by the overall programming paradigm. I came to recognize this as a variation of the scientific method : Develop a hypothesis– in my case, a mental model of how the system works ; develop an experiment (short program) to prove or disprove the model ; realize something fundamental that I was overlooking ; formulate new hypothesis and repeat.

    First Approach : Vertex Workhorse
    My first pitch goes like this :

    • Upload DCT coefficients to GPU memory in the form of textures
    • Program a vertex mesh that encapsulates 16×16 macroblocks
    • Distribute the IDCT effort among multiple vertex shaders
    • Pass transformed Y, U, and V blocks to fragment shader which will convert the samples to RGB

    So the idea is that decoding of 16×16 macroblocks is parallelized. A macroblock embodies 6 blocks :


    JPEG macroblocks

    It would be nice to process one of these 6 blocks in each vertex. But that means drawing a square with 6 vertices. How do you do that ? I eventually realized that drawing a square with 6 vertices is the recommended method for drawing a square on 3D hardware. Using 2 triangles, each with 3 vertices (0, 1, 2 ; 3, 4, 5) :


    2 triangles make a square

    A vertex shader knows which (x, y) coordinates it has been assigned, so it could figure out which sections of coefficients it needs to access within the textures. But how would a vertex shader know which of the 6 blocks it should process ? Solution : Misappropriate the vertex’s z coordinate. It’s not used for anything else in this case.

    So I set all of that up. Then I hit a new roadblock : How to get the reconstructed Y, U, and V samples transported to the fragment shader ? I have found that communicating between shaders is quite difficult. Texture memory ? WebGL doesn’t allow shaders to write back to texture memory ; shaders can only read it. The standard way to communicate data from a vertex shader to a fragment shader is to declare variables as “varying”. Up until this point, I knew about varying variables but there was something I didn’t quite understand about them and it nagged at me : If 3 different executions of a vertex shader set 3 different values to a varying variable, what value is passed to the fragment shader ?

    It turns out that the varying variable varies, which means that the GPU passes interpolated values to each fragment shader invocation. This completely destroys this idea.

    Second Idea : Vertex Workhorse, Take 2
    The revised pitch is to work around the interpolation issue by just having each vertex shader invocation performs all 6 block transforms. That seems like a lot of redundant. However, I figured out that I can draw a square with only 4 vertices by arranging them in an ‘N’ pattern and asking WebGL to draw a TRIANGLE_STRIP instead of TRIANGLES. Now it’s only doing the 4x the extra work, and not 6x. GPUs are supposed to be great at this type of work, so it shouldn’t matter, right ?

    I wired up an experiment and then ran into a new problem : While I was able to transform a block (or at least pretend to), and load up a varying array (that wouldn’t vary since all vertex shaders wrote the same values) to transmit to the fragment shader, the fragment shader can’t access specific values within the varying block. To clarify, a WebGL shader can use a constant value — or a value that can be evaluated as a constant at compile time — to index into arrays ; a WebGL shader can not compute an index into an array. Per my reading, this is a WebGL security consideration and the limitation may not be present in other OpenGL(-ES) implementations.

    Not Giving Up Yet : Choking The Fragment Shader
    You might want to be sitting down for this pitch :

    • Vertex shader only interpolates texture coordinates to transmit to fragment shader
    • Fragment shader performs IDCT for a single Y sample, U sample, and V sample
    • Fragment shader converts YUV -> RGB

    Seems straightforward enough. However, that step concerning IDCT for Y, U, and V entails a gargantuan number of operations. When computing the IDCT for an entire block of samples, it’s possible to leverage a lot of redundancy in the math which equates to far fewer overall operations. If you absolutely have to compute each sample individually, for an 8×8 block, that requires 64 multiplication/accumulation (MAC) operations per sample. For 3 color planes, and including a few extra multiplications involved in the RGB conversion, that tallies up to about 200 MACs per pixel. Then there’s the fact that this approach means a 4x redundant operations on the color planes.

    It’s crazy, but I just want to see if it can be done. My approach is to pre-compute a pile of IDCT constants in the JavaScript and transmit them to the fragment shader via uniform variables. For a first order optimization, the IDCT constants are formatted as 4-element vectors. This allows computing 16 dot products rather than 64 individual multiplication/addition operations. Ideally, GPU hardware executes the dot products faster (and there is also the possibility of lining these calculations up as matrices).

    I can report that I actually got a sample correctly transformed using this approach. Just one sample, through. Then I ran into some new problems :

    Problem #1 : Computing sample #1 vs. sample #0 requires a different table of 64 IDCT constants. Okay, so create a long table of 64 * 64 IDCT constants. However, this suffers from the same problem as seen in the previous approach : I can’t dynamically compute the index into this array. What’s the alternative ? Maintain 64 separate named arrays and implement 64 branches, when branching of any kind is ill-advised in shader programming to begin with ? I started to go down this path until I ran into…

    Problem #2 : Shaders can only be so large. 64 * 64 floats (4 bytes each) requires 16 kbytes of data and this well exceeds the amount of shader storage that I can assume is allowed. That brings this path of exploration to a screeching halt.

    Further Brainstorming
    I suppose I could forgo pre-computing the constants and directly compute the IDCT for each sample which would entail lots more multiplications as well as 128 cosine calculations per sample (384 considering all 3 color planes). I’m a little stuck with the transform idea right now. Maybe there are some other transforms I could try.

    Another idea would be vector quantization. What little ORBX.js literature is available indicates that there is a method to allow real-time streaming but that it requires GPU assistance to yield enough horsepower to make it feasible. When I think of such severe asymmetry between compression and decompression, my mind drifts towards VQ algorithms. As I come to understand the benefits and limitations of GPU acceleration, I think I can envision a way that something similar to SVQ1, with its copious, hierarchical vector tables stored as textures, could be implemented using shaders.

    So far, this all pertains to intra-coded video frames. What about opportunities for inter-coded frames ? The only approach that I can envision here is to use WebGL’s readPixels() function to fetch the rasterized frame out of the GPU, and then upload it again as a new texture which a new frame processing pipeline could reference. Whether this idea is plausible would require some profiling.

    Using interframes in such a manner seems to imply that the entire codec would need to operate in RGB space and not YUV.

    Conclusions
    The people behind ORBX.js have apparently figured out a way to create a shader-based video codec. I have yet to even begin to reason out a plausible approach. However, I’m glad I did this exercise since I have finally broken through my ignorance regarding modern GPU shader programming. It’s nice to have a topic like multimedia that allows me a jumping-off point to explore other areas.

  • Wrong orientation with FFMpeg Library

    2 octobre 2013, par Sinh Ho

    When I use player (using ffmpeg library) to open video. It open wrong orientation. The video is rotated. Anyone know how to use this library or any sample code/ tutorial, please share me.

    I researched many tutorial, but it only shows command line of ffmpeg library to rotate video.

    What I need here is programming rotate video in C (using NDK and FFmpeg library).

    Thanks !

  • WebVTT as a W3C Recommendation

    2 décembre 2013, par silvia

    Three weeks ago I attended TPAC, the annual meeting of W3C Working Groups. One of the meetings was of the Timed Text Working Group (TT-WG), that has been specifying TTML, the Timed Text Markup Language. It is now proposed that WebVTT be also standardised through the same Working Group.

    How did that happen, you may ask, in particular since WebVTT and TTML have in the past been portrayed as rival caption formats ? How will the WebVTT spec that is currently under development in the Text Track Community Group (TT-CG) move through a Working Group process ?

    I’ll explain first why there is a need for WebVTT to become a W3C Recommendation, and then how this is proposed to be part of the Timed Text Working Group deliverables, and finally how I can see this working between the TT-CG and the TT-WG.

    Advantages of a W3C Recommendation

    TTML is a XML-based markup format for captions developed during the time that XML was all the hotness. It has become a W3C standard (a so-called “Recommendation”) despite not having been implemented in any browsers (if you ask me : that’s actually a flaw of the W3C standardisation process : it requires only two interoperable implementations of any kind – and that could be anyone’s JavaScript library or Flash demonstrator – it doesn’t actually require browser implementations. But I digress…). To be fair, a subpart of TTML is by now implemented in Internet Explorer, but all the other major browsers have thus far rejected proposals of implementation.

    Because of its Recommendation status, TTML has become the basis for several other caption standards that other SDOs have picked : the SMPTE’s SMPTE-TT format, the EBU’s EBU-TT format, and the DASH Industry Forum’s use of SMPTE-TT. SMPTE-TT has also become the “safe harbour” format for the US legislation on captioning as decided by the FCC. (Note that the FCC requirements for captions on the Web are actually based on a list of features rather than requiring a specific format. But that will be the topic of a different blog post…)

    WebVTT is much younger than TTML. TTML was developed as an interchange format among caption authoring systems. WebVTT was built for rendering in Web browsers and with HTML5 in mind. It meets the requirements of the <track> element and supports more than just captions/subtitles. WebVTT is popular with browser developers and has already been implemented in all major browsers (Firefox Nightly is the last to implement it – all others have support already released).

    As we can see and as has been proven by the HTML spec and multiple other specs : browsers don’t wait for specifications to have W3C Recommendation status before they implement them. Nor do they really care about the status of a spec – what they care about is whether a spec makes sense for the Web developer and user communities and whether it fits in the Web platform. WebVTT has obviously achieved this status, even with an evolving spec. (Note that the spec tries very hard not to break backwards compatibility, thus all past implementations will at least be compatible with the more basic features of the spec.)

    Given that Web browsers don’t need WebVTT to become a W3C standard, why then should we spend effort in moving the spec through the W3C process to become a W3C Recommendation ?

    The modern Web is now much bigger than just Web browsers. Web specifications are being used in all kinds of devices including TV set-top boxes, phone and tablet apps, and even unexpected devices such as white goods. Videos are increasingly omnipresent thus exposing deaf and hard-of-hearing users to ever-growing challenges in interacting with content on diverse devices. Some of these devices will not use auto-updating software but fixed versions so can’t easily adapt to new features. Thus, caption producers (both commercial and community) need to be able to author captions (and other video accessibility content as defined by the HTML5
    element) towards a feature set that is clearly defined to be supported by such non-updating devices.

    Understandably, device vendors in this space have a need to build their technology on standardised specifications. SDOs for such device technologies like to reference fixed specifications so the feature set is not continually updating. To reference WebVTT, they could use a snapshot of the specification at any time and reference that, but that’s not how SDOs work. They prefer referencing an officially sanctioned and tested version of a specification – for a W3C specification that means creating a W3C Recommendation of the WebVTT spec.

    Taking WebVTT on a W3C recommendation track is actually advantageous for browsers, too, because a test suite will have to be developed that proves that features are implemented in an interoperable manner. In summary, I can see the advantages and personally support the effort to take WebVTT through to a W3C Recommendation.

    Choice of Working Group

    FAIK this is the first time that a specification developed in a Community Group is being moved into the recommendation track. This is something that has been expected when the W3C created CGs, but not something that has an established process yet.

    The first question of course is which WG would take it through to Recommendation ? Would we create a new Working Group or find an existing one to move the specification through ? Since WGs involve a lot of overhead, the preference was to add WebVTT to the charter of an existing WG. The two obvious candidates were the HTML WG and the TT-WG – the first because it’s where WebVTT originated and the latter because it’s the closest thematically.

    Adding a deliverable to a WG is a major undertaking. The TT-WG is currently in the process of re-chartering and thus a suggestion was made to add WebVTT to the milestones of this WG. TBH that was not my first choice. Since I’m already an editor in the HTML WG and WebVTT is very closely related to HTML and can be tested extensively as part of HTML, I preferred the HTML WG. However, adding WebVTT to the TT-WG has some advantages, too.

    Since TTML is an exchange format, lots of captions that will be created (at least professionally) will be in TTML and TTML-related formats. It makes sense to create a mapping from TTML to WebVTT for rendering in browsers. The expertise of both, TTML and WebVTT experts is required to develop a good mapping – as has been shown when we developed the mapping from CEA608/708 to WebVTT. Also, captioning experts are already in the TT-WG, so it helps to get a second set of eyes onto WebVTT.

    A disadvantage of moving a specification out of a CG into a WG is, however, that you potentially lose a lot of the expertise that is already involved in the development of the spec. People don’t easily re-subscribe to additional mailing lists or want the additional complexity of involving another community (see e.g. this email).

    So, a good process needs to be developed to allow everyone to contribute to the spec in the best way possible without requiring duplicate work. How can we do that ?

    The forthcoming process

    At TPAC the TT-WG discussed for several hours what the next steps are in taking WebVTT through the TT-WG to recommendation status (agenda with slides). I won’t bore you with the different views – if you are keen, you can read the minutes.

    What I came away with is the following process :

    1. Fix a few more bugs in the CG until we’re happy with the feature set in the CG. This should match the feature set that we realistically expect devices to implement for a first version of the WebVTT spec.
    2. Make a FSA (Final Specification Agreement) in the CG to create a stable reference and a clean IPR position.
    3. Assuming that the TT-WG’s charter has been approved with WebVTT as a milestone, we would next bring the FSA specification into the TT-WG as FPWD (First Public Working Draft) and immediately do a Last Call which effectively freezes the feature set (this is possible because there has already been wide community review of the WebVTT spec) ; in parallel, the CG can continue to develop the next version of the WebVTT spec with new features (just like it is happening with the HTML5 and HTML5.1 specifications).
    4. Develop a test suite and address any issues in the Last Call document (of course, also fix these issues in the CG version of the spec).
    5. As per W3C process, substantive and minor changes to Last Call documents have to be reported and raised issues addressed before the spec can progress to the next level : Candidate Recommendation status.
    6. For the next step – Proposed Recommendation status – an implementation report is necessary, and thus the test suite needs to be finalized for the given feature set. The feature set may also be reduced at this stage to just the ones implemented interoperably, leaving any other features for the next version of the spec.
    7. The final step is Recommendation status, which simply requires sufficient support and endorsement by W3C members.

    The first version of the WebVTT spec naturally has a focus on captioning (and subtitling), since this has been the dominant use case that we have focused on this far and it’s the part that is the most compatibly implemented feature set of WebVTT in browsers. It’s my expectation that the next version of WebVTT will have a lot more features related to audio descriptions, chapters and metadata. Thus, this seems a good time for a first version feature freeze.

    There are still several obstacles towards progressing WebVTT as a milestone of the TT-WG. Apart from the need to get buy-in from the TT-WG, the TT-CG, and the AC (Adivisory Committee who have to approve the new charter), we’re also looking at the license of the specification document.

    The CG specification has an open license that allows creating derivative work as long as there is attribution, while the W3C document license for documents on the recommendation track does not allow the creation of derivative work unless given explicit exceptions. This is an issue that is currently being discussed in the W3C with a proposal for a CC-BY license on the Recommendation track. However, my view is that it’s probably ok to use the different document licenses : the TT-WG will work on WebVTT 1.0 and give it a W3C document license, while the CG starts working on the next WebVTT version under the open CG license. It probably actually makes sense to have a less open license on a frozen spec.

    Making the best of a complicated world

    WebVTT is now proposed as part of the recharter of the TT-WG. I have no idea how complicated the process will become to achieve a W3C WebVTT 1.0 Recommendation, but I am hoping that what is outlined above will be workable in such a way that all of us get to focus on progressing the technology.

    At TPAC I got the impression that the TT-WG is committed to progressing WebVTT to Recommendation status. I know that the TT-CG is committed to continue developing WebVTT to its full potential for all kinds of media-time aligned content with new kinds already discussed at FOMS. Let’s enable both groups to achieve their goals. As a consequence, we will allow the two formats to excel where they do : TTML as an interchange format and WebVTT as a browser rendering format.