
Media (91)

Other articles (66)

  • Take part in its translation

    10 April 2011

    You can help us improve the wording used in the software, or translate it into any new language so that it can reach new linguistic communities.
    To do so, we use SPIP’s translation interface, where all of MediaSPIP’s language modules are available. You simply need to subscribe to the translators’ mailing list to ask for more information.
    At the moment MediaSPIP is only available in French and (...)

  • Websites made with MediaSPIP

    2 May 2011, by

    This page lists some websites based on MediaSPIP.

  • Possibility of deployment as a farm

    12 April 2011, by

    MediaSPIP can be installed as a farm, with a single "core" hosted on a dedicated server and used by a multitude of different sites.
    This makes it possible, for example: to share the setup costs between several projects / individuals; to deploy a multitude of unique sites quickly; to avoid having to dump all the creations into a digital catch-all, as happens on the large general-public platforms scattered across the (...)

On other sites (9798)

  • Long Overdue MediaWiki Upgrade

    5 February 2014, by Multimedia Mike — General

    What do I do? What do I do? This library book is 42 years overdue!
    I admit that it’s mine, yet I can’t pay the fine,
    Should I turn it in or should I hide it again?
    What do I do? What do I do?

    I internalized the foregoing paean to the perils of procrastination by Shel Silverstein in my formative years. It’s probably why I’ve never paid a single cent in late fees in my entire life.

    However, I have been woefully negligent as the steward of the MediaWiki software that drives the world famous MultimediaWiki, the internet’s central repository of obscure technical knowledge related to multimedia. It is currently running version 1.6 of the software. The latest version is 1.22.

    The Story So Far
    According to my records, I first set up the wiki late in 2005. I don’t know which MediaWiki release I was using at the time. I probably conducted a few upgrades in the early days, but that went by the wayside perhaps in 2007. My web host stopped allowing shell access and the MediaWiki upgrade process pretty much requires running a PHP script from a command line. Upgrade time came around and I put off the project. Weeks turned into months turned into years until, according to some notes, the wiki abruptly stopped working in July, 2011. Suddenly, there were PHP errors about “Namespace” being a reserved word.

    While I finally laid out a plan to upgrade the wiki after all these years, I eventually found that the problem had been caused when my webhost upgraded from PHP 5.2 -> 5.3. I also learned of a small number of code changes that caused the problem to go away, thus kicking the can down the road once more.

    Then a new problem showed up last week. I think it might be related to a new version of PHP again. This time, a few other things on my site broke, and I learned that my webhost now allows me to select a PHP version to use (with the version then set to “auto”, which didn’t yield much information). Rolling back to an earlier version of PHP might have solved the problem easily.

    But NO ! I made the determination that this goes no further. I want this wiki upgraded.

    The Arduous Upgrade Path
    There are 2 general upgrade paths I can think of :

    1. Upgrade in place on the server
    2. Upgrade offline and put the site back on the server

    Approach #1 is problematic since I don’t have direct shell access, though I considered using something like PHP Shell. Approach #2 involves getting the entire set of wiki files and a backup of the MySQL tables. This is workable since I keep automated backups of these items anyway.
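    For illustration, restoring those backups into a local working copy amounts to something like the following sketch; the archive names, paths, and credentials are placeholders:

    # Unpack the backed-up wiki files into a local web root
    tar xzf multimediawiki-files.tar.gz -C /var/www/wiki

    # Recreate the wiki database locally from the automated MySQL dump
    mysql -u root -p -e 'CREATE DATABASE wiki'
    mysql -u root -p wiki < multimediawiki-tables.sql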

    In fairly short order, I was able to set up a working copy of the MultimediaWiki hosted on a local Linux machine. Now what’s the move? The MediaWiki software I’m running is 1.6.10. The very latest, as of this upgrade project, is 1.22.2. I suppose it’s way too much to hope that the software will upgrade cleanly from 1.6.x straight to 1.22.x, but I guess it’s worth a shot…

    HA! No chance. Okay, the next idea is to march through the various versions and upgrade each in turn. MediaWiki has all of its historic releases online, all the way back to the 1.3 lineage. I decided that the latest of each lineage should upgrade cleanly from anything in the previous lineage. E.g., 1.6.10 should upgrade cleanly to 1.7.3 (last in the 1.7 series). This seemed to be a workable strategy. So I downloaded the latest of each series, unpacked it, copied all the wiki files over the working installation, and ran ‘php update.php’ in the maintenance/ directory.
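    Each hop of that march looks roughly like the sketch below, using the 1.6 -> 1.7 example; the download URL pattern and the local paths are illustrative, not exact:

    # Grab the last release of the next lineage (URL pattern may differ for old releases)
    wget https://releases.wikimedia.org/mediawiki/1.7/mediawiki-1.7.3.tar.gz
    tar xzf mediawiki-1.7.3.tar.gz

    # Copy the new release over the working installation
    # (LocalSettings.php is not part of the tarball, so the existing one survives)
    cp -R mediawiki-1.7.3/* /var/www/wiki/

    # Run the database updater from the maintenance/ directory
    cd /var/www/wiki/maintenance
    php update.php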

    The process is tedious and not without its obstacles. I consider this penance for my years of wiki neglect. First, I ran into the “PHP Parse error: syntax error, unexpected T_NAMESPACE, expecting T_STRING” issue, the same one I saw years ago after the webhost transitioned from PHP 5.2 -> 5.3. I could solve this by editing assorted files and changing “Namespace” -> “MWNamespace” (which is what MediaWiki did by version 1.13). But I would prefer not to.

    Instead, I downloaded the source for PHP 5.2 and compiled it in a separate directory, then called ‘/path/to/php/5.2/bin/php update.php’. Problem solved.
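    Such a build needs nothing fancy; a sketch along these lines should do, where 5.2.17 (the final 5.2 release) and the install prefix are just illustrative choices:

    # Build a private copy of PHP 5.2 purely for running the maintenance scripts
    tar xzf php-5.2.17.tar.gz
    cd php-5.2.17
    ./configure --prefix=$HOME/php-5.2 --with-mysql
    make && make install

    # Then, from the wiki's maintenance/ directory:
    $HOME/php-5.2/bin/php update.php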

    The next problem is that a bunch of the database update scripts specify “Type=InnoDB”. This isn’t supported by modern MySQL databases; now it’s “Engine=InnoDB”. A quick search & replace at the command line fixes this for 1.6.x… and 1.7.x… and 1.8 through 1.12. Finally, at 1.13, it was no longer necessary. As a bonus, at 1.13, I was able to test the installation since Namespace had been renamed to MWNamespace. I would later learn that the table type modifications probably could have been simplified by changing “$wgDBmysql4 = true;” to “$wgDBmysql5 = true;” somewhere in LocalSettings.php.
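    For the record, the search & replace itself is a one-liner per version, something like this (assuming GNU sed, and that the offending SQL lives under each release’s maintenance/ directory):

    # Rewrite the obsolete table-type syntax in the updater's SQL files
    grep -rli 'TYPE=InnoDB' maintenance/ | xargs sed -i 's/TYPE=InnoDB/ENGINE=InnoDB/gI'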

    Command line upgrading worked smoothly up through the 1.18 series, when I got a new syntax error:

    PHP Fatal error:  Call to a member function addMessages() on a non-object in /mnt/sdb1/archive/wiki/extensions/Cite.php on line 68

    Best I could do was comment out that line. I hope that doesn’t break anything important.

    In the home stretch, the very last transition (1.21 -> 1.22) failed :

    PHP Fatal error:  Cannot redeclare wfProfileIn() (previously declared in
    /mnt/sdb1/archive/wiki/includes/profiler/Profiler.php:33) in
    /mnt/sdb1/archive/wiki/includes/ProfilerStub.php on line 25
    

    Apparently, this problem has arisen occasionally since 1.18. I found a way around it thanks to this page: delete the file StartProfiler.php. Who am I to argue?

    Upon completing the transition to 1.22, the wiki didn’t look correct: the pictures weren’t showing up. The solution was to fix the temporary directory via LocalSettings.php.

    Back To Production
    Okay, it all works again! Locally, that is. How to get it back to the server? My first idea was, knowing that this upgrade process could succeed, to step through the upgrade process again but tell the update.php scripts to access the database tables on multimedia.cx. This seemed to be working for a while, even though the database update phase often took 4-5 minutes. However, the transition from 1.8.5 -> 1.9.6 took 75 minutes and then timed out. According to my notes, “This isn’t going to work.”

    The new process (a rough command sketch follows the list):

    1. Dump the database tables from the local database.
    2. Create a new database remotely (melanson_wiki_ng).
    3. Load the dumped tables into melanson_wiki_ng.
    4. Move the index.php file out of the wiki files directory temporarily (or rename).
    5. Modify the LocalSettings.php to talk to the new database.
    6. Perform an lftp mirror operation in order to send all the files up to the server.
    7. Send the index.php file and hope beyond hope that everything magically works.
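    In command form, steps 1 through 3 and step 6 look roughly like this; the host names, user names, and file names are placeholders, and the lftp invocation is only a sketch of the mirror operation:

    # 1. Dump the tables from the local database
    mysqldump -u wikiuser -p wiki > wiki-dump.sql

    # 2-3. Create the new remote database and load the dump into it
    #      (the --compress option is what kept the 110+ MB transfer quick)
    mysql --compress -h db.example.com -u wikiuser -p -e 'CREATE DATABASE melanson_wiki_ng'
    mysql --compress -h db.example.com -u wikiuser -p melanson_wiki_ng < wiki-dump.sql

    # 6. Mirror the local wiki files up to the server (mirror -R sends local -> remote)
    lftp -u ftpuser ftp.example.com -e 'mirror -R /var/www/wiki /path/on/server; quit'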

    And that’s the story of how the updated MultimediaWiki came back online. Despite the database dump file being over 110 MB, it only took MySQL 1m45s to transmit it all to the remote server (let’s hear it for the ‘--compress’ option). For comparison, inserting the tables back into a fresh local database took 1m07s.

    When the MultimediaWiki was first live again, it loaded, but ever so slowly. This is when I finally looked into optimization and found that I was lacking any caching. So as a bonus, the MultimediaWiki should be much faster now.
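    As an illustration (assuming an opcode/object cache accelerator such as APC is available; the exact backend enabled here is a guess), MediaWiki’s object cache can be switched on with a one-line LocalSettings.php change:

    # Hypothetical example: enable the accelerator-backed object cache in LocalSettings.php
    echo '$wgMainCacheType = CACHE_ACCEL;' >> LocalSettings.php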

    Going Forward
    For all I know, I did everything described here in the hardest way possible. But at least I got it done. Unless I learn of a better process, future upgrades will probably look similar to this.

    Additionally, I should probably take some time to figure out what new features are part of the standard MediaWiki distribution nowadays.

  • Working on images asynchronously

    15 December 2013, by Mikko Koppanen — Imagick, PHP stuff

    To get my quota on buzzwords for the day, we are going to look at using ZeroMQ and Imagick to create a simple asynchronous image processing system. Why asynchronous? First of all, separating the image handling from interactive PHP scripts allows us to scale the image processing separately from the web heads. For example, we could do the image processing on separate servers which have SSDs attached and more memory. In this example, making the images available to all worker nodes is left to the reader.

    Secondly, separating the image processing from a web script can provide a more responsive experience to the user. This doesn’t necessarily mean faster, but in a multiple image upload scenario, for example, this method allows the user to do something else on the site while we process the images in the background. This can be beneficial especially in cases where users upload hundreds of images at a time. To achieve a simple distributed image processing infrastructure we are going to use ZeroMQ for communicating between the different components and Imagick to work on the images.

    The first part we are going to create is a simple “Worker” process skeleton. Naturally, for a live environment you would like to have more error handling and possibly use pcntl for process control, but for the sake of brevity the example is bare-bones:

    <?php

    define('THUMBNAIL_ADDR', 'tcp://127.0.0.1:5000');
    define('COLLECTOR_ADDR', 'tcp://127.0.0.1:5001');

    class Worker {

      private $in;
      private $out;
      private $commands = array();

      public function __construct($in_addr, $out_addr)
      {
        $context = new ZMQContext();

        $this->in = new ZMQSocket($context, ZMQ::SOCKET_PULL);
        $this->in->bind($in_addr);

        $this->out = new ZMQSocket($context, ZMQ::SOCKET_PUSH);
        $this->out->connect($out_addr);
      }

      public function work() {
        while ($command = $this->in->recvMulti()) {
          if (isset($this->commands[$command[0]])) {
            echo "Received work" . PHP_EOL;

            $callback = $this->commands[$command[0]];

            array_shift($command);
            $response = call_user_func_array($callback, $command);

            if (is_array($response))
              $this->out->sendMulti($response);
            else
              $this->out->send($response);
          }
          else {
            error_log("There is no registered worker for {$command[0]}");
          }
        }
      }

      public function register($command, $callback)
      {
        $this->commands[$command] = $callback;
      }
    }
    ?>

    The Worker class allows us to register commands with callbacks associated with them. In our case the Worker class doesn’t actually care or know about the parameters being passed to the actual callback; it just blindly passes them on. We are using two separate sockets in this example, one for incoming work requests and one for passing the results onwards. This allows us to create a simple pipeline by adding more workers into the mix. For example, we could first have a watermark worker, which takes the original image and composites a watermark on it, then passes the file onwards to a thumbnail worker, which in turn creates different sizes of thumbnails and passes the final results to an event collector.

    The next part we are going to create is a simple worker script that does the actual thumbnailing of the images:

    <?php
    include __DIR__ . '/common.php';

    // Create worker class and bind the inbound address to 'THUMBNAIL_ADDR' and connect outbound to 'COLLECTOR_ADDR'
    $worker = new Worker(THUMBNAIL_ADDR, COLLECTOR_ADDR);

    // Register our thumbnail callback, nothing special here
    $worker->register('thumbnail', function ($filename, $width, $height) {
                      $info = pathinfo($filename);

                      $out = sprintf("%s/%s_%dx%d.%s",
                              $info['dirname'],
                              $info['filename'],
                              $width,
                              $height,
                              $info['extension']);

                      $status = 1;
                      $message = '';

                      try {
                        $im = new Imagick($filename);
                        $im->thumbnailImage($width, $height);
                        $im->writeImage($out);
                      }
                      catch (Exception $e) {
                        $status = 0;
                        $message = $e->getMessage();
                      }

                      return array(
                            'status'    => $status,
                            'filename'  => $filename,
                            'thumbnail' => $out,
                            'message'   => $message,
                        );
                    });

    // Run the worker, will block
    echo "Running thumbnail worker.." . PHP_EOL;
    $worker->work();

    As you can see from the code, the thumbnail worker registers a callback for the ‘thumbnail’ command. The callback does the thumbnailing based on the input and returns the status, the original filename and the thumbnail filename. We have connected our Worker’s “outbound” socket to the event collector, which will receive the results from the thumbnail worker and do something with them. What the “something” is depends on you. For example, you could push the response into a websocket to show immediate feedback to the user or store the results into a database.

    Our example event collector will just do a var_dump on every event it receives from the thumbnailer :

    <?php
    include __DIR__ . '/common.php';

    $socket = new ZMQSocket(new ZMQContext(), ZMQ::SOCKET_PULL);
    $socket->bind(COLLECTOR_ADDR);

    echo "Waiting for events.." . PHP_EOL;
    while (($message = $socket->recvMulti())) {
      var_dump($message);
    }
    ?>

    The final piece of the puzzle is the client that pumps messages into the pipeline. The client connects to the thumbnail worker and passes on the filename and the desired dimensions:

    <?php
    include __DIR__ . '/common.php';

    $socket = new ZMQSocket(new ZMQContext(), ZMQ::SOCKET_PUSH);
    $socket->connect(THUMBNAIL_ADDR);

    $socket->sendMulti(
          array(
            'thumbnail',
            realpath('./test.jpg'),
            50,
            50,
          )
    );
    echo "Sent request" . PHP_EOL;
    ?>

    After this our processing pipeline will look like this :

    Simple pipeline

    Now, if we notice that the thumbnail workers or the event collectors can’t keep up with the rate of images we are pushing through, we can start scaling the pipeline by adding more processes on each layer. The ZeroMQ PUSH socket will automatically round-robin between all connected nodes, which makes adding more workers and event collectors simple. After adding more workers, our pipeline will look like this:

    Scaling pipeline

    Using ZeroMQ also allows us to create more flexible architectures by adding forwarding devices in the middle, adding request-reply workers etc. So, the last thing to do is to run our pipeline and see the results :

    Let’s create our test image first :

    $ convert magick:rose test.jpg
    

    From the command-line run the thumbnail script :

    $ php thumbnail.php 
    Running thumbnail worker..
    

    In a separate terminal window run the event collector :

    $ php collector.php 
    Waiting for events..
    

    And finally run the client to send the thumbnail request :

    $ php client.php 
    Sent request
    $
    

    If everything went according to plan, you should now see the following output in the event collector window:

    array(4) {
      [0]=>
      string(1) "1"
      [1]=>
      string(56) "/test.jpg"
      [2]=>
      string(62) "/test_50x50.jpg"
      [3]=>
      string(0) ""
    }

    Happy hacking !

  • Developing A Shader-Based Video Codec

    22 June 2013, by Multimedia Mike — Outlandish Brainstorms

    Early last month, this thing called ORBX.js was in the news. It ostensibly has something to do with streaming video and codec technology, which naturally catches my interest. The hype was kicked off by Mozilla honcho Brendan Eich when he posted an article asserting that HD video decoding could be entirely performed in JavaScript. We’ve seen this kind of thing before using Broadway– an H.264 decoder implemented entirely in JS. But that exposes some very obvious limitations (notably CPU usage).

    But this new video codec promises 1080p HD playback directly in JavaScript which is a lofty claim. How could it possibly do this ? I got the impression that performance was achieved using WebGL, an extension which allows JavaScript access to accelerated 3D graphics hardware. Browsing through the conversations surrounding the ORBX.js announcement, I found this confirmation from Eich himself :

    You’re right that WebGL does heavy lifting.

    As of this writing, ORBX.js remains some kind of private tech demo. If there were a public demo available, it would necessarily be easy to reverse engineer the downloadable JavaScript decoder.

    But the announcement was enough to make me wonder how it could be possible to create a video codec which effectively leverages 3D hardware.

    Prior Art
    In theorizing about this, it continually occurs to me that I can’t possibly be the first person to attempt to do this (or the ORBX.js people, for that matter). In googling on the matter, I found various forums and Q&A posts where people asked if it were possible to, e.g., accelerate JPEG decoding and presentation using 3D hardware, with no answers. I also found a blog post which describes a plan to use 3D hardware to accelerate VP8 video decoding. It was a project done under the banner of Google’s Summer of Code in 2011, though I’m not sure which open source group mentored the effort. The project did not end up producing the shader-based VP8 codec originally chartered but mentions that “The ‘client side’ of the VP8 VDPAU implementation is working and is currently being reviewed by the libvdpau maintainers.” I’m not sure what that means. Perhaps it includes modifications to the public API that supports VP8, but is waiting for the underlying hardware to actually implement VP8 decoding blocks in hardware.

    What’s So Hard About This ?
    Video decoding is a computationally intensive task. GPUs are known to be really awesome at chewing through computationally intensive tasks. So why aren’t GPUs a natural fit for decoding video codecs ?

    Generally, it boils down to parallelism, or lack of opportunities thereof. GPUs are really good at doing the exact same operations over lots of data at once. The problem is that decoding compressed video usually requires multiple phases that cannot be parallelized, and the individual phases often cannot be parallelized. In strictly mathematical terms, a compressed data stream will need to be decoded by applying a function f(x) over each data element, x_0 .. x_n. However, the function relies on having applied the function to the previous data element, i.e.:

    f(x_n) = f(f(x_{n-1}))

    What happens when you try to parallelize such an algorithm? Temporal rifts in the space/time continuum, if you’re in a Star Trek episode. If you’re in the real world, you’ll get incorrect, unusable data as the parallel computation is seeded with a bunch of invalid data at multiple points (which is illustrated in some of the pictures in the aforementioned blog post about accelerated VP8).

    Example : JPEG
    Let’s take a very general look at the various stages involved in decoding the ubiquitous JPEG format :


    High level JPEG decoding flow

    What are the opportunities to parallelize these various phases ?

    • Huffman decoding (run length decoding and zig-zag reordering are assumed to be rolled into this phase): not many opportunities for parallelizing the various Huffman formats out there, including this one. Decoding most Huffman streams is necessarily a sequential operation. I once hypothesized that it would be possible to engineer a codec to achieve some parallelism during the entropy decoding phase, and later found that On2’s VP8 codec employs such a scheme. However, such a scheme is unlikely to break down to the fine level that WebGL would require.
    • Reverse DC prediction : JPEG — and many other codecs — doesn’t store full DC coefficients. It stores differences in successive DC coefficients. Reversing this process can’t be parallelized. See the discussion in the previous section.
    • Dequantize coefficients: This could be heavily parallelized. It should be noted that software decoders often don’t dequantize all coefficients; many coefficients are 0 and it’s a waste of a multiplication operation to dequantize them. Thus, this phase is sometimes rolled into the Huffman decoding phase.
    • Invert discrete cosine transform : This seems like it could be highly parallelizable. I will be exploring this further in this post.
    • Convert YUV -> RGB for final display : This is a well-established use case for 3D acceleration.

    Crash Course in 3D Shaders and Humility
    So I wanted to see if I could accelerate some parts of JPEG decoding using something called shaders. I made an effort to understand 3D programming and its associated math throughout the 1990s but 3D technology left me behind a very long time ago while I got mixed up in this multimedia stuff. So I plowed through a few books concerning WebGL (thanks to my new Safari Books Online subscription). After I learned enough about WebGL/JS to be dangerous and just enough about shader programming to be absolutely lethal, I set out to try my hand at optimizing IDCT using shaders.

    Here’s my extremely high level (and probably hopelessly naive) view of the modern GPU shader programming model :


    Basic WebGL rendering pipeline

    The WebGL program written in JavaScript drives the show. It sends a set of vertices into the WebGL system and each vertex is processed through a vertex shader. Then, each pixel that falls within a set of vertices is sent through a fragment shader to compute the final pixel attributes (R, G, B, and alpha value). Another consideration is textures: data that the program uploads to GPU memory and which can be accessed programmatically by the shaders.

    These shaders (vertex and fragment) are key to the GPU’s programmability. How are they programmed ? Using a special C-like shading language. Thought I : “C-like language ? I know C ! I should be able to master this in short order !” So I charged forward with my assumptions and proceeded to get smacked down repeatedly by the overall programming paradigm. I came to recognize this as a variation of the scientific method : Develop a hypothesis– in my case, a mental model of how the system works ; develop an experiment (short program) to prove or disprove the model ; realize something fundamental that I was overlooking ; formulate new hypothesis and repeat.

    First Approach : Vertex Workhorse
    My first pitch goes like this :

    • Upload DCT coefficients to GPU memory in the form of textures
    • Program a vertex mesh that encapsulates 16×16 macroblocks
    • Distribute the IDCT effort among multiple vertex shaders
    • Pass transformed Y, U, and V blocks to fragment shader which will convert the samples to RGB

    So the idea is that decoding of 16×16 macroblocks is parallelized. A macroblock embodies 6 blocks :


    JPEG macroblocks

    It would be nice to process one of these 6 blocks in each vertex. But that means drawing a square with 6 vertices. How do you do that? I eventually realized that using 6 vertices is, in fact, the recommended method for drawing a square on 3D hardware: use 2 triangles, each with 3 vertices (0, 1, 2; 3, 4, 5):


    2 triangles make a square

    A vertex shader knows which (x, y) coordinates it has been assigned, so it could figure out which sections of coefficients it needs to access within the textures. But how would a vertex shader know which of the 6 blocks it should process ? Solution : Misappropriate the vertex’s z coordinate. It’s not used for anything else in this case.

    So I set all of that up. Then I hit a new roadblock : How to get the reconstructed Y, U, and V samples transported to the fragment shader ? I have found that communicating between shaders is quite difficult. Texture memory ? WebGL doesn’t allow shaders to write back to texture memory ; shaders can only read it. The standard way to communicate data from a vertex shader to a fragment shader is to declare variables as “varying”. Up until this point, I knew about varying variables but there was something I didn’t quite understand about them and it nagged at me : If 3 different executions of a vertex shader set 3 different values to a varying variable, what value is passed to the fragment shader ?

    It turns out that the varying variable varies, which means that the GPU passes interpolated values to each fragment shader invocation. This completely destroys this idea.

    Second Idea : Vertex Workhorse, Take 2
    The revised pitch is to work around the interpolation issue by just having each vertex shader invocation perform all 6 block transforms. That seems like a lot of redundant work. However, I figured out that I can draw a square with only 4 vertices by arranging them in an ‘N’ pattern and asking WebGL to draw a TRIANGLE_STRIP instead of TRIANGLES. Now it’s only doing 4x the extra work, not 6x. GPUs are supposed to be great at this type of work, so it shouldn’t matter, right?

    I wired up an experiment and then ran into a new problem : While I was able to transform a block (or at least pretend to), and load up a varying array (that wouldn’t vary since all vertex shaders wrote the same values) to transmit to the fragment shader, the fragment shader can’t access specific values within the varying block. To clarify, a WebGL shader can use a constant value — or a value that can be evaluated as a constant at compile time — to index into arrays ; a WebGL shader can not compute an index into an array. Per my reading, this is a WebGL security consideration and the limitation may not be present in other OpenGL(-ES) implementations.

    Not Giving Up Yet : Choking The Fragment Shader
    You might want to be sitting down for this pitch :

    • Vertex shader only interpolates texture coordinates to transmit to fragment shader
    • Fragment shader performs IDCT for a single Y sample, U sample, and V sample
    • Fragment shader converts YUV -> RGB

    Seems straightforward enough. However, that step concerning the IDCT for Y, U, and V entails a gargantuan number of operations. When computing the IDCT for an entire block of samples, it’s possible to leverage a lot of redundancy in the math, which equates to far fewer overall operations. If you absolutely have to compute each sample individually, an 8×8 block requires 64 multiplication/accumulation (MAC) operations per sample. For 3 color planes, and including a few extra multiplications involved in the RGB conversion, that tallies up to about 200 MACs per pixel. Then there’s the fact that this approach means 4x redundant operations on the color planes.
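    For reference, that count of 64 multiply/accumulate operations per sample falls straight out of the standard 8×8 inverse DCT used by JPEG, in which every spatial sample is a sum over all 64 frequency coefficients:

    f(x,y) = \frac{1}{4} \sum_{u=0}^{7} \sum_{v=0}^{7} C(u)\,C(v)\,F(u,v)\,
             \cos\!\left[\frac{(2x+1)u\pi}{16}\right]
             \cos\!\left[\frac{(2y+1)v\pi}{16}\right],
    \qquad C(0) = \tfrac{1}{\sqrt{2}},\quad C(k) = 1 \text{ for } k > 0

    Each of the 64 terms costs one multiply/accumulate, and each carries two cosine factors, which is also where the figure of 128 cosine evaluations per sample further down comes from.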

    It’s crazy, but I just want to see if it can be done. My approach is to pre-compute a pile of IDCT constants in the JavaScript and transmit them to the fragment shader via uniform variables. For a first order optimization, the IDCT constants are formatted as 4-element vectors. This allows computing 16 dot products rather than 64 individual multiplication/addition operations. Ideally, GPU hardware executes the dot products faster (and there is also the possibility of lining these calculations up as matrices).

    I can report that I actually got a sample correctly transformed using this approach. Just one sample, though. Then I ran into some new problems:

    Problem #1 : Computing sample #1 vs. sample #0 requires a different table of 64 IDCT constants. Okay, so create a long table of 64 * 64 IDCT constants. However, this suffers from the same problem as seen in the previous approach : I can’t dynamically compute the index into this array. What’s the alternative ? Maintain 64 separate named arrays and implement 64 branches, when branching of any kind is ill-advised in shader programming to begin with ? I started to go down this path until I ran into…

    Problem #2 : Shaders can only be so large. 64 * 64 floats (4 bytes each) requires 16 kbytes of data and this well exceeds the amount of shader storage that I can assume is allowed. That brings this path of exploration to a screeching halt.

    Further Brainstorming
    I suppose I could forgo pre-computing the constants and directly compute the IDCT for each sample which would entail lots more multiplications as well as 128 cosine calculations per sample (384 considering all 3 color planes). I’m a little stuck with the transform idea right now. Maybe there are some other transforms I could try.

    Another idea would be vector quantization. What little ORBX.js literature is available indicates that there is a method to allow real-time streaming but that it requires GPU assistance to yield enough horsepower to make it feasible. When I think of such severe asymmetry between compression and decompression, my mind drifts towards VQ algorithms. As I come to understand the benefits and limitations of GPU acceleration, I think I can envision a way that something similar to SVQ1, with its copious, hierarchical vector tables stored as textures, could be implemented using shaders.

    So far, this all pertains to intra-coded video frames. What about opportunities for inter-coded frames ? The only approach that I can envision here is to use WebGL’s readPixels() function to fetch the rasterized frame out of the GPU, and then upload it again as a new texture which a new frame processing pipeline could reference. Whether this idea is plausible would require some profiling.

    Using interframes in such a manner seems to imply that the entire codec would need to operate in RGB space and not YUV.

    Conclusions
    The people behind ORBX.js have apparently figured out a way to create a shader-based video codec. I have yet to even begin to reason out a plausible approach. However, I’m glad I did this exercise since I have finally broken through my ignorance regarding modern GPU shader programming. It’s nice to have a topic like multimedia that allows me a jumping-off point to explore other areas.