
Developing A Shader-Based Video Codec
22 June 2013, by Multimedia Mike, in Outlandish Brainstorms
Early last month, this thing called ORBX.js was in the news. It ostensibly has something to do with streaming video and codec technology, which naturally catches my interest. The hype was kicked off by Mozilla honcho Brendan Eich when he posted an article asserting that HD video decoding could be entirely performed in JavaScript. We’ve seen this kind of thing before using Broadway, an H.264 decoder implemented entirely in JS. But that exposes some very obvious limitations (notably CPU usage).
But this new video codec promises 1080p HD playback directly in JavaScript, which is a lofty claim. How could it possibly do this? I got the impression that performance was achieved using WebGL, an extension which allows JavaScript access to accelerated 3D graphics hardware. Browsing through the conversations surrounding the ORBX.js announcement, I found this confirmation from Eich himself:
You’re right that WebGL does heavy lifting.
As of this writing, ORBX.js remains some kind of private tech demo. If there were a public demo available, it would necessarily be easy to reverse engineer the downloadable JavaScript decoder.
But the announcement was enough to make me wonder how it could be possible to create a video codec which effectively leverages 3D hardware.
Prior Art
In theorizing about this, it continually occurs to me that I can’t possibly be the first person to attempt to do this (or the ORBX.js people, for that matter). In googling on the matter, I found various forums and Q&A posts where people asked if it were possible to, e.g., accelerate JPEG decoding and presentation using 3D hardware, with no answers. I also found a blog post which describes a plan to use 3D hardware to accelerate VP8 video decoding. It was a project done under the banner of Google’s Summer of Code in 2011, though I’m not sure which open source group mentored the effort. The project did not end up producing the shader-based VP8 codec originally chartered but mentions that “The ‘client side’ of the VP8 VDPAU implementation is working and is currently being reviewed by the libvdpau maintainers.” I’m not sure what that means. Perhaps it includes modifications to the public API that supports VP8, but is waiting for the underlying hardware to actually implement VP8 decoding blocks.
What’s So Hard About This?
Video decoding is a computationally intensive task. GPUs are known to be really awesome at chewing through computationally intensive tasks. So why aren’t GPUs a natural fit for decoding video codecs?
Generally, it boils down to parallelism, or the lack of opportunities for it. GPUs are really good at doing the exact same operations over lots of data at once. The problem is that decoding compressed video usually requires multiple phases that cannot run in parallel with one another, and the individual phases often cannot be parallelized internally either. In strictly mathematical terms, a compressed data stream will need to be decoded by applying a function f(x) over each data element, x_0 .. x_n. However, the function relies on having applied the function to the previous data element, i.e.:
f(x_n) = f(f(x_{n-1}))
What happens when you try to parallelize such an algorithm? Temporal rifts in the space/time continuum, if you’re in a Star Trek episode. If you’re in the real world, you’ll get incorrect, unusable data as the parallel computation is seeded with a bunch of invalid data at multiple points (which is illustrated in some of the pictures in the aforementioned blog post about accelerated VP8).
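Example: JPEG
Let’s take a very general look at the various stages involved in decoding the ubiquitous JPEG format: Huffman decoding, reverse DC prediction, coefficient dequantization, the inverse discrete cosine transform (IDCT), and YUV -> RGB conversion.
What are the opportunities to parallelize these various phases?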
- Huffman decoding (run-length decoding and zig-zag reordering are assumed to be rolled into this phase): not many opportunities for parallelizing the various Huffman formats out there, including this one. Decoding most Huffman streams is necessarily a sequential operation. I once hypothesized that it would be possible to engineer a codec to achieve some parallelism during the entropy decoding phase, and later found that On2’s VP8 codec employs the scheme. However, such a scheme is unlikely to break down to the fine level that WebGL would require.
- Reverse DC prediction: JPEG, like many other codecs, doesn’t store full DC coefficients. It stores differences between successive DC coefficients. Reversing this process can’t be parallelized. See the discussion in the previous section.
- Dequantize coefficients: This could be highly parallelized. It should be noted that software decoders often don’t dequantize all coefficients. Many coefficients are 0 and it’s a waste of a multiplication operation to dequantize them. Thus, this phase is sometimes rolled into the Huffman decoding phase.
- Inverse discrete cosine transform: This seems like it could be highly parallelizable. I will be exploring this further in this post.
- Convert YUV -> RGB for final display: This is a well-established use case for 3D acceleration.
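To make the contrast between the phases concrete, here is a minimal JavaScript sketch. The function names `reverseDcPrediction` and `dequantize` are made up for illustration and are not taken from any real decoder. Reversing DC prediction is a running sum, so each output needs the previous output; dequantization is an independent multiply per coefficient, which is the shape of work a GPU handles well.

```javascript
// Reverse DC prediction: each DC value is the previous DC plus a stored
// difference. Iteration i cannot start before iteration i-1 has finished.
function reverseDcPrediction(dcDiffs) {
  const dc = new Array(dcDiffs.length);
  let previous = 0;
  for (let i = 0; i < dcDiffs.length; i++) {
    previous += dcDiffs[i];
    dc[i] = previous;
  }
  return dc;
}

// Dequantization: every coefficient is scaled by its own table entry.
// No element depends on any other, so all of them could run at once.
function dequantize(coefficients, quantTable) {
  return coefficients.map((c, i) => c * quantTable[i % quantTable.length]);
}

console.log(reverseDcPrediction([100, -3, 5, 0, -10])); // [ 100, 97, 102, 102, 92 ]
console.log(dequantize([12, -2, 0, 1], [16, 11, 10, 16])); // [ 192, -22, 0, 16 ]
```

Splitting the first loop across workers would require each worker to know the running total at its starting point, which is exactly the value that is not available until the preceding elements have been processed.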
Crash Course in 3D Shaders and Humility
So I wanted to see if I could accelerate some parts of JPEG decoding using something called shaders. I made an effort to understand 3D programming and its associated math throughout the 1990s but 3D technology left me behind a very long time ago while I got mixed up in this multimedia stuff. So I plowed through a few books concerning WebGL (thanks to my new Safari Books Online subscription). After I learned enough about WebGL/JS to be dangerous and just enough about shader programming to be absolutely lethal, I set out to try my hand at optimizing IDCT using shaders.
Here’s my extremely high level (and probably hopelessly naive) view of the modern GPU shader programming model:
The WebGL program written in JavaScript drives the show. It sends a set of vertices into the WebGL system and each vertex is processed through a vertex shader. Then, each pixel that falls within a set of vertices is sent through a fragment shader to compute the final pixel attributes (R, G, B, and alpha value). Another consideration is textures: data that the program uploads to GPU memory and that the shaders can access programmatically.
These shaders (vertex and fragment) are key to the GPU’s programmability. How are they programmed? Using a special C-like shading language. Thought I: “C-like language? I know C! I should be able to master this in short order!” So I charged forward with my assumptions and proceeded to get smacked down repeatedly by the overall programming paradigm. I came to recognize this as a variation of the scientific method: develop a hypothesis (in my case, a mental model of how the system works); develop an experiment (a short program) to prove or disprove the model; realize something fundamental that I was overlooking; formulate a new hypothesis and repeat.
First Approach: Vertex Workhorse
My first pitch goes like this:
- Upload DCT coefficients to GPU memory in the form of textures
- Program a vertex mesh that encapsulates 16×16 macroblocks
- Distribute the IDCT effort among multiple vertex shaders
- Pass transformed Y, U, and V blocks to fragment shader which will convert the samples to RGB
So the idea is that decoding of 16×16 macroblocks is parallelized. A macroblock embodies 6 blocks (with 4:2:0 subsampling, four 8×8 Y blocks plus one U block and one V block).
It would be nice to process one of these 6 blocks in each vertex. But that means drawing a square with 6 vertices. How do you do that? I eventually realized that drawing a square with 6 vertices is the recommended method for drawing a square on 3D hardware: use 2 triangles, each with 3 vertices (0, 1, 2; 3, 4, 5).
A vertex shader knows which (x, y) coordinates it has been assigned, so it could figure out which sections of coefficients it needs to access within the textures. But how would a vertex shader know which of the 6 blocks it should process? Solution: misappropriate the vertex’s z coordinate. It’s not used for anything else in this case.
So I set all of that up. Then I hit a new roadblock: how to get the reconstructed Y, U, and V samples transported to the fragment shader? I have found that communicating between shaders is quite difficult. Texture memory? WebGL doesn’t allow shaders to write back to texture memory; shaders can only read it. The standard way to communicate data from a vertex shader to a fragment shader is to declare variables as “varying”. Up until this point, I knew about varying variables but there was something I didn’t quite understand about them and it nagged at me: if 3 different executions of a vertex shader set 3 different values to a varying variable, what value is passed to the fragment shader?
It turns out that the varying variable varies, which means that the GPU passes interpolated values to each fragment shader invocation. This completely destroys this idea.
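A CPU-side sketch of that interpolation may help. This is plain JavaScript, not shader code, and the helper names `barycentric` and `interpolateVarying` are hypothetical. It uses simple linear (barycentric) weighting and ignores the perspective correction that real hardware applies, but it shows why a per-vertex "block index" arrives at the fragment shader as a blend rather than as any one vertex's value.

```javascript
// Barycentric weights of point p inside triangle (a, b, c), all [x, y].
function barycentric(p, a, b, c) {
  const det = (b[1] - c[1]) * (a[0] - c[0]) + (c[0] - b[0]) * (a[1] - c[1]);
  const w0 = ((b[1] - c[1]) * (p[0] - c[0]) + (c[0] - b[0]) * (p[1] - c[1])) / det;
  const w1 = ((c[1] - a[1]) * (p[0] - c[0]) + (a[0] - c[0]) * (p[1] - c[1])) / det;
  return [w0, w1, 1 - w0 - w1];
}

// What a fragment sees for a varying: a weighted blend of the three
// per-vertex values, not any single vertex's value.
function interpolateVarying(p, triangle, vertexValues) {
  const [w0, w1, w2] = barycentric(p, ...triangle);
  return w0 * vertexValues[0] + w1 * vertexValues[1] + w2 * vertexValues[2];
}

const triangle = [[0, 0], [4, 0], [0, 4]];
const blockIndexPerVertex = [0, 1, 2]; // three vertices "choose" three blocks

console.log(interpolateVarying([0, 0], triangle, blockIndexPerVertex)); // 0 (exactly at vertex 0)
console.log(interpolateVarying([1, 1], triangle, blockIndexPerVertex)); // 0.75 (a blend)
```

Only a fragment sitting exactly on a vertex gets that vertex's value back; everywhere else the result is a mixture, which is useless as a block selector.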
Second Idea: Vertex Workhorse, Take 2
The revised pitch is to work around the interpolation issue by just having each vertex shader invocation perform all 6 block transforms. That seems like a lot of redundant work. However, I figured out that I can draw a square with only 4 vertices by arranging them in an ‘N’ pattern and asking WebGL to draw a TRIANGLE_STRIP instead of TRIANGLES. Now it’s only doing 4x the work, and not 6x. GPUs are supposed to be great at this type of work, so it shouldn’t matter, right?
I wired up an experiment and then ran into a new problem: while I was able to transform a block (or at least pretend to), and load up a varying array (that wouldn’t vary since all vertex shaders wrote the same values) to transmit to the fragment shader, the fragment shader can’t access specific values within the varying block. To clarify, a WebGL shader can use a constant value (or a value that can be evaluated as a constant at compile time) to index into arrays; a WebGL shader cannot compute an index into an array. Per my reading, this is a WebGL security consideration and the limitation may not be present in other OpenGL(-ES) implementations.
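The vertex-count saving from TRIANGLE_STRIP can be sketched on the CPU. The arrays and the `stripToTriangles` helper below are illustrative only; the coordinates are an assumed unit square, and the sketch ignores the alternating winding order that the GPU applies to strips.

```javascript
// Unit square drawn as two independent triangles: 6 vertices (gl.TRIANGLES).
const squareAsTriangles = [
  [0, 0], [0, 1], [1, 0], // triangle 1: vertices 0, 1, 2
  [0, 1], [1, 0], [1, 1], // triangle 2: vertices 3, 4, 5
];

// The same square as a strip in an 'N' pattern: 4 vertices (gl.TRIANGLE_STRIP).
const squareAsStrip = [[0, 0], [0, 1], [1, 0], [1, 1]];

// Expand a strip into the triangles assembled from it:
// every window of 3 consecutive vertices forms one triangle.
function stripToTriangles(strip) {
  const triangles = [];
  for (let i = 0; i + 2 < strip.length; i++) {
    triangles.push([strip[i], strip[i + 1], strip[i + 2]]);
  }
  return triangles;
}

console.log(squareAsTriangles.length);               // 6 vertex shader runs
console.log(squareAsStrip.length);                   // 4 vertex shader runs
console.log(stripToTriangles(squareAsStrip).length); // 2 triangles either way
```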
Not Giving Up Yet: Choking The Fragment Shader
You might want to be sitting down for this pitch:
- Vertex shader only interpolates texture coordinates to transmit to fragment shader
- Fragment shader performs IDCT for a single Y sample, U sample, and V sample
- Fragment shader converts YUV -> RGB
Seems straightforward enough. However, that step concerning IDCT for Y, U, and V entails a gargantuan number of operations. When computing the IDCT for an entire block of samples, it’s possible to leverage a lot of redundancy in the math which equates to far fewer overall operations. If you absolutely have to compute each sample individually, for an 8×8 block, that requires 64 multiplication/accumulation (MAC) operations per sample. For 3 color planes (3 × 64 = 192), and including a few extra multiplications involved in the RGB conversion, that tallies up to about 200 MACs per pixel. Then there’s the fact that this approach means 4x redundant operations on the color planes.
It’s crazy, but I just want to see if it can be done. My approach is to pre-compute a pile of IDCT constants in the JavaScript and transmit them to the fragment shader via uniform variables. For a first order optimization, the IDCT constants are formatted as 4-element vectors. This allows computing 16 dot products rather than 64 individual multiplication/addition operations. Ideally, GPU hardware executes the dot products faster (and there is also the possibility of lining these calculations up as matrices).
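A minimal JavaScript sketch of that precomputation follows, run on the CPU rather than in a shader. It assumes the standard 8×8 IDCT definition; the function names (`idctConstantsForSample`, `dot4`, `idctSample`) are invented for illustration and are not the code from the experiment. The 64 constants for one output sample are packed as 16 four-element vectors, so reconstructing the sample takes 16 dot products.

```javascript
const N = 8;

// Normalization factor of the 8x8 DCT: 1/sqrt(2) for index 0, else 1.
function c(k) {
  return k === 0 ? Math.SQRT1_2 : 1;
}

// Precompute the 64 constants needed for ONE output sample (x, y),
// packed as 16 four-element vectors (what a shader would see as vec4s).
function idctConstantsForSample(x, y) {
  const flat = [];
  for (let v = 0; v < N; v++) {
    for (let u = 0; u < N; u++) {
      flat.push(
        0.25 * c(u) * c(v) *
        Math.cos(((2 * x + 1) * u * Math.PI) / 16) *
        Math.cos(((2 * y + 1) * v * Math.PI) / 16)
      );
    }
  }
  const vectors = [];
  for (let i = 0; i < 64; i += 4) vectors.push(flat.slice(i, i + 4));
  return vectors;
}

function dot4(a, b) {
  return a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + a[3] * b[3];
}

// Reconstruct one sample: 16 dot products instead of 64 separate MACs.
// `coefficients` holds the 64 dequantized values, row by row.
function idctSample(coefficients, constants) {
  let sum = 0;
  for (let i = 0; i < 16; i++) {
    sum += dot4(coefficients.slice(4 * i, 4 * i + 4), constants[i]);
  }
  return sum;
}

// A block with only a DC coefficient of 8 decodes to a flat block of 1s.
const block = new Array(64).fill(0);
block[0] = 8;
console.log(idctSample(block, idctConstantsForSample(3, 5)).toFixed(6)); // "1.000000"
```

Note that `idctConstantsForSample` returns a different table for every (x, y), which means 64 such tables are needed to cover a whole block.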
I can report that I actually got a sample correctly transformed using this approach. Just one sample, though. Then I ran into some new problems:
Problem #1: Computing sample #1 vs. sample #0 requires a different table of 64 IDCT constants. Okay, so create a long table of 64 * 64 IDCT constants. However, this suffers from the same problem as seen in the previous approach: I can’t dynamically compute the index into this array. What’s the alternative? Maintain 64 separate named arrays and implement 64 branches, when branching of any kind is ill-advised in shader programming to begin with? I started to go down this path until I ran into…
Problem #2: Shaders can only be so large. 64 * 64 floats (4 bytes each) requires 16 kbytes of data and this well exceeds the amount of shader storage that I can assume is allowed. That brings this path of exploration to a screeching halt.
Further Brainstorming
I suppose I could forgo pre-computing the constants and directly compute the IDCT for each sample which would entail lots more multiplications as well as 128 cosine calculations per sample (384 considering all 3 color planes). I’m a little stuck with the transform idea right now. Maybe there are some other transforms I could try.
Another idea would be vector quantization. What little ORBX.js literature is available indicates that there is a method to allow real-time streaming but that it requires GPU assistance to yield enough horsepower to make it feasible. When I think of such severe asymmetry between compression and decompression, my mind drifts towards VQ algorithms. As I come to understand the benefits and limitations of GPU acceleration, I think I can envision a way that something similar to SVQ1, with its copious, hierarchical vector tables stored as textures, could be implemented using shaders.
So far, this all pertains to intra-coded video frames. What about opportunities for inter-coded frames? The only approach that I can envision here is to use WebGL’s readPixels() function to fetch the rasterized frame out of the GPU, and then upload it again as a new texture which a new frame processing pipeline could reference. Whether this idea is plausible would require some profiling.
Using interframes in such a manner seems to imply that the entire codec would need to operate in RGB space and not YUV.
Conclusions
The people behind ORBX.js have apparently figured out a way to create a shader-based video codec. I have yet to even begin to reason out a plausible approach. However, I’m glad I did this exercise since I have finally broken through my ignorance regarding modern GPU shader programming. It’s nice to have a topic like multimedia that allows me a jumping-off point to explore other areas.
Things I Have Learned About Emscripten
1 September 2015, by Multimedia Mike, in Cirrus Retro
3 years ago, I released my Game Music Appreciation project, a website with a ludicrously uninspired title which allowed users a relatively frictionless method to experience a range of specialized music files related to old video games. However, the site required use of a special Chrome plugin. Ever since that initial release, my #1 most requested feature has been for a pure JavaScript version of the music player.
“Impossible!” I exclaimed. “There’s no way JS could ever run fast enough to run these CPU emulators and audio synthesizers in real time, and allow for the visualization that I demand!” Well, I’m pleased to report that I have proved myself wrong. I recently quietly launched a new site with what I hope is a catchier title, meant to evoke a cloud-based retro-music-as-a-service product: Cirrus Retro. Right now, it’s basically the same as the old site, but without the wonky Chrome-specific technology.
Along the way, I’ve learned a few things about using Emscripten that I thought might be useful to share with other people who wish to embark on a similar journey. This is geared more towards someone who has a stronger low-level background (such as C/C++) vs. high-level (like JavaScript).
General Goals
Do you want to cross-compile an entire desktop application, one that relies on an extensive GUI toolkit? That might be difficult (though I believe there is a path for porting Qt code directly with Emscripten). Your better wager might be to abstract out the core logic and processes of the program and then create a new web UI to access them.
Do you want to compile a game that basically just paints stuff to a 2D canvas? You’re in luck! Emscripten has a porting path for SDL. Make a version of your C/C++ software that targets SDL (generally not a tall order) and then compile that with Emscripten.
Do you just want to cross-compile some functionality that lives in a library? That’s what I’ve done with the Cirrus Retro project. For this, plan to compile the library into a JS file that exports some public functions that other, higher-level, native JS (i.e., JS written by a human and not a computer) will invoke.
Memory Levels
When porting C/C++ software to JavaScript using Emscripten, you have to think on 2 different levels. Or perhaps you need to force JavaScript into a low level C lens, especially if you want to write native JS code that will interact with Emscripten-compiled code. This often means somehow allocating chunks of memory via JS and passing them to the Emscripten-compiled functions. And you wouldn’t believe the type of gymnastics you need to execute to get native JS and Emscripten-compiled JS to cooperate.
“Emscripten: Pointers and Pointers” is the best (and, really, ONLY) explanation I could find for understanding the basic mechanics of this process, at least when I started this journey. However, there’s a mistake in the explanation that left me confused for a little while, and I’m at a loss to contact the author (doesn’t anyone post a simple email address anymore?).
Per the best of my understanding, Emscripten allocates a large JS array and calls that the memory space that the compiled C/C++ code is allowed to operate in. A pointer in C/C++ code will just be an index into that mighty array. Really, that’s not too far off from how a low-level program process is supposed to view memory: as a flat array.
Eventually, I just learned to cargo-cult my way through the memory allocation process. Here’s the JS code for allocating an Emscripten-compatible byte buffer, taken from my test harness (more on that later):
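var musicBuffer = fs.readFileSync(testSpec['filename']);
var musicBufferBytes = new Uint8Array(musicBuffer);
// reserve space inside the Emscripten heap; the result is an offset into it
var bytesMalloc = player._malloc(musicBufferBytes.length);
// create a view onto exactly that region of the heap
var bytes = new Uint8Array(player.HEAPU8.buffer, bytesMalloc, musicBufferBytes.length);
// copy the file contents into the heap
bytes.set(new Uint8Array(musicBufferBytes.buffer));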
So, read the array of bytes from some input source, create a Uint8Array from the bytes, use the Emscripten _malloc() function to allocate enough bytes from the Emscripten memory array for the input bytes, then create a new array… then copy the bytes…
You know what? It’s late and I can’t remember how it works exactly, but it does. It has been a few months since I touched that code (been fighting with front-end website tech since then). You write that memory allocation code enough times and it begins to make sense, and then you hope you don’t have to write it too many more times.
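For readers who want to see the "pointer is an index into a flat array" idea in isolation, here is a self-contained sketch that runs without Emscripten. `createFakeModule` and its toy bump allocator are hypothetical stand-ins written for this illustration; only the `HEAPU8` and `_malloc` names mirror what the snippet above uses, and the real allocator is far more involved.

```javascript
// Hypothetical stand-in for an Emscripten module: one big ArrayBuffer is
// the whole C address space, and a "pointer" is just a byte offset into it.
function createFakeModule(heapSize) {
  const HEAPU8 = new Uint8Array(new ArrayBuffer(heapSize));
  let nextFree = 8; // keep offset 0 unused so it can act as NULL
  return {
    HEAPU8,
    // Toy bump allocator; the real _malloc() is a full allocator with free().
    _malloc(size) {
      if (nextFree + size > heapSize) return 0;
      const pointer = nextFree;
      nextFree += (size + 7) & ~7; // keep allocations 8-byte aligned
      return pointer;
    },
  };
}

const player = createFakeModule(1024);
const input = Uint8Array.from([0x4e, 0x45, 0x53, 0x4d]); // bytes read from a file

// 1. Reserve room inside the heap; the result is a plain number.
const pointer = player._malloc(input.length);
// 2. Make a view onto exactly that slice of the heap.
const bytes = new Uint8Array(player.HEAPU8.buffer, pointer, input.length);
// 3. Copy the input in; compiled code would now see it at `pointer`.
bytes.set(input);

console.log(pointer);                                 // 8
console.log(player.HEAPU8[pointer + 2] === input[2]); // true
```

The three numbered steps correspond to the last three statements of the test-harness snippet: allocate, create a view, copy.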
Multithreading
You can’t port multithreaded code to JS via Emscripten. JavaScript has no notion of threads! If you don’t understand the computer science behind this limitation, a more thorough explanation is beyond the scope of this post. But trust me, I’ve thought about it a lot. In fact, the official Emscripten literature states that you should be able to port most any C/C++ code as long as 1) none of the code is proprietary (i.e., all the raw source is available); and 2) there are no threads.
Yes, I read about the experimental pthreads support added to Emscripten recently. Don’t get too excited; that won’t be ready and widespread for a long time to come as it relies on a new browser API. In the meantime, figure out how to make your multithreaded C/C++ code run in a single thread if you want it to run in a browser.
Printing Facility
Eventually, getting software to work boils down to debugging, and the most primitive tool in many a programmer’s toolbox is the humble print statement. A print statement allows you to inspect a piece of a program’s state at key junctures. Eventually, when you try to cross-compile C/C++ code to JS using Emscripten, something is not going to work correctly in the generated JS “object code” and you need to understand what. You’ll be pleading for a method of just inspecting one variable deep in the original C/C++ code.
I came up with this simple printf-workalike called emprintf():
#ifndef EMPRINTF_H
#define EMPRINTF_H

#include <stdio.h>
#include <stdarg.h>
#include <emscripten.h>

#define MAX_MSG_LEN 1000

/* NOTE: Don't pass format strings that contain single quote (') or newline
 * characters. */
static void emprintf(const char *format, ...)
{
    char msg[MAX_MSG_LEN];
    char consoleMsg[MAX_MSG_LEN + 16];
    va_list args;

    /* create the string */
    va_start(args, format);
    vsnprintf(msg, MAX_MSG_LEN, format, args);
    va_end(args);

    /* wrap the string in a console.log('') statement */
    snprintf(consoleMsg, MAX_MSG_LEN + 16, "console.log('%s')", msg);

    /* send the final string to the JavaScript console */
    emscripten_run_script(consoleMsg);
}

#endif /* EMPRINTF_H */
Put it in a file called “emprint.h”. Include it into any C/C++ file where you need debugging visibility, use emprintf() as a replacement for printf() and the output will magically show up on the browser’s JavaScript debug console. Heed the comments and don’t put any single quotes or newlines in strings, and keep it under 1000 characters. I didn’t say it was perfect, but it has helped me a lot in my Emscripten adventures.
Optimization Levels
Remember to turn on optimization when compiling. I have empirically found that optimizing for size (-Os) leads to the best performance all around, in addition to having the smallest size. Just be sure to specify some optimization level. If you don’t, the default is -O0 which offers horrible performance when running in JS.
Static Compression For HTTP Delivery
JavaScript code compresses pretty efficiently, even after it has been optimized for size using -Os. I routinely see compression ratios between 3.5:1 and 5:1 using gzip.
Web servers in this day and age are supposed to be smart enough to detect when a requesting web browser can accept gzip-compressed data and do the compression on the fly. They’re even supposed to be smart enough to cache compressed output so the same content is not recompressed for each request. I would have to set up a series of tests to establish whether either of the foregoing assertions is correct and I can’t be bothered. Instead, I took it into my own hands. The trick is to pre-compress the JS files and then instruct the webserver to serve these files with a ‘Content-Type’ of ‘application/javascript’ and a ‘Content-Encoding’ of ‘gzip’.
- Compress your large Emscripten-built JS files with ‘gzip’: ‘gzip compiled-code.js’
- Rename them from extension .js.gz to .jsgz
- Tell the webserver to deliver .jsgz files with the correct Content-Type and Content-Encoding headers
To do that last step with Apache, specify these lines:
AddType application/javascript jsgz
AddEncoding gzip jsgz
They belong in either a directory’s .htaccess file or in the sitewide configuration (/etc/apache2/mods-available/mime.conf works on my setup).
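The pre-compression step could also be scripted from Node.js instead of calling the gzip command. The sketch below is an assumption about how one might do that, not part of the original workflow: `precompress` is a hypothetical helper, and the input is a synthetic string standing in for real Emscripten output, so its compression ratio says nothing about the 3.5:1 to 5:1 figures quoted above.

```javascript
const zlib = require("zlib");

// Gzip a JS payload ahead of time and report the compression ratio.
// In a real build step the input would come from fs.readFileSync() and the
// result would be written to a .jsgz file with fs.writeFileSync().
function precompress(source) {
  const raw = Buffer.from(source, "utf8");
  const compressed = zlib.gzipSync(raw, { level: 9 });
  return { compressed, ratio: raw.length / compressed.length };
}

// Stand-in for Emscripten output: repetitive, machine-generated code.
const fakeCompiledCode = "function f(a){a=a|0;return a+1|0}\n".repeat(500);
const { compressed, ratio } = precompress(fakeCompiledCode);

console.log(ratio > 1); // true
console.log(zlib.gunzipSync(compressed).toString("utf8") === fakeCompiledCode); // true
```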
Build System and Build Time Optimization
Oh goodie, build systems! I had a very specific manner in which I wanted to build my JS modules using Emscripten. Can I possibly coerce any of the many popular build systems to do this? It has been a few months since I worked on this problem specifically but I seem to recall that the build systems I tried to use would freak out at the prospect of compiling stuff to a final binary target of .js.
I had high hopes for Bazel, which Google released while I was developing Cirrus Retro. Surely, this is software that has been battle-tested in the harshest conditions of one of the most prominent software-developing companies in the world, needing to take into account the most bizarre corner cases and still build efficiently and correctly every time. And I have little doubt that it fulfills the order. Similarly, I’m confident that Google also has a team of no fewer than 100 or so people dedicated to developing and supporting the project within the organization. When you only have, at best, 1-2 hours per night to work on projects like this, you prefer not to fight with such cutting edge technology and after losing 2 or 3 nights trying to make a go of Bazel, I eventually put it aside.
I also tried to use Autotools. It failed horribly for me, mostly for my own carelessness and lack of early-project source control.
After that, it was strictly vanilla makefiles with no real dependency management. But you know what helps in these cases? ccache! Or at least, it would if it didn’t fail with Emscripten.
Quick tip: ccache has trouble with LLVM unless you set the CCACHE_CPP2 environment variable (e.g.: “export CCACHE_CPP2=1”). I don’t remember the specifics, but it magically fixes things. Then, the lazy build process becomes “make clean && make”.
Testing
If you have never used Node.js, testing Emscripten-compiled JS code might be a good opportunity to start. I was able to use Node.js to great effect for testing the individually-compiled music player modules, wiring up a series of invocations using Python for a broader test suite (wouldn’t want to go too deep down the JS rabbit hole, after all).
Be advised that Node.js doesn’t enjoy the same kind of JIT optimizations that the browser engines leverage. Thus, in the case of time critical code like, say, an audio synthesis library, the code might not run in real time. But as long as it produces the correct bitwise waveform, that’s good enough for continuous integration.
Also, if you have largely been a low-level programmer for your whole career and are generally unfamiliar with the world of single-threaded, event-driven, callback-oriented programming, you might be in for a bit of a shock. When I wanted to learn how to read the contents of a file in Node.js, this is the first tutorial I found on the matter. I thought the code presented was a parody of bad coding style:
var fs = require("fs");
var fileName = "foo.txt";

fs.exists(fileName, function(exists) {
  if (exists) {
    fs.stat(fileName, function(error, stats) {
      fs.open(fileName, "r", function(error, fd) {
        var buffer = new Buffer(stats.size);

        fs.read(fd, buffer, 0, buffer.length, null, function(error, bytesRead, buffer) {
          var data = buffer.toString("utf8", 0, buffer.length);

          console.log(data);
          fs.close(fd);
        });
      });
    });
  }
});
Apparently, this kind of thing doesn’t raise an eyebrow in the JS world.
Now, I understand and respect the JS programming model. But this was seriously frustrating when I first encountered it because a simple script like the one I was trying to write just has an ordered list of tasks to complete. When it asks for bytes from a file, it really has nothing better to do than to wait for the answer.
Thankfully, it turns out that Node’s fs module includes synchronous versions of the various file access functions. So it’s all good.
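For comparison, a synchronous read might look like the sketch below. The temporary file name is an arbitrary choice made so that the example is self-contained; `fs.readFileSync` is the same call used in the test-harness snippet earlier in this post.

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");

// Create a small file so the example is self-contained.
const fileName = path.join(os.tmpdir(), "emscripten-sync-read-example.txt");
fs.writeFileSync(fileName, "hello from a synchronous read\n");

// One blocking call replaces the exists/stat/open/read/close callback chain.
const data = fs.readFileSync(fileName, "utf8");
console.log(data);

// Clean up the temporary file.
fs.unlinkSync(fileName);
```

For a test script that has nothing else to do while waiting on the disk, blocking like this is a reasonable trade-off.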
Conclusion
I’m sure I missed or underexplained some things. But if other brave souls are interested in dipping their toes in the waters of Emscripten, I hope these tips will come in handy.
The post Things I Have Learned About Emscripten first appeared on Breaking Eggs And Making Omelettes.
-
Things I Have Learned About Emscripten
1er septembre 2015, par Multimedia Mike — Cirrus Retro3 years ago, I released my Game Music Appreciation project, a website with a ludicrously uninspired title which allowed users a relatively frictionless method to experience a range of specialized music files related to old video games. However, the site required use of a special Chrome plugin. Ever since that initial release, my #1 most requested feature has been for a pure JavaScript version of the music player.
“Impossible !” I exclaimed. “There’s no way JS could ever run fast enough to run these CPU emulators and audio synthesizers in real time, and allow for the visualization that I demand !” Well, I’m pleased to report that I have proved me wrong. I recently quietly launched a new site with what I hope is a catchier title, meant to evoke a cloud-based retro-music-as-a-service product : Cirrus Retro. Right now, it’s basically the same as the old site, but without the wonky Chrome-specific technology.
Along the way, I’ve learned a few things about using Emscripten that I thought might be useful to share with other people who wish to embark on a similar journey. This is geared more towards someone who has a stronger low-level background (such as C/C++) vs. high-level (like JavaScript).
General Goals
Do you want to cross-compile an entire desktop application, one that relies on an extensive GUI toolkit ? That might be difficult (though I believe there is a path for porting qt code directly with Emscripten). Your better wager might be to abstract out the core logic and processes of the program and then create a new web UI to access them.Do you want to compile a game that basically just paints stuff to a 2D canvas ? You’re in luck ! Emscripten has a porting path for SDL. Make a version of your C/C++ software that targets SDL (generally not a tall order) and then compile that with Emscripten.
Do you just want to cross-compile some functionality that lives in a library ? That’s what I’ve done with the Cirrus Retro project. For this, plan to compile the library into a JS file that exports some public functions that other, higher-level, native JS (i.e., JS written by a human and not a computer) will invoke.
Memory Levels
When porting C/C++ software to JavaScript using Emscripten, you have to think on 2 different levels. Or perhaps you need to force JavaScript into a low level C lens, especially if you want to write native JS code that will interact with Emscripten-compiled code. This often means somehow allocating chunks of memory via JS and passing them to the Emscripten-compiled functions. And you wouldn’t believe the type of gymnastics you need to execute to get native JS and Emscripten-compiled JS to cooperate.
“Emscripten : Pointers and Pointers” is the best (and, really, ONLY) explanation I could find for understanding the basic mechanics of this process, at least when I started this journey. However, there’s a mistake in the explanation that left me confused for a little while, and I’m at a loss to contact the author (doesn’t anyone post a simple email address anymore ?).
Per the best of my understanding, Emscripten allocates a large JS array and calls that the memory space that the compiled C/C++ code is allowed to operate in. A pointer in C/C++ code will just be an index into that mighty array. Really, that’s not too far off from how a low-level program process is supposed to view memory– as a flat array.
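To make that mental model concrete, here is a minimal sketch in plain JavaScript, with no Emscripten involved. The names `heap`, `HEAPU8`, `HEAP32` and the address `ptr` are stand-ins chosen for illustration: the `HEAP*` names mirror the typed-array views that Emscripten exposes, but the buffer here is just a local ArrayBuffer, and its 64 KiB size is an arbitrary assumption.

```javascript
// A stand-in for the Emscripten heap: one flat block of memory.
// (Hypothetical 64 KiB buffer; a real module manages its own.)
var heap = new ArrayBuffer(64 * 1024);

// Typed-array views over the same bytes, analogous to HEAPU8 / HEAP32.
var HEAPU8 = new Uint8Array(heap);
var HEAP32 = new Int32Array(heap);

// A "pointer" is nothing more than a byte offset into the heap.
var ptr = 1024;

// Equivalent of: *(int32_t *)ptr = 0x11223344;
// The Int32Array is indexed in 4-byte units, hence the shift.
HEAP32[ptr >> 2] = 0x11223344;

// Equivalent of: uint8_t first = *(uint8_t *)ptr;
// Which of the four bytes this is depends on the host's byte order.
var first = HEAPU8[ptr];

console.log(HEAP32[ptr >> 2].toString(16));
console.log(first.toString(16));
```

The point of the two views is that the same bytes can be read at different widths, which is exactly what C code does when it casts a pointer.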
Eventually, I just learned to cargo-cult my way through the memory allocation process. Here’s the JS code for allocating an Emscripten-compatible byte buffer, taken from my test harness (more on that later):
var musicBuffer = fs.readFileSync(testSpec['filename']);
var musicBufferBytes = new Uint8Array(musicBuffer);
var bytesMalloc = player._malloc(musicBufferBytes.length);
var bytes = new Uint8Array(player.HEAPU8.buffer, bytesMalloc, musicBufferBytes.length);
bytes.set(new Uint8Array(musicBufferBytes.buffer));
So, read the array of bytes from some input source, create a Uint8Array from the bytes, use the Emscripten _malloc() function to allocate enough bytes from the Emscripten memory array for the input bytes, then create a new array… then copy the bytes…
You know what? It’s late and I can’t remember how it works exactly, but it does. It has been a few months since I touched that code (been fighting with front-end website tech since then). You write that memory allocation code enough times and it begins to make sense, and then you hope you don’t have to write it too many more times.
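For what it’s worth, the pattern boils down to three steps: ask the module’s allocator for space, copy the bytes to the offset it returns, and pass that offset to the compiled code as a pointer. The sketch below shows this under stated assumptions. `fakeModule` is a hypothetical mock with a trivial bump allocator so that the example runs without an Emscripten build, and `copyToHeap` is a helper name invented here. It writes through `HEAPU8.set(bytes, ptr)`, which should be equivalent to creating a view at the offset and calling `set()` on that, as the code above does.

```javascript
// Hypothetical stand-in for an Emscripten-compiled module, so the sketch
// runs on its own. A real module provides HEAPU8, _malloc and _free itself.
var fakeModule = (function () {
  var HEAPU8 = new Uint8Array(64 * 1024);
  var next = 8;  // trivial bump allocator; address 0 is reserved as NULL
  return {
    HEAPU8: HEAPU8,
    _malloc: function (size) {
      var ptr = next;
      next += (size + 7) & ~7;  // keep allocations 8-byte aligned
      return ptr;
    },
    _free: function (ptr) { /* no-op in this mock */ }
  };
})();

// Copy a Uint8Array into the module's heap and return its "pointer".
function copyToHeap(module, bytes) {
  var ptr = module._malloc(bytes.length);  // 1. reserve space in the heap
  module.HEAPU8.set(bytes, ptr);           // 2. copy the bytes to that offset
  return ptr;                              // 3. hand this number to C code
}

var input = new Uint8Array([0x4e, 0x45, 0x53, 0x4d]);
var ptr = copyToHeap(fakeModule, input);

// What the compiled C code would see when it dereferences the pointer:
var seen = fakeModule.HEAPU8.subarray(ptr, ptr + input.length);
console.log(ptr, Array.from(seen));

// With a real module, release the memory once the C side is done with it.
fakeModule._free(ptr);
```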
Multithreading
You can’t port multithreaded code to JS via Emscripten. JavaScript has no notion of threads! If you don’t understand the computer science behind this limitation, a more thorough explanation is beyond the scope of this post. But trust me, I’ve thought about it a lot. In fact, the official Emscripten literature states that you should be able to port most any C/C++ code as long as 1) none of the code is proprietary (i.e., all the raw source is available); and 2) there are no threads.
Yes, I read about the experimental pthreads support added to Emscripten recently. Don’t get too excited; that won’t be ready and widespread for a long time to come as it relies on a new browser API. In the meantime, figure out how to make your multithreaded C/C++ code run in a single thread if you want it to run in a browser.
Printing Facility
Eventually, getting software to work boils down to debugging, and the most primitive tool in many a programmer’s toolbox is the humble print statement. A print statement allows you to inspect a piece of a program’s state at key junctures. Eventually, when you try to cross-compile C/C++ code to JS using Emscripten, something is not going to work correctly in the generated JS “object code” and you need to understand what. You’ll be pleading for a method of just inspecting one variable deep in the original C/C++ code.
I came up with this simple printf-workalike called emprintf():
#ifndef EMPRINTF_H
#define EMPRINTF_H

#include <stdio.h>
#include <stdarg.h>
#include <emscripten.h>

#define MAX_MSG_LEN 1000

/* NOTE: Don't pass format strings that contain single quote (') or newline
 * characters. */
static void emprintf(const char *format, ...)
{
  char msg[MAX_MSG_LEN];
  char consoleMsg[MAX_MSG_LEN + 16];
  va_list args;

  /* create the string */
  va_start(args, format);
  vsnprintf(msg, MAX_MSG_LEN, format, args);
  va_end(args);

  /* wrap the string in a console.log('') statement */
  snprintf(consoleMsg, MAX_MSG_LEN + 16, "console.log('%s')", msg);

  /* send the final string to the JavaScript console */
  emscripten_run_script(consoleMsg);
}

#endif  /* EMPRINTF_H */
Put it in a file called “emprint.h”. Include it into any C/C++ file where you need debugging visibility, use emprintf() as a replacement for printf() and the output will magically show up on the browser’s JavaScript debug console. Heed the comments and don’t put any single quotes or newlines in strings, and keep it under 1000 characters. I didn’t say it was perfect, but it has helped me a lot in my Emscripten adventures.
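The quote-and-newline restriction comes from how the message gets pasted into a JavaScript string literal before being evaluated. The sketch below, in plain JavaScript rather than C, shows the failure and one possible way around it. `wrapNaive`, `wrapEscaped` and `parses` are hypothetical helper names, and a fix inside emprintf() itself would have to do the equivalent escaping on the C side.

```javascript
// Naive wrapping, as emprintf() does it: the message is pasted between
// single quotes without any escaping.
function wrapNaive(msg) {
  return "console.log('" + msg + "')";
}

// Escaped wrapping: JSON.stringify() emits a double-quoted literal with
// quotes, backslashes and newlines escaped.
function wrapEscaped(msg) {
  return "console.log(" + JSON.stringify(msg) + ")";
}

// Parse a script without running it; returns false on a syntax error.
function parses(script) {
  try {
    new Function(script);
    return true;
  } catch (e) {
    return false;
  }
}

var tricky = "can't parse\nthis";
console.log(parses(wrapNaive("plain text")));  // true
console.log(parses(wrapNaive(tricky)));        // false
console.log(parses(wrapEscaped(tricky)));      // true
```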
Optimization Levels
Remember to turn on optimization when compiling. I have empirically found that optimizing for size (-Os) leads to the best performance all around, in addition to having the smallest size. Just be sure to specify some optimization level. If you don’t, the default is -O0 which offers horrible performance when running in JS.
Static Compression For HTTP Delivery
JavaScript code compresses pretty efficiently, even after it has been optimized for size using -Os. I routinely see compression ratios between 3.5:1 and 5:1 using gzip.
Web servers in this day and age are supposed to be smart enough to detect when a requesting web browser can accept gzip-compressed data and do the compression on the fly. They’re even supposed to be smart enough to cache compressed output so the same content is not recompressed for each request. I would have to set up a series of tests to establish whether either of the foregoing assertions is correct and I can’t be bothered. Instead, I took it into my own hands. The trick is to pre-compress the JS files and then instruct the webserver to serve these files with a ‘Content-Type’ of ‘application/javascript’ and a ‘Content-Encoding’ of ‘gzip’.
- Compress your large Emscripten-built JS files with ‘gzip’: ‘gzip compiled-code.js’
- Rename them from extension .js.gz to .jsgz
- Tell the webserver to deliver .jsgz files with the correct Content-Type and Content-Encoding headers
To do that last step with Apache, specify these lines:
AddType application/javascript jsgz
AddEncoding gzip jsgz
They belong in either a directory’s .htaccess file or in the sitewide configuration (/etc/apache2/mods-available/mime.conf works on my setup).
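If you would rather do the compress-and-rename steps from a script, something like the following Node.js sketch should work. It uses Node’s built-in zlib module; the function name `precompress` and the throwaway demo file are assumptions made for illustration. Unlike the ‘gzip’ command line tool, it leaves the original .js file in place.

```javascript
var fs = require("fs");
var os = require("os");
var path = require("path");
var zlib = require("zlib");

// Gzip a .js file and write the result next to it with a .jsgz extension.
// Returns the compression ratio (original size / compressed size).
function precompress(jsPath) {
  var original = fs.readFileSync(jsPath);
  var compressed = zlib.gzipSync(original, { level: 9 });
  var outPath = jsPath.replace(/\.js$/, ".jsgz");
  fs.writeFileSync(outPath, compressed);
  return original.length / compressed.length;
}

// Demo on a throwaway file of repetitive, JS-like text. Real compiled
// output is far less repetitive, so expect a much lower ratio there.
var dir = fs.mkdtempSync(path.join(os.tmpdir(), "jsgz-"));
var demoPath = path.join(dir, "compiled-code.js");
fs.writeFileSync(demoPath, "function f(a){return a+1;}\n".repeat(2000));

var ratio = precompress(demoPath);
console.log("ratio " + ratio.toFixed(1) + ":1");
```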
Build System and Build Time Optimization
Oh goodie, build systems! I had a very specific manner in which I wanted to build my JS modules using Emscripten. Can I possibly coerce any of the many popular build systems to do this? It has been a few months since I worked on this problem specifically but I seem to recall that the build systems I tried to use would freak out at the prospect of compiling stuff to a final binary target of .js.
I had high hopes for Bazel, which Google released while I was developing Cirrus Retro. Surely, this is software that has been battle-tested in the harshest conditions of one of the most prominent software-developing companies in the world, needing to take into account the most bizarre corner cases and still build efficiently and correctly every time. And I have little doubt that it fulfills the order. Similarly, I’m confident that Google also has a team of no fewer than 100 or so people dedicated to developing and supporting the project within the organization. When you only have, at best, 1-2 hours per night to work on projects like this, you prefer not to fight with such cutting-edge technology. After losing 2 or 3 nights trying to make a go of Bazel, I eventually put it aside.
I also tried to use Autotools. It failed horribly for me, mostly due to my own carelessness and lack of early-project source control.
After that, it was strictly vanilla makefiles with no real dependency management. But you know what helps in these cases? ccache! Or at least, it would if it didn’t fail with Emscripten.
Quick tip: ccache has trouble with LLVM unless you set the CCACHE_CPP2 environment variable (e.g.: “export CCACHE_CPP2=1”). I don’t remember the specifics, but it magically fixes things. Then, the lazy build process becomes “make clean && make”.
Testing
If you have never used Node.js, testing Emscripten-compiled JS code might be a good opportunity to start. I was able to use Node.js to great effect for testing the individually-compiled music player modules, wiring up a series of invocations using Python for a broader test suite (wouldn’t want to go too deep down the JS rabbit hole, after all).
Be advised that Node.js doesn’t enjoy the same kind of JIT optimizations that the browser engines leverage. Thus, in the case of time critical code like, say, an audio synthesis library, the code might not run in real time. But as long as it produces the correct bitwise waveform, that’s good enough for continuous integration.
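One way to check for a bitwise-identical waveform is to hash the raw sample bytes and compare the digest against a known-good value. The following is a minimal sketch of that idea, not my actual harness: `renderAudio` is a hypothetical stand-in for a call into a compiled player, and `waveformDigest` is a helper name invented here.

```javascript
var crypto = require("crypto");

// Hypothetical stand-in for a call into an Emscripten-compiled player:
// deterministically fills a buffer with 16-bit samples (a square wave here).
function renderAudio(sampleCount) {
  var samples = new Int16Array(sampleCount);
  for (var i = 0; i < sampleCount; i++) {
    samples[i] = (i % 100) < 50 ? 8000 : -8000;
  }
  return samples;
}

// Hash the raw bytes of the waveform so runs can be compared bit for bit.
function waveformDigest(samples) {
  var bytes = Buffer.from(samples.buffer, samples.byteOffset, samples.byteLength);
  return crypto.createHash("sha256").update(bytes).digest("hex");
}

// In a real test the reference digest would come from a known-good run
// stored with the test suite; here it is just a second render.
var reference = waveformDigest(renderAudio(44100));
var actual = waveformDigest(renderAudio(44100));
console.log(actual === reference ? "PASS" : "FAIL");
```

Comparing digests rather than whole buffers keeps the reference data small enough to check into the repository alongside the tests.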
Also, if you have largely been a low-level programmer for your whole career and are generally unfamiliar with the world of single-threaded, event-driven, callback-oriented programming, you might be in for a bit of a shock. When I wanted to learn how to read the contents of a file in Node.js, this is the first tutorial I found on the matter. I thought the code presented was a parody of bad coding style:
var fs = require("fs");
var fileName = "foo.txt";

fs.exists(fileName, function(exists) {
  if (exists) {
    fs.stat(fileName, function(error, stats) {
      fs.open(fileName, "r", function(error, fd) {
        var buffer = new Buffer(stats.size);

        fs.read(fd, buffer, 0, buffer.length, null, function(error, bytesRead, buffer) {
          var data = buffer.toString("utf8", 0, buffer.length);

          console.log(data);
          fs.close(fd);
        });
      });
    });
  }
});

Apparently, this kind of thing doesn’t raise an eyebrow in the JS world.
Now, I understand and respect the JS programming model. But this was seriously frustrating when I first encountered it because a simple script like the one I was trying to write just has an ordered list of tasks to complete. When it asks for bytes from a file, it really has nothing better to do than to wait for the answer.
Thankfully, it turns out that Node’s fs module includes synchronous versions of the various file access functions. So it’s all good.
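For comparison, here is roughly what the same task looks like with the synchronous calls. This is a sketch that creates its own temporary file so it can run anywhere; the file name and contents are made up for the example.

```javascript
var fs = require("fs");
var os = require("os");
var path = require("path");

// Create a small file so the example is self-contained.
var fileName = path.join(os.tmpdir(), "foo-" + process.pid + ".txt");
fs.writeFileSync(fileName, "hello from a file\n");

// Synchronous equivalent of the nested-callback version: each call blocks
// until it finishes, so the script reads top to bottom.
if (fs.existsSync(fileName)) {
  var data = fs.readFileSync(fileName, "utf8");
  console.log(data);
}

// Clean up the temporary file.
fs.unlinkSync(fileName);
```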
Conclusion
I’m sure I missed or underexplained some things. But if other brave souls are interested in dipping their toes in the waters of Emscripten, I hope these tips will come in handy.