Other articles (30)

  • Videos

    21 April 2011

    Like "audio" documents, MediaSPIP displays videos whenever possible using the HTML5 video tag.
    One drawback of this tag is that it is not handled correctly by some browsers (Internet Explorer, to name one) and that each browser natively supports only certain video formats.
    Its main advantage is that videos are handled natively by the browser, which makes it possible to do without Flash and (...)

  • Websites made with MediaSPIP

    2 May 2011

    This page lists some websites based on MediaSPIP.

  • Possibility of deployment as a farm

    12 April 2011

    MediaSPIP can be installed as a farm, with a single "core" hosted on a dedicated server and used by a multitude of different sites.
    This makes it possible, for example: to share setup costs between several projects / individuals; to deploy a multitude of unique sites quickly; to avoid having to dump every creation into a digital catch-all, as is the case with the big general-public platforms scattered across the (...)

On other sites (5019)

  • Investigating Steam for Linux

    1 March 2013, by Multimedia Mike — Game Hacking

    Valve recently released the final, public version of their Steam client for Linux, and the Linux world rejoiced. At least, it probably did. The announcement was 2 weeks ago on Valentine’s Day and I had other things on my mind, so I missed any fanfare. When framed in this manner, the announcement timing becomes suspect– it’s as though Linux enthusiasts would have plenty of time that day or something.


    Valve Steam logo

    Taming the Frontier
    Speculation about a Linux Steam client had been kicking around for nearly as long as Steam has existed. However, sometime last year, the rumors became more substantive.

    I naturally wondered how to port something like Steam to Linux. I have some experience with trying to make a necessarily binary-only program that runs on Linux. I’m fairly well-versed in the assorted technical challenges that one might face when attempting such a feat. Because of this, whenever I hear rumors that a company might be entertaining the notion of porting a major piece of proprietary software to Linux, my instinctive reflex is, “What ?! Why, you fools ?! Save yourselves !”

    At least, that’s how it used to be. The proposal of developing a proprietary binary for Linux has been rendered considerably less insane by a few developments, for example :

    1. The rise of Ubuntu Linux as a quasi de facto standard for desktop Linux computing
    2. The increasing homogeneity in personal desktop computing technology

    What I would like to know is how the Steam client runs on Linux. Does it rely on any libraries being present on the system ? Or does it bring its own ? The latter is a trick that proprietary programs can use– transport all of the shared libraries that the main program binary depends upon, install them someplace out of the way on the filesystem, probably in /opt, and then make the main program a shell script which sets a preload path to rely on the known quantity libraries instead of the copies already on the system.
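
    To make that trick concrete, here is a minimal, hypothetical sketch of what such a launcher boils down to. Real launchers of this kind are usually plain shell scripts that export LD_LIBRARY_PATH (or LD_PRELOAD) before exec'ing the real binary; the Python below is only an illustration, and the /opt path and binary name are made up.

    #!/usr/bin/env python3
    # Hypothetical launcher sketch -- not Valve's actual script.
    import os
    import sys

    APP_HOME = "/opt/exampleapp"                    # made-up install location
    REAL_BINARY = os.path.join(APP_HOME, "bin", "exampleapp")

    # Point the dynamic linker at the bundled copies of the shared libraries
    # before it looks at whatever the system already has installed.
    bundled_libs = os.path.join(APP_HOME, "lib")
    existing = os.environ.get("LD_LIBRARY_PATH", "")
    os.environ["LD_LIBRARY_PATH"] = bundled_libs + (":" + existing if existing else "")

    # Replace this launcher process with the real binary.
    os.execv(REAL_BINARY, [REAL_BINARY] + sys.argv[1:])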

    Downloading and Installing the Client
    For this exercise, I installed x86_64 desktop Ubuntu 12.04 Linux on a l33t gaming rig that was totally top of the line about 5 years ago, and that someone didn’t want anymore and handed down to me recently. So it should be ideal for this project.

    At first, I was blown away– the Linux client is in a .deb package that is less than 2 MB large. I unpacked the steam.deb file and found a bunch of support libraries — mostly X11 and standard C/C++ runtimes. Just as I suspected. Still, I can’t believe how small the thing is. However, my amazement quickly abated when I actually ran Steam and saw this :


    Steam Linux Client -- initial update

    So it turns out steam.deb is just the installer program which immediately proceeds to download an additional 160+ MB of data. So there’s actually a lot more information to possibly sift through.

    Another component of the installation is to basically run a big ‘apt-get install’ command to make sure a bunch of required packages are installed :


    Steam Linux Client -- install system packages

    After all these installation steps, the client was ready to run. However, whenever I tried to do so, I got this dialog which would cause Steam to close when the dialog was dismissed.


    Steam Linux Client -- Upgrade NVIDIA drivers

    Not a huge deal ; later NVIDIA drivers are fairly straightforward to install on Ubuntu Linux. After a few minutes of downloading, installing and restarting X, Steam ran with minimal complaint (it still had some issue regarding the video drivers but didn’t seem to consider it a deal-breaker).

    Using Steam on Linux

    So here’s Steam running on Linux :


    Steam Linux Client -- main screen

    If you have experience with using Steam on Windows or Mac, you might observe that it looks exactly the same. I don’t have a very expansive library of games (I only started using Steam because purchasing a few computer components a few years ago entitled me to some free Steam downloads of some of the games on the list in the screenshot). I didn’t really expect any of the games to have Linux versions yet, but it turns out that the indie darling FTL : Faster Than Light has been ported to Linux. FTL was a much-heralded Kickstarter success story and sounded like something I wanted to support. I purchased this from Steam shortly after its release last year and was able to download the Linux version at no additional cost with a single click.

    It runs natively on Linux (note the Ubuntu desktop window decorations) :


    FTL game running on Linux through Steam

    You might notice from the main Steam client that, despite purchasing FTL about a 1/2 year ago and starting it up at least a 1/2 dozen times, I haven’t really invested a whole lot of time into it. I only managed to get about 2 minutes further this time :


    A few more minutes in FTL

    What can I say ? This game just bores me to tears. It’s frustrating because I know that this is one of the cool games that all real gamers are supposed to like, but I practically catch myself nodding off every time I try to run through the tutorial. It’s strange to think that I’ve invested far more time into games that offer considerably less stimulation. That’s probably because I had far more free time compared to gaming options during those times.

    But that’s neither here nor there. We’ll file this under “games that aren’t for me.” I’m glad that people like FTL and a little indie underdog has met with such success. And I’m pleased that Steam on Linux works. It’s native and the games are also native, which is all quite laudable (there was speculation that everything would just be running on top of a Wine layer).

    Deeper Analysis
    So I set out wondering how Steam was able to create a proprietary program that would satisfy a large enough cross-section of Linux users (i.e., on different platforms and distros). Answer : well, they didn’t, per the stated requirements. The installation is only tuned to work on Ubuntu 12.04. However, it works on both 32- and 64-bit platforms, the only 2 desktop CPU platforms that matter these days (unless ARM somehow makes inroads on the desktop). The Steam client is quite clearly an x86_32 binary– look at the terminal screenshot above and observe that it’s downloading all :i386 support libraries.

    The file /usr/bin/steam isn’t a binary but a launcher shell script (something you’ll also see if you investigate /usr/bin/firefox on a Linux system). Here’s an interesting tidbit :

    function detect_platform()
    {
      # Maybe be smarter someday
      # Right now this is the only platform we have a bootstrap for, so hard-code it.
      echo ubuntu12_32
    }

    I wager that it’s possible to get Steam running on other distributions ; it probably just takes a little more effort (assuming that Steam doesn’t put too much effort into thwarting such attempts).

    As for the FTL game, it comes with binaries and libraries for both x86_32 and x86_64. So, good work to the dev team for creating and testing both versions. FTL also distributes versions of the libraries it expects to work with.

    I suspect that the Steam client overall is largely a WWW rendering engine underneath the covers. That would help explain how Valve is able to achieve such a consistent look and feel, not only across OS platforms, but also through a web browser. When I browse the Steam store through Google Chrome, it looks and feels exactly like the native desktop client. When I first thought of how someone could port Steam to Linux, I immediately wondered about how they would do the UI.

    A little Googling for “steam uses webkit” (just a hunch) confirms my hypothesis.

  • Processing Big Data Problems

    8 January 2011, by Multimedia Mike — Big Data

    I’m becoming more interested in big data problems, i.e., extracting useful information out of absurdly sized sets of input data. I know it’s a growing field and there is a lot to read on the subject. But you know how I roll— just think of a problem to solve and dive right in.

    Here’s how my adventure unfolded.

    The Corpus
    I need to run a command line program on a set of files I have collected. This corpus is on the order of 350,000 files. The files range from 7 bytes to 175 MB. Combined, they occupy around 164 GB of storage space.

    Oh, and said storage space resides on an external, USB 2.0-connected hard drive. Stop laughing.

    A file is named according to the SHA-1 hash of its data. The files are organized in a directory hierarchy according to the first 6 hex digits of the SHA-1 hash (e.g., a file named a4d5832f... is stored in a4/d5/83/a4d5832f...). All of this file hash, path, and size information is stored in an SQLite database.
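
    As a small illustration of that layout, here is a sketch of the hash-to-path mapping described above (the helper names are invented):

    import hashlib
    import os

    def corpus_path(root, sha1_hex):
        # e.g. "a4d5832f..." -> "<root>/a4/d5/83/a4d5832f..."
        return os.path.join(root, sha1_hex[0:2], sha1_hex[2:4], sha1_hex[4:6], sha1_hex)

    def store_file(root, data):
        # Name the file after the SHA-1 hash of its contents.
        digest = hashlib.sha1(data).hexdigest()
        path = corpus_path(root, digest)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "wb") as f:
            f.write(data)
        return digest, path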

    First Pass
    I wrote a Python script that read all the filenames from the database, fed them into a pool of worker processes using Python’s multiprocessing module, and wrote some resulting data for each file back to the SQLite database. My Eee PC has a single-core, hyperthreaded Atom which presents 2 CPUs to the system. Thus, 2 worker processes crunched the corpus. It took a while. It took somewhere on the order of 9 or 10 or maybe even 12 hours. It took long enough that I’m in no hurry to re-run the test and get more precise numbers.
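
    A stripped-down sketch of that kind of first pass, with an invented database file, table, and column names, might look something like this:

    import sqlite3
    from multiprocessing import Pool

    DB = "corpus.db"                     # hypothetical database file

    def process_file(path):
        # Stand-in for the real per-file work.
        with open(path, "rb") as f:
            data = f.read()
        return path, len(data)

    if __name__ == "__main__":
        conn = sqlite3.connect(DB)
        paths = [row[0] for row in conn.execute("SELECT path FROM files")]

        with Pool(processes=2) as pool:  # the Eee PC presents 2 CPUs
            for path, result in pool.imap_unordered(process_file, paths):
                conn.execute("UPDATE files SET result = ? WHERE path = ?", (result, path))
        conn.commit()
        conn.close()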

    At least I extracted my initial set of data from the corpus. Or did I ?

    Think About The Future

    A few days later, I went back to revisit the data only to notice that the SQLite database was corrupted. To add insult to that bit of injury, the script I had written to process the data was also completely corrupted (overwritten with something unrelated to Python code). BTW, this was on a RAID brick configured for redundancy. So that’s strike 3 in my personal dealings with RAID technology.

    I moved the corpus to a different external drive and also verified the files after writing (easy to do since I already had the SHA-1 hashes on record).
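
    Verification is cheap when the expected hash is the filename itself; a minimal sketch:

    import hashlib

    def verify_file(path, expected_sha1):
        # Recompute the SHA-1 of the file contents and compare it against the
        # hash the file is named after.
        h = hashlib.sha1()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                h.update(chunk)
        return h.hexdigest() == expected_sha1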

    The corrupted script was pretty simple to rewrite, even a little better than before. Then I got to re-run it. However, this run was on a faster machine, a hyperthreaded, quad-core beast that exposes 8 CPUs to the system. The reason I wasn’t too concerned about the poor performance with my Eee PC is that I knew I was going to be able to run it on this monster later.

    So I let the rewritten script rip. The script gave me little updates regarding its progress. As it did so, I ran some rough calculations and realized that it wasn’t predicted to finish much sooner than it would have if I were running it on the Eee PC.

    Limiting Factors
    It had been suggested to me that I/O bandwidth of the external USB drive might be a limiting factor. This is when I started to take that idea very seriously.

    The first idea I had was to move the SQLite database to a different drive. The script records data to the database for every file processed, though it only commits once every 100 UPDATEs, so at least it’s not constantly syncing the disc. I ran before and after tests with a small subset of the corpus and noticed a substantial speedup thanks to this policy change.
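
    A sketch of that commit-batching policy (again with an invented schema):

    import sqlite3

    COMMIT_EVERY = 100

    def record_results(db_path, results):
        # 'results' is an iterable of (result, path) tuples coming back from the workers.
        conn = sqlite3.connect(db_path)
        pending = 0
        for result, path in results:
            conn.execute("UPDATE files SET result = ? WHERE path = ?", (result, path))
            pending += 1
            if pending % COMMIT_EVERY == 0:
                conn.commit()            # sync to disk only once per 100 UPDATEs
        conn.commit()                    # flush whatever is left over
        conn.close()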

    Then I remembered hearing something about "atime" which is access time. Linux filesystems, by default, record the time that a file was last accessed. You can watch this in action by running 'stat <file> ; cat <file> > /dev/null ; stat <file>' and observe that the "Access" field has been updated to NOW(). This also means that every single file that gets read from the external drive still causes an additional write. To avoid this, I started mounting the external drive with '-o noatime' which instructs Linux not to record "last accessed" time for files.

    On the limited subset test, this more than doubled script performance. I then wondered about mounting the external drive as read-only. This had the same performance as noatime. I thought about using both options together, but verified that access times are not updated on a read-only filesystem anyway, so combining the options would gain nothing.

    A Note On Profiling
    Once you start accessing files in Linux, those files start getting cached in RAM. Thus, if you profile, say, reading a gigabyte file from a disk and get 31 MB/sec, and then repeat the same test, you’re likely to see the test complete instantaneously. That’s because the file is already sitting in memory, cached. This is useful in general application use, but not if you’re trying to profile disk performance.

    Thus, in between runs, do (as root) 'sync; echo 3 > /proc/sys/vm/drop_caches' in order to wipe caches (explained here).

    Even Better ?
    I re-ran the test using these little improvements. Now it takes somewhere around 5 or 6 hours to run.

    I contrived an artificially large file on the external drive and did some 'dd' tests to measure what the drive could really do. The drive consistently measured a bit over 31 MB/sec. If I could read and process the data at 30 MB/sec, the script would be done in about 95 minutes.
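
    In the same spirit as those 'dd' runs, sequential throughput can be estimated from a script as well (drop the caches first, as noted above; the file path is hypothetical):

    import time

    def measure_read_speed(path, chunk_size=1 << 20):
        # Read the file sequentially in 1 MB chunks and report MB/sec.
        total = 0
        start = time.time()
        with open(path, "rb") as f:
            while True:
                chunk = f.read(chunk_size)
                if not chunk:
                    break
                total += len(chunk)
        elapsed = time.time() - start
        return (total / (1024 * 1024)) / elapsed

    # e.g. measure_read_speed("/media/external/big-test-file")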

    But it’s probably rather unreasonable to expect that kind of transfer rate for lots of smaller files scattered around a filesystem. However, it can’t be that helpful to have 8 different processes constantly asking the HD for 8 different files at any one time.

    So I wrote a script called stream-corpus.py which simply fetched all the filenames from the database and loaded the contents of each in turn, leaving the data to be garbage-collected at Python’s leisure. This test completed in 174 minutes, just shy of 3 hours. I computed an average read speed of around 17 MB/sec.
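
    A sketch of what stream-corpus.py presumably amounts to (schema invented):

    import sqlite3

    def stream_corpus(db_path):
        # Read every file in the corpus once and let Python garbage-collect the
        # contents; the only point is to pull the bytes off the drive.
        conn = sqlite3.connect(db_path)
        for (path,) in conn.execute("SELECT path FROM files ORDER BY path"):
            with open(path, "rb") as f:
                f.read()
        conn.close()

    if __name__ == "__main__":
        stream_corpus("corpus.db")       # hypothetical database file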

    Single-Reader Script
    I began to theorize that if I only have one thread reading, performance should improve greatly. To test this hypothesis without having to do a lot of extra work, I cleared the caches and ran stream-corpus.py until 'top' reported that about half of the real memory had been filled with data. Then I let the main processing script loose on the data. As both scripts were using sorted lists of files, they iterated over the filenames in the same order.

    Result : The processing script tore through the files that had obviously been cached thanks to stream-corpus.py, degrading drastically once it had caught up to the streaming script.

    Thus, I was incented to reorganize the processing script just slightly. Now, there is a reader thread which reads each file and stuffs the name of the file into an IPC queue that one of the worker threads can pick up and process. Note that no file data is exchanged between threads. No need— the operating system is already implicitly holding onto the file data, waiting in case someone asks for it again before something needs that bit of RAM. Technically, this approach accesses each file multiple times. But it makes little practical difference thanks to caching.
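
    A sketch of that reader-plus-workers arrangement, using the multiprocessing module (process_one_file stands in for the real per-file work):

    from multiprocessing import Process, Queue

    NUM_WORKERS = 8
    SENTINEL = None

    def process_one_file(path):
        # Stand-in for the real processing (running the external program, etc.).
        with open(path, "rb") as f:
            f.read()

    def reader(paths, queue):
        # Warm the OS page cache by reading each file once, then hand only the
        # filename to a worker; no file data is ever sent over the queue.
        for path in paths:
            with open(path, "rb") as f:
                f.read()
            queue.put(path)
        for _ in range(NUM_WORKERS):
            queue.put(SENTINEL)          # tell every worker to shut down

    def worker(queue):
        while True:
            path = queue.get()
            if path is SENTINEL:
                break
            process_one_file(path)

    def run(paths):
        queue = Queue()
        workers = [Process(target=worker, args=(queue,)) for _ in range(NUM_WORKERS)]
        for w in workers:
            w.start()
        reader(paths, queue)
        for w in workers:
            w.join()

    # e.g. run(sorted_list_of_corpus_paths)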

    Result : About 183 minutes to process the complete corpus (which works out to a little over 16 MB/sec).

    Why Multiprocess
    Is it even worthwhile to bother multithreading this operation ? Monitoring the whole operation via 'top' shows that most instances of the processing script are barely using any CPU time. Indeed, it’s likely that only one of the worker threads is doing any work most of the time, pulling a file out of the IPC queue as soon as the reader thread triggers its load into cache. Right now, the processing is usually pretty quick. There are cases where the processing (external program) might hang (one of the reasons I’m running this project is to find those cases) ; the multiprocessing architecture at least allows other processes to take over until a hanging process is timed out and killed by its monitoring process.
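
    One way a worker can guard against a hanging external program is a subprocess timeout; a sketch (the 'analyze-file' command is a made-up stand-in for the real tool):

    import subprocess

    def run_with_timeout(path, timeout_seconds=60):
        # Run the external program on one file; kill it if it hangs too long.
        try:
            result = subprocess.run(["analyze-file", path],
                                    capture_output=True, timeout=timeout_seconds)
            return result.returncode, result.stdout
        except subprocess.TimeoutExpired:
            return None, b""             # treat a hang as a failed/skipped file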

    Further, the processing is pretty simple now but is likely to get more intensive in future iterations. Plus, there’s the possibility that I might move everything onto a more appropriately-connected storage medium which should help alleviate the bottleneck bravely battled in this post.

    There’s also the theoretical possibility that the reader thread could read too far ahead of the processing threads. Obviously, that’s not too much of an issue in the current setup. But to guard against it, the processes could share a variable that tracks the total number of bytes that have been processed. The reader thread adds filesizes to the count while the processing threads subtract file sizes. The reader thread would delay reading more if the number got above a certain threshold.
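
    A sketch of that back-pressure idea with a shared counter (the threshold is arbitrary, and this assumes the worker processes are forked after the counter is created):

    import time
    from multiprocessing import Value

    MAX_BYTES_AHEAD = 512 * 1024 * 1024  # arbitrary threshold: stay ~512 MB ahead

    # Bytes read by the reader but not yet processed by any worker.
    outstanding = Value('q', 0)          # 'q' = signed 64-bit integer

    def reader_throttle(file_size):
        # Called by the reader before loading the next file.
        while outstanding.value > MAX_BYTES_AHEAD:
            time.sleep(0.1)              # wait for the workers to catch up
        with outstanding.get_lock():
            outstanding.value += file_size

    def worker_done(file_size):
        # Called by a worker once a file has been fully processed.
        with outstanding.get_lock():
            outstanding.value -= file_size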

    Leftovers
    I wondered if the order of accessing the files mattered. I didn’t write them to the drive in any special order. The drive is formatted with Linux ext3. I ran stream-corpus.py on all the filenames sorted by filename (remember the SHA-1 naming convention described above) and also by sorting them randomly.

    Result : It helps immensely for the filenames to be sorted. The sorted variant was a little more than twice as fast as the random variant. Maybe it has to do with accessing all the files in a single directory before moving onto another directory.

    Further, I have long been under the impression that the best read speed you can expect from USB 2.0 is about 27 Mbytes/sec (even though 480 Mbit/sec is bandied about in relation to the spec). This comes from profiling I performed with an external enclosure that supports both USB 2.0 and FireWire-400 (and eSATA) : FW-400 could read the same file at nearly 40 Mbytes/sec while USB 2.0 could only manage 27 Mbytes/sec. Other sources I have read corroborate this number. But this test (using different hardware) achieved over 31 Mbytes/sec.

  • Learn Multimedia Programming By Writing A JPEG Decoder

    6 January 2011, by Multimedia Mike — Programming

    For those of you who hack on multimedia tech, how did you get started ? Did you begin by studying the mathematical underpinnings of multimedia codec algorithms ? Or did you find a practical problem and jump right in by writing code ? (Personally, I was always more of a nuts & bolts hacker than a math guy.) I ask because I occasionally get emails from aspiring multimedia hackers who want to know where to begin. Invariably, they want to go the math-first route. I heavily discourage this approach.

    I have a crazy idea for anyone who wants a crash course on multimedia hacking : write a JPEG decoder. In doing so, you will be exposed to a lot of key domain concepts such as bitstream parsing, Huffman decoding, dequantization, zigzagging, the dreaded (inverse) discrete cosine transform, YUV vs. RGB colorspaces, macroblock organization, delta coding, and run length coding.
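
    As a taste of how small each of those pieces can be, here is a sketch of just one of them, generating the zigzag scan order for an 8x8 block (an illustration, not code from any particular reference):

    def zigzag_order(n=8):
        # Visit an n x n block's coefficients the way the JPEG zigzag scan does:
        # walk the anti-diagonals, alternating direction on each one.
        order = []
        for d in range(2 * n - 1):
            rows = list(range(max(0, d - n + 1), min(d, n - 1) + 1))
            if d % 2 == 0:
                rows.reverse()           # even diagonals run bottom-left to top-right
            for r in rows:
                order.append((r, d - r)) # (row, column) with row + column == d
        return order

    # zigzag_order()[:6] -> [(0, 0), (0, 1), (1, 0), (2, 0), (1, 1), (0, 2)]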

    Sure, JPEG decoding is a solved problem. But that’s hardly the point. Why would you enter an unfamiliar field and hope to come up to speed on the basics by leaping straight into the domain’s unsolved problems ? If you are successful in this exercise, no one will ever use the fruits of your labor, but that doesn’t really matter.

    So, do you want to learn multimedia hacking quickly ? Then grab a JPEG file (maybe create a few contrived ones that are small, have friendly dimensions, and feature predictable patterns), grab a good JPEG reference, and implement the decoding algorithm in the language and platform of your choice.

    On the matter of the reference, my personal favorite reference has always been A note about the JPEG decoding algorithm by Cristi Cuturicu. The English grammar is a bit dodgy but overall, it might be the best reference you’ll find on the matter— as simple as it needs to be, but no simpler.

    Good luck !