
Other articles (97)
-
Multilang: improving the interface for multilingual blocks
18 February 2011, by
Multilang is an additional plugin that is not enabled by default when MediaSPIP is initialized.
Once it is activated, a preconfiguration is automatically put in place by MediaSPIP init so that the new feature is operational right away. It is therefore not necessary to go through a configuration step for this.
-
Managing rights to create and edit objects
8 February 2011, by
By default, many features are restricted to administrators but remain independently configurable so that the minimum status required to use them can be changed, notably: writing content on the site, adjustable in the form template management; adding notes to articles; adding captions and annotations to images;
-
Uploading media and themes via FTP
31 May 2013, by
The MediaSPIP tool also processes media transferred via FTP. If you prefer to upload this way, retrieve the access credentials for your MediaSPIP site and use your favorite FTP client.
From the start, you will find the following directories in your FTP space: config/: the site's configuration directory; IMG/: media already processed and online on the site; local/: the website's cache directory; themes/: themes or custom stylesheets; tmp/: working directory (...)
On other sites (10412)
-
Tele-Arena Lives On
25 February 2011, by Multimedia Mike — Game Hacking
Readers know I have a peculiar interest in taking apart video games and that I would rather study a game’s inner workings than actually play it. I take an interest in others’ efforts in this same area. It’s still in my backlog to take a closer look at Clone2727’s body of work. But I wanted to highlight my friend’s work on re-implementing a game called Tele-Arena.
Back In The Day
As some of you are likely aware, there was a dark age of online communication that predated the era of widespread internet access. This was known as "The BBS Age". People dialed into these BBSes using modems that operated at abysmal transfer speeds and would communicate with other users, upload and download files, and play an occasional game.

BBS software evolved and perhaps the ultimate (and final) evolution was Galacticomm’s MajorBBS (MBBS). There were assorted games that plugged into the MBBS, all rendered in glorious color ANSI graphics. One of the most famous of these games was Tele-Arena (TA). TA was a multiplayer fantasy-themed text adventure game. Perhaps you could think of it as World of Warcraft, only rendered as interactive fiction instead of a rich 3D landscape. (Disclaimer: I might not be qualified to make that comparison since I have never experienced WoW firsthand, though I did play TA on and off about 17 years ago).
TA was often compared to multi-user dungeons — or MUDs — that were played by telneting into internet servers hosting games. Such comparisons were usually unfavorable as people who had experience with both TA and MUDs were sniffy elitists with internet access who thought they were sooooo much better than those filthy, BBS-dialing serfs.
Sorry, didn’t mean to open old wounds.
Modern Retelling of A Classic Tale
Anyway, my friend Ron Kinney is perhaps the world’s biggest fan of TA. So much so that he has re-implemented the engine in Java under the project name Ether. He’s in a situation similar to the ScummVM project in that, while the independent, open source engine is fair game for redistribution, it would be questionable to redistribute the original data files. That’s why he created an AreaBuilder application that generates independent game data files.

Ironically, you can also telnet into a server on which Ron hosts an instance of Tele-Arena (ironic in the sense that the internet/BBS conflict gets a little blurry).
I hope that one day Ron will regale us with the strangest tales from the classic TA days. My personal favorite was "Wrath of a Sysop."
-
Processing Big Data Problems
8 January 2011, by Multimedia Mike — Big Data
I’m becoming more interested in big data problems, i.e., extracting useful information out of absurdly sized sets of input data. I know it’s a growing field and there is a lot to read on the subject. But you know how I roll— just think of a problem to solve and dive right in.
Here’s how my adventure unfolded.
The Corpus
I need to run a command line program on a set of files I have collected. This corpus is on the order of 350,000 files. The files range from 7 bytes to 175 MB. Combined, they occupy around 164 GB of storage space.

Oh, and said storage space resides on an external, USB 2.0-connected hard drive. Stop laughing.
A file is named according to the SHA-1 hash of its data. The files are organized in a directory hierarchy according to the first 6 hex digits of the SHA-1 hash (e.g., a file named a4d5832f... is stored in a4/d5/83/a4d5832f...). All of this file hash, path, and size information is stored in an SQLite database.
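As an illustration, computing that kind of hash-based path might look like the following sketch (the function name and corpus root directory are hypothetical, not taken from the original scripts):

import hashlib
import os

def corpus_path(data, root="corpus"):
    """Return the storage path for a blob named by the SHA-1 hash of its data.

    The first 6 hex digits become three directory levels, so a file named
    a4d5832f... ends up at corpus/a4/d5/83/a4d5832f...
    """
    digest = hashlib.sha1(data).hexdigest()
    return os.path.join(root, digest[0:2], digest[2:4], digest[4:6], digest)

print(corpus_path(b"example file contents"))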
First Pass
I wrote a Python script that read all the filenames from the database, fed them into a pool of worker processes using Python’s multiprocessing module, and wrote some resulting data for each file back to the SQLite database. My Eee PC has a single-core, hyperthreaded Atom which presents 2 CPUs to the system. Thus, 2 worker processes crunched the corpus. It took a while. It took somewhere on the order of 9 or 10 or maybe even 12 hours. It took long enough that I’m in no hurry to re-run the test and get more precise numbers.

At least I extracted my initial set of data from the corpus. Or did I?
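For reference, a first pass along those lines might look roughly like the following sketch. The process_file() placeholder, the files table layout, and the corpus.sqlite filename are illustrative assumptions, not the original script:

import sqlite3
from multiprocessing import Pool

def process_file(path):
    """Placeholder for the real per-file work (running the external program)."""
    with open(path, "rb") as f:
        result = len(f.read())               # stand-in "result": just the byte count
    return path, result

if __name__ == "__main__":
    db = sqlite3.connect("corpus.sqlite")    # hypothetical database name
    paths = [row[0] for row in db.execute("SELECT path FROM files")]

    with Pool(processes=2) as pool:          # 2 workers for the hyperthreaded Atom
        for path, result in pool.imap_unordered(process_file, paths):
            db.execute("UPDATE files SET result = ? WHERE path = ?",
                       (result, path))
    db.commit()
    db.close()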
Think About The Future
A few days later, I went back to revisit the data only to notice that the SQLite database was corrupted. To add insult to that bit of injury, the script I had written to process the data was also completely corrupted (overwritten with something unrelated to Python code). BTW, this was on a RAID brick configured for redundancy. So that’s strike 3 in my personal dealings with RAID technology.

I moved the corpus to a different external drive and also verified the files after writing (easy to do since I already had the SHA-1 hashes on record).
The corrupted script was pretty simple to rewrite, even a little better than before. Then I got to re-run it. However, this run was on a faster machine, a hyperthreaded, quad-core beast that exposes 8 CPUs to the system. The reason I wasn’t too concerned about the poor performance with my Eee PC is that I knew I was going to be able to run it on this monster later.
So I let the rewritten script rip. The script gave me little updates regarding its progress. As it did so, I ran some rough calculations and realized that it wasn’t predicted to finish much sooner than it would have if I were running it on the Eee PC.
Limiting Factors
It had been suggested to me that I/O bandwidth of the external USB drive might be a limiting factor. This is when I started to take that idea very seriously.

The first idea I had was to move the SQLite database to a different drive. The script records data to the database for every file processed, though it only commits once every 100 UPDATEs, so at least it’s not constantly syncing the disk. I ran before and after tests with a small subset of the corpus and noticed a substantial speedup thanks to this policy change.
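The batching itself is simple to sketch; the table and column names below are again assumptions for illustration, not the original code:

import sqlite3

def record_results(db_path, results, batch_size=100):
    """Write per-file results, committing only once per `batch_size` UPDATEs."""
    db = sqlite3.connect(db_path)
    pending = 0
    for path, result in results:
        db.execute("UPDATE files SET result = ? WHERE path = ?", (result, path))
        pending += 1
        if pending >= batch_size:        # one commit per 100 rows, not per row
            db.commit()
            pending = 0
    db.commit()                          # flush any final partial batch
    db.close()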
Then I remembered hearing something about "atime", which is access time. Linux filesystems, per default, record the time that a file was last accessed. You can watch this in action by running 'stat <file> ; cat <file> > /dev/null ; stat <file>' and observing that the "Access" field has been updated to NOW(). This also means that every single file that gets read from the external drive still causes an additional write. To avoid this, I started mounting the external drive with '-o noatime', which instructs Linux not to record "last accessed" time for files.

On the limited subset test, this more than doubled script performance. I then wondered about mounting the external drive as read-only. This had the same performance as noatime. I thought about using both options together but verified that access times are not updated for a read-only filesystem.
A Note On Profiling
Once you start accessing files in Linux, those files start getting cached in RAM. Thus, if you profile, say, reading a gigabyte file from a disk and get 31 MB/sec, and then repeat the same test, you’re likely to see the test complete instantaneously. That’s because the file is already sitting in memory, cached. This is useful in general application use, but not if you’re trying to profile disk performance.

Thus, in between runs, do (as root) 'sync; echo 3 > /proc/sys/vm/drop_caches' in order to wipe caches (explained here).

Even Better?
I re-ran the test using these little improvements. Now it takes somewhere around 5 or 6 hours to run.

I contrived an artificially large file on the external drive and did some 'dd' tests to measure what the drive could really do. The drive consistently measured a bit over 31 MB/sec. If I could read and process the data at 30 MB/sec, the script would be done in about 95 minutes. But it’s probably rather unreasonable to expect that kind of transfer rate for lots of smaller files scattered around a filesystem. However, it can’t be that helpful to have 8 different processes constantly asking the HD for 8 different files at any one time.
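For completeness, a similarly crude sequential-read throughput check can also be done from Python (a sketch; the test file path is a placeholder, and the page cache should be dropped beforehand as described above):

import time

def measure_read_speed(path, chunk_size=1024 * 1024):
    """Sequentially read `path` and return the throughput in MB/sec."""
    start = time.time()
    total = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    elapsed = time.time() - start
    return total / (1024 * 1024) / elapsed

if __name__ == "__main__":
    # Placeholder path for the contrived large file on the external drive.
    print("%.1f MB/sec" % measure_read_speed("/mnt/usbdrive/bigfile.bin"))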
So I wrote a script called stream-corpus.py which simply fetched all the filenames from the database and loaded the contents of each in turn, leaving the data to be garbage-collected at Python’s leisure. This test completed in 174 minutes, just shy of 3 hours. I computed an average read speed of around 17 MB/sec.
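A streaming read of that sort could look roughly like this (a sketch under the same assumed database layout, not the actual stream-corpus.py):

import sqlite3

def stream_corpus(db_path="corpus.sqlite"):      # hypothetical database name
    """Read every file in the corpus once, in sorted order, and discard the data."""
    db = sqlite3.connect(db_path)
    total = 0
    for (path,) in db.execute("SELECT path FROM files ORDER BY path"):
        with open(path, "rb") as f:
            total += len(f.read())               # data is discarded right away
    db.close()
    return total                                 # total bytes read, for a speed estimate

if __name__ == "__main__":
    print(stream_corpus())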
Single-Reader Script
I began to theorize that if I only have one thread reading, performance should improve greatly. To test this hypothesis without having to do a lot of extra work, I cleared the caches and ran stream-corpus.py until 'top' reported that about half of the real memory had been filled with data. Then I let the main processing script loose on the data. As both scripts were using sorted lists of files, they iterated over the filenames in the same order.

Result: The processing script tore through the files that had obviously been cached thanks to stream-corpus.py, but performance degraded drastically once it caught up to the streaming script.
Thus, I was motivated to reorganize the processing script just slightly. Now, there is a reader thread which reads each file and stuffs the name of the file into an IPC queue that one of the worker threads can pick up and process. Note that no file data is exchanged between threads. No need— the operating system is already implicitly holding onto the file data, waiting in case someone asks for it again before something needs that bit of RAM. Technically, this approach accesses each file multiple times. But it makes little practical difference thanks to caching.
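Roughly, that producer/consumer arrangement might be sketched like this; the worker count, queue handling details, and database name are assumptions rather than the actual script:

import multiprocessing as mp
import sqlite3

def reader(paths, queue, workers):
    """Single reader: touch each file so the OS caches it, then hand off the name."""
    for path in paths:
        with open(path, "rb") as f:
            f.read()                     # pull the file into the page cache
        queue.put(path)
    for _ in range(workers):
        queue.put(None)                  # one sentinel per worker tells it to stop

def worker(queue):
    """Pull filenames off the IPC queue; the second read is served from cache."""
    while True:
        path = queue.get()
        if path is None:
            break
        with open(path, "rb") as f:
            data = f.read()
        # ... run the real per-file processing on `data` here ...

if __name__ == "__main__":
    db = sqlite3.connect("corpus.sqlite")    # hypothetical database name
    paths = [row[0] for row in
             db.execute("SELECT path FROM files ORDER BY path")]
    db.close()

    workers = 7                              # leave one CPU for the reader
    queue = mp.Queue()
    procs = [mp.Process(target=worker, args=(queue,)) for _ in range(workers)]
    for p in procs:
        p.start()
    reader(paths, queue, workers)            # the reader runs in the parent process
    for p in procs:
        p.join()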
Result: About 183 minutes to process the complete corpus (which works out to a little over 16 MB/sec).
Why Multiprocess
Is it even worthwhile to bother multithreading this operation? Monitoring the whole operation via 'top', most instances of the processing script are barely using any CPU time. Indeed, it’s likely that only one of the worker threads is doing any work most of the time, pulling a file out of the IPC queue as soon as the reader thread triggers its load into cache. Right now, the processing is usually pretty quick. There are cases where the processing (external program) might hang (one of the reasons I’m running this project is to find those cases); the multiprocessing architecture at least allows other processes to take over until a hanging process is timed out and killed by its monitoring process.

Further, the processing is pretty simple now but is likely to get more intensive in future iterations. Plus, there’s the possibility that I might move everything onto a more appropriately-connected storage medium which should help alleviate the bottleneck bravely battled in this post.
There’s also the theoretical possibility that the reader thread could read too far ahead of the processing threads. Obviously, that’s not too much of an issue in the current setup. But to guard against it, the processes could share a variable that tracks the total number of bytes that have been processed. The reader thread adds filesizes to the count while the processing threads subtract file sizes. The reader thread would delay reading more if the number got above a certain threshold.
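One possible way to sketch that throttle with a shared counter (the 512 MB threshold and the placeholder file list are arbitrary choices for illustration, not anything from the original setup):

import multiprocessing as mp
import os
import time

MAX_OUTSTANDING = 512 * 1024 * 1024          # arbitrary cap: 512 MB "in flight"

def reader(paths, queue, outstanding, workers):
    """Warm the cache for each file, but pause if too much data is unprocessed."""
    for path in paths:
        size = os.path.getsize(path)
        while outstanding.value > MAX_OUTSTANDING:
            time.sleep(0.1)                  # let the workers catch up
        with open(path, "rb") as f:
            f.read()                         # pull the file into the page cache
        with outstanding.get_lock():
            outstanding.value += size
        queue.put((path, size))
    for _ in range(workers):
        queue.put(None)                      # one sentinel per worker

def worker(queue, outstanding):
    """Re-read each queued file (served from cache) and do the real processing."""
    while True:
        item = queue.get()
        if item is None:
            break
        path, size = item
        with open(path, "rb") as f:
            f.read()                         # real per-file processing would go here
        with outstanding.get_lock():
            outstanding.value -= size

if __name__ == "__main__":
    paths = ["example1.bin", "example2.bin"]     # stand-in for the corpus file list
    workers = 7
    outstanding = mp.Value("q", 0)               # shared count of unprocessed bytes
    queue = mp.Queue()
    procs = [mp.Process(target=worker, args=(queue, outstanding))
             for _ in range(workers)]
    for p in procs:
        p.start()
    reader(paths, queue, outstanding, workers)
    for p in procs:
        p.join()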
Leftovers
I wondered if the order of accessing the files mattered. I didn’t write them to the drive in any special order. The drive is formatted with Linux ext3. I ran stream-corpus.py on all the filenames sorted by filename (remember the SHA-1 naming convention described above) and also sorted randomly.

Result: It helps immensely for the filenames to be sorted. The sorted variant was a little more than twice as fast as the random variant. Maybe it has to do with accessing all the files in a single directory before moving on to another directory.
Further, I have long been under the impression that the best read speed you can expect from USB 2.0 is 27 Mbytes/sec (even though 480 Mbit/sec is bandied about in relation to the spec). This comes from profiling I performed with an external enclosure that supports both USB 2.0 and FireWire-400 (and eSATA). FW-400 was able to read the same file at nearly 40 Mbytes/sec that USB 2.0 could only read at 27 Mbytes/sec. Other sources I have read corroborate this number. But this test (using different hardware) achieved over 31 Mbytes/sec.
-
I Really Like My New EeePC
29 August 2010, by Multimedia Mike — General
Fair warning: I’m just going to use this post to blather disconnectedly about a new-ish toy.
I really like my new EeePC. I was rather enamored with the original EeePC 701 from late 2007, a little box with a tiny 7″ screen that is credited with kicking off the netbook revolution. Since then, Asus has created about a hundred new EeePC models.
Since I’m spending so much time on a train these days, I finally took the plunge to get a better netbook. I decided to stay loyal to Asus and their Eee lineage and got the highest end EeePC they presently offer (which was still under US$500)– the EeePC 1201PN. The '12' in the model number represents a 12″ screen size and the rest of the specs are commensurately larger. Indeed, it sort of blurs the line between netbook and full-blown laptop.
Incidentally, after I placed the order for the 1201PN nearly 2 months ago, and I mean the very literal next moment, this Engadget headline came across announcing the EeePC 1215N. My new high-end (such as it is) computer purchase was immediately obsoleted; I thought that only happened in parody. (As of this writing, the 1215N still doesn’t appear to be shipping, though.)
It’s a sore point among Linux aficionados that Linux was used to help kickstart the netbook trend but that now it’s pretty much impossible to find Linux pre-installed on a netbook. So it is in this case. This 1201PN comes with Windows 7 Home Premium installed. This is a notable differentiator from most netbooks which only have Windows 7 Home Starter, a.k.a., the Windows 7 version so crippled that it doesn’t even allow the user to change the background image.
I wished to preserve the Windows 7 installation (you never know when it will come in handy) and dual boot Linux. I thought I would have to use the Windows partition tool to divide the drive and work some magic. Fortunately, the default installation already carved the 250 GB HD in half; I was able to reformat the second partition and install Linux. The details are a little blurry, but I’m pretty sure one of those external USB optical drives shown in my last post actually performed successfully for this task. Lucky break.
The EeePC 1201PN, EeePC 701, Belco Alpha-400, and even a comparatively gargantuan Sony Vaio full laptop– all of the portable computers in the household
So I got Ubuntu 10.04 Linux installed in short order. This feels like something of a homecoming for me. You see, I used Linux full-time at home from 1999-2006. In 2007, I switched to using Windows XP full-time, mostly because my home use-case switched to playing a lot of old, bad computer games. By the end of 2008, I had transitioned to using the Mac Mini that I had originally purchased earlier that year for running FATE cycles. That Mac served as my main home computer until I purchased the 1201PN 2 months ago.
Mostly, I have this overriding desire for computers to just work, at least in their basic functions. And that’s why I’m so roundly impressed with the way Linux handles right out of the box. Nearly everything on the 1201PN works in Linux. The video, the audio, the wireless networking, the webcam, it all works out of the box. I had to do the extra installation step to get the binary nVidia drivers installed but even that’s relatively seamless, especially compared to “the way things used to be” (drop to a prompt, run some binary installer from the prompt as root, watch it fail in arcane ways because the thing is only certified to run on one version of one Linux distribution). The 1201PN, with its nVidia Ion2 graphics, is able to drive its own 1366×768 screen and an external monitor running at up to 2560×1600 simultaneously.
The only weird hiccup in the whole process was that I had a little trouble with the special volume keys on the keyboard (specifically, the volume up/down/mute keys didn’t do anything). But I quickly learned that I had to install some package related to ACPI and they magically started to do the right thing. Now I get to encounter the Linux Flash Player bug where modifying volume via those special keys forces fullscreen mode to exit. Adobe really should fix that.
Also, trackpad multitouch gestures don’t work right away. Based on my reading, it is possible to set those up in Linux. But it’s largely a preference thing– I don’t care much for multitouch. This creates a disparity when I use Windows 7 on the 1201PN which is configured per default to use multitouch.
The same 4 laptops stacked up
So, in short, I’m really happy with this little machine. Traditionally, I have had absolutely no affinity for laptops/notebooks/portable computers at all, even if everyone around me was always completely enamored with the devices. What changed for me? Well, for starters, as a long-time Linux user, I was used to having to invest in very specific, carefully-researched hardware lest I not be able to use it under the Linux OS. This was always a major problem in the laptop field, where custom, proprietary hardware components typically reign supreme. These days, not so much, and these netbooks seem to contain well-supported hardware. Then there’s the fact that laptops always cost so much more than similarly capable desktop systems and that I had no real reason for taking a computer with me when I left home. So my use case changed, as did the price point for relatively low-power laptops/netbooks.
Data I/O geek note: The 1201PN is capable of wireless-N networking — as many netbooks seem to have — but only 100 Mbit ethernet. I wondered why it didn’t have gigabit ethernet. Then I remembered that 100 Mbit ethernet provides 11-11.5 Mbytes/sec of transfer speed (12.5 Mbytes/sec in theory, less protocol overhead) which, in my empirical experience, is approximately the maximum write speed of a 5400 RPM hard drive– which is what the 1201PN possesses.