
Media (5)
-
ED-ME-5 1-DVD
11 October 2011
Updated: October 2011
Language: English
Type: Audio
-
Revolution of Open-source and film making towards open film making
6 October 2011
Updated: July 2013
Language: English
Type: Text
-
Valkaama DVD Cover Outside
4 October 2011
Updated: October 2011
Language: English
Type: Image
-
Valkaama DVD Label
4 October 2011
Updated: February 2013
Language: English
Type: Image
-
Valkaama DVD Cover Inside
4 October 2011
Updated: October 2011
Language: English
Type: Image
Other articles (98)
-
MediaSPIP 0.1 Beta version
25 April 2011
MediaSPIP 0.1 beta is the first version of MediaSPIP proclaimed as "usable".
The zip file provided here only contains the sources of MediaSPIP in its standalone version.
To get a working installation, you must manually install all software dependencies on the server.
If you want to use this archive for an installation in "farm mode", you will also need to carry out other manual (...) -
Multilang: improving the interface for multilingual blocks
18 February 2011
Multilang is an additional plugin that is not enabled by default when MediaSPIP is initialized.
Once activated, a preconfiguration is put in place automatically by MediaSPIP init so that the new feature is immediately operational. No separate configuration step is therefore required. -
APPENDIX: Plugins used specifically for the farm
5 March 2010
The central/master site of the farm needs several additional plugins, beyond those of the channels, to function properly: the "Gestion de la mutualisation" plugin; the inscription3 plugin, to manage registrations and requests to create a shared-hosting instance as soon as users sign up; the verifier plugin, which provides a field-validation API (used by inscription3); the "champs extras v2" plugin, required by inscription3 (...)
On other sites (9286)
-
Join us at MatomoCamp 2024 world tour edition
13 November 2024, by Daniel Crough — Uncategorized -
Dreamcast Anniversary Programming
10 September 2010, by Multimedia Mike — Game Hacking
This day last year saw a lot of nostalgia posts on the internet regarding the Sega Dreamcast, launched 10 years prior to that day (on 9/9/99). Regrettably, none of the retrospectives that I read really seemed to mention the homebrew potential, which is the aspect that interested me. On the occasion of the DC’s 11th anniversary, I wanted to remind myself how to build something for the unit and do so using modern equipment and build tools.
Background
Like many other programmers, I initially gained interest in programming because I desired to program video games. Not content to just plunk out games on a PC, I always had a deep, abiding ambition to program actual video game hardware. That is, I wanted to program a purpose-built video game console. The Sega Dreamcast might be the most ideal candidate to ever emerge for that task. All that was required to run your own software on the unit was the console, a PC, some free software tools, and a special connectivity measure.
The Equipment
Here is the hardware required (ideally) to build software for the DC:
- The console itself (I happen to have 3 of them lying around, as pictured above)
- Some peripherals: such as the basic DC controller, the DC keyboard (flagship title: Typing of the Dead), and the visual memory unit (VMU)
- VGA box: the DC supported 480p gaming via a device that allowed you to connect the console straight to a VGA monitor via 15-pin D-sub. Not required for development, but very useful. I happen to have 3 of them from different third parties.
- Finally, the connectivity measure for hooking the DC to the PC.
There are 2 options here. The first is rare, expensive and relatively fast: a DC broadband adapter. The second is slower but much less expensive and relatively easy to come by: the DC coder’s cable. This was a DB-9 adapter on one end and a DC serial adapter on the other, and a circuit in the middle to monkey with voltage levels or some such; I’m no electrical engineer. I procured this model from the notorious Lik Sang, well before that outfit was sued out of business.
Dealing With Legacy
Take a look at that coder’s cable again. DB-9? When was the last time you owned a computer with one of those? And then think farther back to the last time you had occasion to plug something into one of those ports (likely a serial mouse).
A few years ago, someone was about to toss out this Belkin USB to DB-9 serial converter when I intervened. I foresaw the day when I would dust off the coder’s cable. So now I can connect a USB serial cable to my Eee PC, which then connects via converter to a different serial cable, one which has its own conversion circuit that alters the connection to yet another type of serial cable.
Bits is bits is bits as far as I’m concerned.
Putting It All Together
Now to assemble all the pieces (plus a monitor) into one development desktop :
The monitor says “dcload 1.0.3, idle…”. That’s a custom boot CD-ROM that is patiently waiting to receive commands, code and data via the serial port.
Getting The Software
Back in the day, homebrew software development on the DC revolved around these components:
- GNU binutils: for building base toolchains for the Hitachi SH-4 main CPU as well as the ARM7-based audio coprocessor
- GNU gcc/g++: for building compilers on top of binutils for the 2 CPUs
- Newlib: a C library intended for embedded systems
- KallistiOS: an open source, real-time OS developed for the DC
The DC was my first exposure to building cross compilers. I developed some software for the DC in the earlier part of the decade. Now, I am trying to figure out how I did it, especially since I think I came up with a few interesting ideas at the time.
Struggling With the Software Legacy
The source for KallistiOS has gone untouched since about 2004 but is still around thanks to Sourceforge. The instructions for properly building the toolchain have been lost to time, or would be were it not for the Internet Archive’s copy of a site called Hangar Eleven. Also, KallistiOS makes reference to a program called ‘dc-tool’ which is needed on the client side for communicating with dcload. I was able to find this binary at the Boob! site (well-known in DC circles).
I was able to build the toolchain using binutils 2.20.1, gcc 4.5.1 and newlib 1.18.0. Building the toolchain is an odd process as it requires building the binutils, then building the C compiler, then newlib, and then building the C compiler again along with the C++ compiler because the C++ compiler depends on newlib.
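For my own future reference, here is a rough sketch of that build order as a small driver script. It is only illustrative: the target triplet, install prefix, source directory names and configure flags are my assumptions, not the exact recipe from Hangar Eleven or KallistiOS.
#!/usr/bin/env python3
# Hypothetical sketch of the four-step build order described above:
# binutils, then a C-only gcc, then newlib, then gcc again with C++.
import os
import subprocess

TARGET = "sh-elf"                                   # assumed triplet for the SH-4 CPU
PREFIX = os.path.expanduser("~/dc/toolchain")       # assumed install location
os.environ["PATH"] = PREFIX + "/bin:" + os.environ["PATH"]

def build(srcdir, extra_args):
    # configure/make/make install in the package's source directory
    subprocess.run(["./configure", "--target=" + TARGET, "--prefix=" + PREFIX] + extra_args,
                   cwd=srcdir, check=True)
    subprocess.run(["make"], cwd=srcdir, check=True)
    subprocess.run(["make", "install"], cwd=srcdir, check=True)

build("binutils-2.20.1", [])                                                         # 1. binutils
build("gcc-4.5.1", ["--enable-languages=c", "--without-headers", "--with-newlib"])   # 2. C compiler only
build("newlib-1.18.0", [])                                                           # 3. newlib, using that compiler
build("gcc-4.5.1", ["--enable-languages=c,c++", "--with-newlib"])                    # 4. C and C++ compilers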
With some effort, I got the toolchain to build KallistiOS and most of its example programs. I documented most of the tweaks I had to make, several of them exactly the same as this one that I recently discovered while resurrecting a 10-year-old C program (common construct in C programming of old?).
Moment of Truth
So I had some example programs built as ELF files. I told dc-tool to upload and run them on the waiting console. Unfortunately, the tool would just sort of stall, though some communication had evidently taken place. It has been many years since I have seen this in action but I recall that something more ought to be happening.
Plan B (Hardware)
This is the point at which I remembered that I have been holding onto one rather old little machine that still has a DB-9 serial port. It’s not especially ergonomic to set up. I have to run it on my floor because, to connect it to my network, I need to run a 25′ ethernet cable that just barely reaches from the other room. The machine doesn’t seem to like USB keyboards, which is a shame since I have long since ditched any PS/2 keyboards. Fortunately, the box still has an old Gentoo distro and is running sshd, a holdover from its former life as a headless box.
Now when I run dc-tool, both the PC and DC report the upload progress while pretty overscan bars oscillate on the DC’s monitor. Now I’m back in business, until…
Plan C (Software)
None of these KallistiOS example programs are working. Some are even reporting catastrophic failures (register dumps) via the serial console. That’s when I remember that gcc can be a bit fickle on CPU architectures that are not, shall we say, first-class citizens. Back in the day, gcc 2.95 was a certified no-go for SH-4 development. 3.0.3 or 3.0.4 was called upon at the time. As I’m hosting this toolchain on x86_64 right now, gcc 3.0.4 can’t even be built (predates the architecture).
One last option: as I searched through my old DC project directories, I found that I still have a lot of the resulting binaries, the ones I built 7-8 years ago. I upload a few of those and I finally see homebrew programming at work again, including this old program (described in detail here).
Next Steps
If I ever feel like revisiting this again, I suppose I can try some of the older 4.x series to see if they build valid programs. Alternatively, try building an x86_32-hosted 3.0.4 toolchain which ought to be a known good. And if that fails, search a little bit more to find that there are still active Dreamcast communities out there on the internet which probably have development toolchain binaries ready for download. -
Processing Big Data Problems
8 January 2011, by Multimedia Mike — Big Data
I’m becoming more interested in big data problems, i.e., extracting useful information out of absurdly sized sets of input data. I know it’s a growing field and there is a lot to read on the subject. But you know how I roll— just think of a problem to solve and dive right in.
Here’s how my adventure unfolded.
The Corpus
I need to run a command line program on a set of files I have collected. This corpus is on the order of 350,000 files. The files range from 7 bytes to 175 MB. Combined, they occupy around 164 GB of storage space.
Oh, and said storage space resides on an external, USB 2.0-connected hard drive. Stop laughing.
A file is named according to the SHA-1 hash of its data. The files are organized in a directory hierarchy according to the first 6 hex digits of the SHA-1 hash (e.g., a file named a4d5832f... is stored in a4/d5/83/a4d5832f...). All of this file hash, path, and size information is stored in an SQLite database.
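As an illustration (a hypothetical helper, not the actual script), deriving a file's storage path from its data would look something like this:
import hashlib
from pathlib import Path

def corpus_path(root, data):
    # Name the file after the SHA-1 of its contents and nest it under
    # directories made of the first 6 hex digits, two digits per level,
    # e.g. a4d5832f... ends up in a4/d5/83/a4d5832f...
    digest = hashlib.sha1(data).hexdigest()
    return Path(root) / digest[0:2] / digest[2:4] / digest[4:6] / digest

print(corpus_path("/mnt/corpus", b"example bytes"))   # root path is made up for the example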
First Pass
I wrote a Python script that read all the filenames from the database, fed them into a pool of worker processes using Python’s multiprocessing module, and wrote some resulting data for each file back to the SQLite database. My Eee PC has a single-core, hyperthreaded Atom which presents 2 CPUs to the system. Thus, 2 worker threads crunched the corpus. It took a while. It took somewhere on the order of 9 or 10 or maybe even 12 hours. It took long enough that I’m in no hurry to re-run the test and get more precise numbers.
At least I extracted my initial set of data from the corpus. Or did I?
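The original script is not reproduced here, but a minimal sketch of that first pass might look like the following, assuming a hypothetical files table with path and result columns and a placeholder process_file() standing in for the real per-file work:
import sqlite3
from multiprocessing import Pool

def process_file(path):
    # placeholder for the real per-file work (running the command line program)
    with open(path, "rb") as f:
        return path, len(f.read())

def main():
    db = sqlite3.connect("corpus.db")                 # database name is an assumption
    paths = [row[0] for row in db.execute("SELECT path FROM files")]
    with Pool(processes=2) as pool:                   # 2 workers for the Atom's 2 logical CPUs
        for path, result in pool.imap_unordered(process_file, paths):
            db.execute("UPDATE files SET result = ? WHERE path = ?", (result, path))
    db.commit()
    db.close()

if __name__ == "__main__":
    main()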
Think About The Future
A few days later, I went back to revisit the data only to notice that the SQLite database was corrupted. To add insult to that bit of injury, the script I had written to process the data was also completely corrupted (overwritten with something unrelated to Python code). BTW, this was on a RAID brick configured for redundancy. So that’s strike 3 in my personal dealings with RAID technology.
I moved the corpus to a different external drive and also verified the files after writing (easy to do since I already had the SHA-1 hashes on record).
The corrupted script was pretty simple to rewrite, even a little better than before. Then I got to re-run it. However, this run was on a faster machine, a hyperthreaded, quad-core beast that exposes 8 CPUs to the system. The reason I wasn’t too concerned about the poor performance with my Eee PC is that I knew I was going to be able to run it on this monster later.
So I let the rewritten script rip. The script gave me little updates regarding its progress. As it did so, I ran some rough calculations and realized that it wasn’t predicted to finish much sooner than it would have if I were running it on the Eee PC.
Limiting Factors
It had been suggested to me that I/O bandwidth of the external USB drive might be a limiting factor. This is when I started to take that idea very seriously.
The first idea I had was to move the SQLite database to a different drive. The script records data to the database for every file processed, though it only commits once every 100 UPDATEs, so at least it’s not constantly syncing the disc. I ran before and after tests with a small subset of the corpus and noticed a substantial speedup thanks to this policy change.
Then I remembered hearing something about "atime", which is access time. Linux filesystems, by default, record the time that a file was last accessed. You can watch this in action by running
'stat <file> ; cat <file> > /dev/null ; stat <file>'
and observing that the "Access" field has been updated to NOW(). This also means that every single file that gets read from the external drive still causes an additional write. To avoid this, I started mounting the external drive with '-o noatime', which instructs Linux not to record "last accessed" time for files.
On the limited subset test, this more than doubled script performance. I then wondered about mounting the external drive as read-only. This had the same performance as noatime. I thought about using both options together but verified that access times are not updated for a read-only filesystem.
A Note On Profiling
Once you start accessing files in Linux, those files start getting cached in RAM. Thus, if you profile, say, reading a gigabyte file from a disk and get 31 MB/sec, and then repeat the same test, you’re likely to see the test complete instantaneously. That’s because the file is already sitting in memory, cached. This is useful in general application use, but not if you’re trying to profile disk performance.
So, in between runs, do (as root)
'sync; echo 3 > /proc/sys/vm/drop_caches'
in order to wipe the caches (explained here).
Even Better?
I re-ran the test using these little improvements. Now it takes somewhere around 5 or 6 hours to run.
I contrived an artificially large file on the external drive and did some 'dd' tests to measure what the drive could really do. The drive consistently measured a bit over 31 MB/sec. If I could read and process the data at 30 MB/sec, the script would be done in about 95 minutes.
But it’s probably rather unreasonable to expect that kind of transfer rate for lots of smaller files scattered around a filesystem. However, it can’t be that helpful to have 8 different processes constantly asking the HD for 8 different files at any one time.
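For reference, a quick back-of-envelope check of that estimate (164 GB is the corpus size given earlier):
corpus_mb = 164 * 1024                   # corpus size in MB
rate_mb_per_sec = 30
print(corpus_mb / rate_mb_per_sec / 60)  # about 93 minutes, roughly the 95 quoted above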
So I wrote a script called stream-corpus.py which simply fetched all the filenames from the database and loaded the contents of each in turn, leaving the data to be garbage-collected at Python’s leisure. This test completed in 174 minutes, just shy of 3 hours. I computed an average read speed of around 17 MB/sec.
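stream-corpus.py itself is not shown here; an approximation of what it does (sequentially reading every file so the OS page cache warms up) might be as simple as:
import sqlite3

def stream_corpus(db_path="corpus.db"):               # database name is an assumption
    db = sqlite3.connect(db_path)
    for (path,) in db.execute("SELECT path FROM files ORDER BY path"):
        with open(path, "rb") as f:
            f.read()                                  # read and discard; the page cache keeps it
    db.close()

if __name__ == "__main__":
    stream_corpus()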
Single-Reader Script
I began to theorize that if I only have one thread reading, performance should improve greatly. To test this hypothesis without having to do a lot of extra work, I cleared the caches and ran stream-corpus.py until 'top' reported that about half of the real memory had been filled with data. Then I let the main processing script loose on the data. As both scripts were using sorted lists of files, they iterated over the filenames in the same order.
Result: The processing script tore through the files that had obviously been cached thanks to stream-corpus.py, degrading drastically once it had caught up to the streaming script.
Thus, I was incented to reorganize the processing script just slightly. Now, there is a reader thread which reads each file and stuffs the name of the file into an IPC queue that one of the worker threads can pick up and process. Note that no file data is exchanged between threads. No need— the operating system is already implicitly holding onto the file data, waiting in case someone asks for it again before something needs that bit of RAM. Technically, this approach accesses each file multiple times. But it makes little practical difference thanks to caching.
Result: About 183 minutes to process the complete corpus (which works out to a little over 16 MB/sec).
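A sketch of that reorganization, with the caveat that the real script is not shown here and the queue plumbing and process_file() are my assumptions:
from multiprocessing import Process, JoinableQueue

NUM_WORKERS = 8                          # matches the 8 CPUs the quad-core machine exposes

def process_file(path):
    pass                                 # placeholder for the real (external program) processing

def reader(paths, queue):
    for path in paths:
        with open(path, "rb") as f:
            f.read()                     # pull the file into the OS cache
        queue.put(path)                  # hand only the file *name* to a worker
    for _ in range(NUM_WORKERS):
        queue.put(None)                  # sentinels to stop the workers

def worker(queue):
    while True:
        path = queue.get()
        if path is None:
            queue.task_done()
            break
        process_file(path)               # the data should still be cached from the reader
        queue.task_done()

def run(paths):
    queue = JoinableQueue()
    workers = [Process(target=worker, args=(queue,)) for _ in range(NUM_WORKERS)]
    for w in workers:
        w.start()
    reader(sorted(paths), queue)
    queue.join()
    for w in workers:
        w.join()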
Why Multiprocess
Is it even worthwhile to bother multithreading this operation? Monitoring the whole operation via 'top', most instances of the processing script are barely using any CPU time. Indeed, it’s likely that only one of the worker threads is doing any work most of the time, pulling a file out of the IPC queue as soon as the reader thread triggers its load into cache. Right now, the processing is usually pretty quick. There are cases where the processing (external program) might hang (one of the reasons I’m running this project is to find those cases); the multiprocessing architecture at least allows other processes to take over until a hanging process is timed out and killed by its monitoring process.
Further, the processing is pretty simple now but is likely to get more intensive in future iterations. Plus, there’s the possibility that I might move everything onto a more appropriately-connected storage medium, which should help alleviate the bottleneck bravely battled in this post.
There’s also the theoretical possibility that the reader thread could read too far ahead of the processing threads. Obviously, that’s not too much of an issue in the current setup. But to guard against it, the processes could share a variable that tracks the total number of bytes that have been processed. The reader thread adds filesizes to the count while the processing threads subtract file sizes. The reader thread would delay reading more if the number got above a certain threshold.
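As a sketch of that guard (the names and the threshold are made up for illustration), a shared counter from multiprocessing could serve:
import os
import time
from multiprocessing import Value

MAX_OUTSTANDING = 512 * 1024 * 1024      # arbitrary 512 MB ceiling, purely illustrative

outstanding_bytes = Value("q", 0)        # signed 64-bit counter shared across processes

def reader_admits(path):
    # reader side: wait until the workers have caught up, then count the file in
    size = os.path.getsize(path)
    while outstanding_bytes.value > MAX_OUTSTANDING:
        time.sleep(0.1)
    with outstanding_bytes.get_lock():
        outstanding_bytes.value += size

def worker_retires(path):
    # worker side: subtract the file's size once it has been processed
    with outstanding_bytes.get_lock():
        outstanding_bytes.value -= os.path.getsize(path)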
Leftovers
I wondered if the order of accessing the files mattered. I didn’t write them to the drive in any special order. The drive is formatted with Linux ext3. I ran stream-corpus.py on all the filenames sorted by filename (remember the SHA-1 naming convention described above) and also by sorting them randomly.
Result: It helps immensely for the filenames to be sorted. The sorted variant was a little more than twice as fast as the random variant. Maybe it has to do with accessing all the files in a single directory before moving on to another directory.
Further, I have long been under the impression that the best read speed you can expect from USB 2.0 is 27 Mbytes/sec (even though 480 Mbit/sec is bandied about in relation to the spec). This comes from profiling I performed with an external enclosure that supports both USB 2.0 and FireWire-400 (and eSATA). FW-400 was able to read the same file at nearly 40 Mbytes/sec that USB 2.0 could only read at 27 Mbytes/sec. Other sources I have read corroborate this number. But this test (using different hardware) achieved over 31 Mbytes/sec.