Recherche avancée

Médias (33)

Mot : - Tags -/creative commons

Autres articles (76)

  • La file d’attente de SPIPmotion

    28 novembre 2010, par

    Une file d’attente stockée dans la base de donnée
    Lors de son installation, SPIPmotion crée une nouvelle table dans la base de donnée intitulée spip_spipmotion_attentes.
    Cette nouvelle table est constituée des champs suivants : id_spipmotion_attente, l’identifiant numérique unique de la tâche à traiter ; id_document, l’identifiant numérique du document original à encoder ; id_objet l’identifiant unique de l’objet auquel le document encodé devra être attaché automatiquement ; objet, le type d’objet auquel (...)

  • Les vidéos

    21 avril 2011, par

    Comme les documents de type "audio", Mediaspip affiche dans la mesure du possible les vidéos grâce à la balise html5 .
    Un des inconvénients de cette balise est qu’elle n’est pas reconnue correctement par certains navigateurs (Internet Explorer pour ne pas le nommer) et que chaque navigateur ne gère en natif que certains formats de vidéos.
    Son avantage principal quant à lui est de bénéficier de la prise en charge native de vidéos dans les navigateur et donc de se passer de l’utilisation de Flash et (...)

  • Le profil des utilisateurs

    12 avril 2011, par

    Chaque utilisateur dispose d’une page de profil lui permettant de modifier ses informations personnelle. Dans le menu de haut de page par défaut, un élément de menu est automatiquement créé à l’initialisation de MediaSPIP, visible uniquement si le visiteur est identifié sur le site.
    L’utilisateur a accès à la modification de profil depuis sa page auteur, un lien dans la navigation "Modifier votre profil" est (...)

Sur d’autres sites (7360)

  • Adjusting The Timetable and SQL Shame

    16 août 2012, par Multimedia Mike — General, Python, sql

    My Game Music Appreciation website has a big problem that many visitors quickly notice and comment upon. The problem looks like this :



    The problem is that all of these songs are 2m30s in length. During the initial import process, unless a chiptune file already had curated length metadata attached, my metadata utility emitted a default play length of 150 seconds. This is not good if you want to listen to all the songs in a soundtrack without interacting with the player page, but have various short songs (think “game over” or other quick jingles) that are over in a few seconds. Such songs still pad out 150 seconds of silence.

    So I needed to correct this. Possible solutions :

    1. Manually : At first, I figured I could ask the database which songs needed fixing and listen to them to determine the proper lengths. Then I realized that there were well over 1400 games affected by this problem. This just screams “automated solution”.
    2. Automatically : Ask the database which songs need fixing and then somehow ask the computer to listen to the songs and decide their proper lengths. This sounds like a winner, provided that I can figure out how to programmatically determine if a song has “finished”.

    SQL Shame
    This play adjustment task has been on my plate for a long time. A key factor that has blocked me is that I couldn’t figure out a single SQL query to feed to the SQLite database underlying the site which would give me all the songs I needed. To be clear, it was very simple and obvious to me how to write a program that would query the database in phases to get all the information. However, I felt that it would be impure to proceed with the task unless I could figure out one giant query to get all the information.

    This always seems to come up whenever I start interacting with a database in any serious way. I call it SQL shame. This task got some traction when I got over this nagging doubt and told myself that there’s nothing wrong with the multi-step query program if it solves the problem at hand.

    Suddenly, I had a flash of inspiration about why the so-called NoSQL movement exists. Maybe there are a lot more people who don’t like trying to derive such long queries and are happy to allow other languages to pick up the slack.

    Estimating Lengths
    Anyway, my solution involved writing a Python script to iterate through all the games whose metadata was output by a certain engine (the one that makes the default play length 150 seconds). For each of those games, the script queries the song table and determines if each song is exactly 150 seconds. If it is, then go to work trying to estimate the true length.

    The forgoing paragraph describes what I figured was possible with only a single (possibly large) SQL query.

    For each song represented in the chiptune file, I ran it through a custom length estimator program. My brilliant (err, naïve) solution to the length estimation problem was to synthesize seconds of audio up to a maximum of 120 seconds (tightening up the default length just a bit) and counting how many of those seconds had all 0 samples. If the count reached 5 consecutive seconds of silence, then the estimator rewound the running length by 5 seconds and declared that to be the proper length. Update the database.

    There were about 1430 chiptune files whose songs needed updates. Some files had 1 single song. Some files had over 100. When I let the script run, it took nearly 65 minutes to process all the files. That was a single-threaded solution, of course. Even though I already had the data I needed, I wanted to try to hand at parallelizing the script. So I went to work with Python’s multiprocessing module and quickly refactored it to use all 4 CPU threads on the machine where the files live. Results :

    • Single-threaded solution : 64m42s to process corpus (22 games/minute)
    • Multi-threaded solution : 18m48s with 4 CPU threads (75 games/minute)

    More than a 3x speedup across 4 CPU threads, which is decent for a primarily CPU-bound operation.

    Epilogue
    I suspect that this task will require some refinement or manual intervention. Maybe there are songs which actually have more than 5 legitimate seconds of silence. Also, I entertained the possibility that some songs would generate very low amplitude noise rather than being perfectly silent. In that case, I could refine the script to stipulate that amplitudes below a certain threshold count as 0. Fortunately, I marked which games were modified by this method, so I can run a new script as necessary.

    SQL Schema
    Here is the schema of my SQlite3 database, for those who want to try their hand at a proper query. I am confident that it’s possible ; I just didn’t have the patience to work it out. The task is to retrieve all the rows from the games table where all of the corresponding songs in the songs table is 150000 milliseconds.

    1. CREATE TABLE games
    2.   (
    3.    id INTEGER PRIMARY KEY AUTOINCREMENT,
    4.    uncompressed_sha1 TEXT,
    5.    uncompressed_size INTEGER,
    6.    compressed_sha1 TEXT,
    7.    compressed_size INTEGER,
    8.    system TEXT,
    9.    game TEXT,
    10.    gme_system TEXT default NULL,
    11.    canonical_url TEXT default NULL,
    12.    extension TEXT default "gamemusicxz",
    13.    enabled INTEGER default 1,
    14.    redirect_to_id INT DEFAULT -1,
    15.    play_lengths_modified INT DEFAULT NULL) ;
    16. CREATE TABLE songs
    17.   (
    18.    game_id INTEGER,
    19.    song_number INTEGER NOT NULL,
    20.    song TEXT,
    21.    author TEXT,
    22.    copyright TEXT,
    23.    dumper TEXT,
    24.    length INTEGER,
    25.    intro_length INTEGER,
    26.    loop_length INTEGER,
    27.    play_length INTEGER,
    28.    play_order INTEGER default -1) ;
    29. CREATE TABLE tags
    30.   (
    31.    game_id INTEGER,
    32.    tag TEXT NOT NULL,
    33.    tag_type TEXT default "filename") ;
    34. CREATE INDEX gameid_index_songs ON songs(game_id) ;
    35. CREATE INDEX gameid_index_tag ON tags(game_id) ;
    36. CREATE UNIQUE INDEX sha1_index ON games(uncompressed_sha1) ;
  • Re-solving My Search Engine Problem

    28 juillet 2012, par Multimedia Mike — General, swish-e

    14 years ago, I created a web database of 8-bit Nintendo Entertainment System games. To make it useful, I developed a very primitive search feature.

    A few months ago, I decided to create a web database of video game music. To make it useful, I knew it would need to have a search feature. I realized I needed to solve the exact same problem again.

    Requirements
    The last time I solved this problem, I came up with an excruciatingly naïve idea. Hey, it worked. I really didn’t want to deploy the same solution again because it felt so silly the first time. Surely there are many better ways to solve it now ? Many different workable software solutions that do all the hard work for me ?

    The first time I attacked this, it was 1998 and hosting resources were scarce. On my primary web host I was able to put static HTML pages, perhaps with server side includes. The web host also offered dynamic scripting capabilities via something called htmlscript (a.k.a. MIVA Script). I had a secondary web host in my ISP which allowed me to host conventional CGI scripts on a Unix host, so that’s where I hosted the search function (Perl CGI script accessing a key/value data store file).

    Nowadays, sky’s the limit. Any type of technology you want to deploy should be tractable. Still, a key requirement was that I didn’t want to pay for additional hosting resources for this silly little side project. That leaves me with options that my current shared web hosting plan allows, which includes such advanced features as PHP, Perl and Python scripts. I can also access MySQL.

    Candidates
    There are a lot of mature software packages out there which can index and search data and be plugged into a website. But a lot of them would be unworkable on my web hosting plan due to language or library package limitations. Further, a lot of them feel like overkill. At the most basic level, all I really want to do is map a series of video game titles to URLs in a website.

    Based on my research, Lucene seems to hold a fair amount of mindshare as an open source indexing and search solution. But I was unsure of my ability to run it on my hosting plan. I think MySQL does some kind of full text search, so I could have probably made a solution around that. Again, it just feels like way more power than I need for this project.

    I used Swish-e once about 3 years ago for a little project. I wasn’t confident of my ability to run that on my server either. It has a Perl API but it requires custom modules.

    My quest for a search solution grew deep enough that I started perusing a textbook on information retrieval techniques in preparation for possibly writing my own solution from scratch. However, in doing so, I figured out how I might subvert an existing solution to do what I want.

    Back to Swish-e
    Again, all I wanted to do was pull data out of a database and map that data to a URL in a website. Reading the Swish-e documentation, I learned that the software supports a mode specifically tailored for this. Rather than asking Swish-e to index a series of document files living on disk, you can specify a script for Swish-e to run and the script will generate what appears to be a set of phantom documents for Swish-e to index.

    When I ’add’ a game music file to the game music website, I have a scripts that scrape the metadata (game title, system, song titles, composers, company, copyright, the original file name on disk, even the ripper/dumper who extracted the chiptune in the first place) and store it all in an SQLite database. When it’s time to update the database, another script systematically generates a series of pseudo-documents that spell out the metadata for each game and prefix each document with a path name. Searching for a term in the index returns a lists of paths that contain the search term. Thus, it makes sense for that path to be a site URL.

    But what about a web script which can search this Swish-e index ? That’s when I noticed Swish-e’s C API and came up with a crazy idea : Write the CGI script directly in C. It feels like sheer madness (or at least the height of software insecurity) to write a CGI script directly in C in this day and age. But it works (with the help of cgic for input processing), just as long as I statically link the search script with libswish-e.a (and libz.a). The web host is an x86 machine, after all.

    I’m not proud of what I did here— I’m proud of how little I had to do here. The searching CGI script is all of about 30 lines of C code. The one annoyance I experienced while writing it is that I had to consult the Swish-e source code to learn how to get my search results (the "swishdocpath" key — or any other key — for SwishResultPropertyStr() is not documented). Also, the C program just does the simplest job possible, only querying the term in the index and returning the results in plaintext, in order of relevance, to the client-side JavaScript code which requested them. JavaScript gets the job of sorting and grouping the results for presentation.

    Tuning the Search
    Almost immediately, I noticed that the search engine could not find one of my favorite SNES games, U.N. Squadron. That’s because all of its associated metadata names Area 88, the game’s original title. Thus, I had to modify the metadata database to allow attaching somewhat free-form tags to games in order to compensate. In this case, an alias title would show up in the game’s pseudo-document.

    Roman numerals are still a thorn in my side, just as they were 14 years ago in my original iteration. I dealt with it back then by converting all numbers to Roman numerals during the index and searching processes. I’m not willing to do that for this case and I’m still looking for a good solution.

    Another annoying problem deals with Mega Man, a popular franchise. The proper spelling is 2 words but it’s common for people to mash it into one word, Megaman (see also : Spider-Man, Spiderman, Spider Man). The index doesn’t gracefully deal with that and I have some hacks in place to cope for the time being.

    Positive Results
    I’m pleased with the results so far, and so are the users I have heard from. I know one user expressed amazement that a search for Castlevania turned up Akumajou Densetsu, the Japanese version of Castlevania III : Dracula’s Curse. This didn’t surprise me because I manually added a hint for that mapping. (BTW, if you are a fan of Castlevania III, definitely check out the Akumajou Densetsu soundtrack which has an upgraded version of the same soundtrack using special audio channels.)

    I was a little more surprised when a user announced that searching for ’probotector’ correctly turned up Contra : Hard Corps. I looked into why this was. It turns out that the original chiptune filename was extremely descriptive : "Contra - Hard Corps [Probotector] (1994-08-08)(Konami)". The filenames themselves often carry a bunch of useful metadata which is why it’s important to index those as well.

    And of course, many rippers, dumpers, and taggers have labored for over a decade to lovingly tag these songs with as much composer information as possible, which all gets indexed. The search engine gets a lot of compliments for its ability to find many songs written by favorite composers.

  • Dreamcast Operating Systems

    16 septembre 2010, par Multimedia Mike — Sega Dreamcast

    The Sega Dreamcast was famously emblazoned with a logo proudly announcing that it was compatible with Windows CE :



    It’s quite confusing. The console certainly doesn’t boot into some version of Windows to launch games. Apparently, there was a special version of CE developed for the DC and game companies had the option to leverage it. I do recall that some game startup screens would similarly advertise Windows CE.

    Once the homebrew community got ahold of the device, the sky was the limit. I think NetBSD was the first alternative OS to support the Dreamcast. Meanwhile, I have recollections of DC Linux and LinuxDC projects along with more generic Linux-SH and SH-Linux projects.



    DC Evolution hosts a disc image available for download with an unofficial version of DC Linux, assembled by one Adrian O’Grady. I figured out how to burn the disc (burning DC discs is a blog post of its own) and got it working in the console.

    It’s possible to log in directly via the physical keyboard or through a serial terminal provided that you have a coder’s cable. That reminds me– my local Fry’s had a selection of USB-to-serial cables. I think this is another area that is sufficiently commoditized that just about any cable ought to work with Linux out of the box. Or maybe I’m just extrapolating from the experience of having the cheapest cable in the selection (made by io connect) plug and play with Linux.



    Look ! No messy converter box in the middle as in the Belkin case. The reason I went with this cable is that the packaging claimed it was capable of up to 500 Kbits/sec. Most of the cables advertised a max of 115200 bps. I distinctly recall being able to use the DC coder’s cable at 230400 bps a long time ago. Alas, 115200 seems to be the speed limit, even with this new USB cable.

    Anyway, the distribution is based on a 2.4.5 kernel circa 2001. I tried to make PPP work over the serial cable but the kernel doesn’t have support. If you’re interested, here is some basic information about the machine from Linux’s perspective, gleaned from some simple commands. This helps remind us of a simpler time when Linux was able to run comfortably on a computer with 16 MB of RAM.

    Debian GNU/Linux testing/unstable dreamcast ttsc/1
    

    dreamcast login : root
    Linux dreamcast 2.4.5 #27 Thu May 31 07:06:51 JST 2001 sh4 unknown

    Most of the programs included with the Debian GNU/Linux system are
    freely redistributable ; the exact distribution terms for each program
    are described in the individual files in /usr/share/doc/*/copyright

    Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
    permitted by applicable law.

    dreamcast : # uname -a
    Linux dreamcast 2.4.5 #27 Thu May 31 07:06:51 JST 2001 sh4 unknown

    dreamcast : # cat /proc/cpuinfo
    cpu family : SH-4
    cache size : 8K-byte/16K-byte
    bogomips : 199.47

    Machine : dreamcast
    CPU clock : 200.00MHz
    Bus clock : 100.00MHz
    Peripheral module clock : 50.00MHz

    dreamcast : # top -b

    09:14:54 up 14 min, 1 user, load average : 0.04, 0.03, 0.03
    15 processes : 14 sleeping, 1 running, 0 zombie, 0 stopped
    CPU states : 1.1% user, 5.8% system, 0.0% nice, 93.1% idle
    Mem : 14616K total, 11316K used, 3300K free, 2296K buffers
    Swap : 0K total, 0K used, 0K free, 5556K cached

    PID USER PRI NI SIZE RSS SHARE STAT %CPU %MEM TIME COMMAND
    219 root 18 0 1072 1068 868 R 5.6 7.3 0:00 top
    1 root 9 0 596 596 512 S 0.0 4.0 0:01 init
    2 root 9 0 0 0 0 SW 0.0 0.0 0:00 keventd
    3 root 9 0 0 0 0 SW 0.0 0.0 0:00 kswapd
    4 root 9 0 0 0 0 SW 0.0 0.0 0:00 kreclaimd
    5 root 9 0 0 0 0 SW 0.0 0.0 0:00 bdflush
    6 root 9 0 0 0 0 SW 0.0 0.0 0:00 kupdated
    7 root 9 0 0 0 0 SW 0.0 0.0 0:00 kmapled
    39 root 9 0 900 900 668 S 0.0 6.1 0:00 devfsd
    91 root 8 0 652 652 556 S 0.0 4.4 0:00 pump
    96 daemon 9 0 524 524 420 S 0.0 3.5 0:00 portmap
    149 root 9 0 944 944 796 S 0.0 6.4 0:00 syslogd
    152 root 9 0 604 604 456 S 0.0 4.1 0:00 klogd
    187 root 9 0 540 540 456 S 0.0 3.6 0:00 getty
    201 root 9 0 1380 1376 1112 S 0.0 9.4 0:01 bash

    Note that at this point I had shutdown both gpm and inetd. The rest of the processes, save for bash, are default. The above stats only report about 14 MB of RAM ; where are the other 2 MB ?

    dreamcast : # df -h
    Filesystem            Size  Used Avail Use% Mounted on
    /dev/rd/1             2.0M  560k  1.4M  28% /