
Other articles (67)
The SPIPmotion queue
28 November 2010
A queue stored in the database
When it is installed, SPIPmotion creates a new table in the database named spip_spipmotion_attentes.
This new table consists of the following fields: id_spipmotion_attente, the unique numeric identifier of the task to process; id_document, the numeric identifier of the original document to encode; id_objet, the unique identifier of the object to which the encoded document will be attached automatically; objet, the type of object to which (...)
APPENDIX: The plugins used specifically for the farm
5 March 2010
To work properly, the central/master site of the farm needs several plugins in addition to those used by the channels: the Gestion de la mutualisation plugin; the inscription3 plugin, to manage registrations and requests to create a mutualised (shared) instance as soon as users sign up; the verifier plugin, which provides a field-validation API (used by inscription3); the champs extras v2 plugin, required by inscription3 (...)
Adding user-specific information and other changes to author-related behaviour
12 April 2011
The simplest way to add information to authors is to install the Inscription3 plugin. It also allows certain user-related behaviours to be modified (refer to its documentation for more information).
It is also possible to add fields to authors by installing the champs extras 2 and Interface pour champs extras plugins.
On other sites (7689)
Trim/cut video in Android app using ffmpeg
10 August 2012, by K-ran-Beast
I have downloaded the code for video trimming from GitHub at the following link.
It works perfectly, but when I try to run it a second time the code crashes without any exception; then, when I try to run it a third time, it works again after the crash.
Does anyone have any idea? I am developing an application which has a video-trimming module.
I would really appreciate it if someone could help me.
Adjusting The Timetable and SQL Shame
My Game Music Appreciation website has a big problem that many visitors quickly notice and comment upon. The problem looks like this:
The problem is that all of these songs are 2m30s in length. During the initial import process, unless a chiptune file already had curated length metadata attached, my metadata utility emitted a default play length of 150 seconds. This is not good if you want to listen to all the songs in a soundtrack without interacting with the player page, but the soundtrack has various short songs (think “game over” or other quick jingles) that are over in a few seconds. Such songs still pad out 150 seconds of silence.
So I needed to correct this. Possible solutions:
- Manually: At first, I figured I could ask the database which songs needed fixing and listen to them to determine the proper lengths. Then I realized that there were well over 1400 games affected by this problem. This just screams “automated solution”.
- Automatically: Ask the database which songs need fixing and then somehow ask the computer to listen to the songs and decide their proper lengths. This sounds like a winner, provided that I can figure out how to programmatically determine if a song has “finished”.
SQL Shame
This play adjustment task has been on my plate for a long time. A key factor that has blocked me is that I couldn’t figure out a single SQL query to feed to the SQLite database underlying the site which would give me all the songs I needed. To be clear, it was very simple and obvious to me how to write a program that would query the database in phases to get all the information. However, I felt that it would be impure to proceed with the task unless I could figure out one giant query to get all the information.
This always seems to come up whenever I start interacting with a database in any serious way. I call it SQL shame. This task got some traction when I got over this nagging doubt and told myself that there’s nothing wrong with the multi-step query program if it solves the problem at hand.
Suddenly, I had a flash of inspiration about why the so-called NoSQL movement exists. Maybe there are a lot more people who don’t like trying to derive such long queries and are happy to allow other languages to pick up the slack.
Estimating Lengths
Anyway, my solution involved writing a Python script to iterate through all the games whose metadata was output by a certain engine (the one that makes the default play length 150 seconds). For each of those games, the script queries the song table and determines if each song is exactly 150 seconds. If it is, then go to work trying to estimate the true length.
The foregoing paragraph describes what I figured was possible with only a single (possibly large) SQL query.
For each song represented in the chiptune file, I ran it through a custom length estimator program. My brilliant (err, naïve) solution to the length estimation problem was to synthesize seconds of audio up to a maximum of 120 seconds (tightening up the default length just a bit) and count how many of those seconds had all-0 samples. If the count reached 5 consecutive seconds of silence, then the estimator rewound the running length by 5 seconds and declared that to be the proper length. Update the database.
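As a rough illustration of that idea, here is a minimal sketch of the silence-based estimator in Python. The render_second() helper is hypothetical; the real script drives an actual chiptune synthesis engine to produce each second of samples.

# Minimal sketch of the silence-based length estimator.
# render_second(song, n) is a hypothetical helper that returns one second
# of PCM samples (a list of integers) for the given song.
MAX_SECONDS = 120     # tightened ceiling instead of the 150-second default
SILENCE_LIMIT = 5     # consecutive silent seconds that mark the end

def estimate_length_ms(song, render_second):
    silent_run = 0
    for second in range(MAX_SECONDS):
        samples = render_second(song, second)
        if all(s == 0 for s in samples):   # a second of all-0 samples counts as silence
            silent_run += 1
            if silent_run == SILENCE_LIMIT:
                # rewind past the trailing silence; that is the proper length
                return (second + 1 - SILENCE_LIMIT) * 1000
        else:
            silent_run = 0
    return MAX_SECONDS * 1000   # never went silent; keep the ceiling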
There were about 1430 chiptune files whose songs needed updates. Some files had 1 single song. Some files had over 100. When I let the script run, it took nearly 65 minutes to process all the files. That was a single-threaded solution, of course. Even though I already had the data I needed, I wanted to try my hand at parallelizing the script. So I went to work with Python’s multiprocessing module and quickly refactored it to use all 4 CPU threads on the machine where the files live. Results:
- Single-threaded solution : 64m42s to process corpus (22 games/minute)
- Multi-threaded solution : 18m48s with 4 CPU threads (75 games/minute)
More than a 3x speedup across 4 CPU threads, which is decent for a primarily CPU-bound operation.
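The refactoring amounts to roughly the following sketch, assuming the existing per-file work is wrapped in a process_game() function of my own naming (each worker should open its own SQLite connection rather than sharing one):

# Sketch of the multiprocessing refactor; process_game() stands in for the
# existing single-threaded per-file logic.
import multiprocessing

def process_game(game_id):
    # open a per-process SQLite connection, estimate the song lengths for
    # this game, and write the corrected play lengths back to the database
    pass

if __name__ == '__main__':
    game_ids = []   # the ~1430 affected game ids, fetched beforehand
    pool = multiprocessing.Pool(processes=4)   # one worker per CPU thread
    pool.map(process_game, game_ids)
    pool.close()
    pool.join()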
Epilogue
I suspect that this task will require some refinement or manual intervention. Maybe there are songs which actually have more than 5 legitimate seconds of silence. Also, I entertained the possibility that some songs would generate very low amplitude noise rather than being perfectly silent. In that case, I could refine the script to stipulate that amplitudes below a certain threshold count as 0. Fortunately, I marked which games were modified by this method, so I can run a new script as necessary.
SQL Schema
Here is the schema of my SQLite3 database, for those who want to try their hand at a proper query. I am confident that it’s possible; I just didn’t have the patience to work it out. The task is to retrieve all the rows from the games table where all of the corresponding songs in the songs table are 150000 milliseconds. One possible query is sketched after the schema.
CREATE TABLE games
(
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  uncompressed_sha1 TEXT,
  uncompressed_size INTEGER,
  compressed_sha1 TEXT,
  compressed_size INTEGER,
  system TEXT,
  game TEXT,
  gme_system TEXT default NULL,
  canonical_url TEXT default NULL,
  extension TEXT default "gamemusicxz",
  enabled INTEGER default 1,
  redirect_to_id INT DEFAULT -1,
  play_lengths_modified INT DEFAULT NULL);

CREATE TABLE songs
(
  game_id INTEGER,
  song_number INTEGER NOT NULL,
  song TEXT,
  author TEXT,
  copyright TEXT,
  dumper TEXT,
  length INTEGER,
  intro_length INTEGER,
  loop_length INTEGER,
  play_length INTEGER,
  play_order INTEGER default -1);

CREATE TABLE tags
(
  game_id INTEGER,
  tag TEXT NOT NULL,
  tag_type TEXT default "filename");

CREATE INDEX gameid_index_songs ON songs(game_id);
CREATE INDEX gameid_index_tag ON tags(game_id);
CREATE UNIQUE INDEX sha1_index ON games(uncompressed_sha1);
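For what it’s worth, here is one attempt at that single query, run through Python’s sqlite3 module. It assumes play_length is the column holding the 150000 ms default and that the database file is named gamemusic.db (both assumptions are mine); treat it as an untested sketch rather than the definitive answer.

# Sketch: fetch every game whose songs ALL have the default 150000 ms
# play_length (database filename and the play_length assumption are mine).
import sqlite3

conn = sqlite3.connect('gamemusic.db')
query = """
    SELECT g.*
      FROM games g
     WHERE EXISTS (SELECT 1 FROM songs s WHERE s.game_id = g.id)
       AND NOT EXISTS (SELECT 1 FROM songs s
                        WHERE s.game_id = g.id
                          AND s.play_length != 150000)
"""
for row in conn.execute(query):
    print(row)
conn.close()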
Adventures in Unicode
Tangential to multimedia hacking is proper metadata handling. Recently, I have gathered an interest in processing a large corpus of multimedia files which are likely to contain metadata strings which do not fall into the lower ASCII set. This is significant because the lower ASCII set intersects perfectly with my own programming comfort zone. Indeed, all of my programming life, I have insisted on covering my ears and loudly asserting “LA LA LA LA LA! ALL TEXT EVERYWHERE IS ASCII!” I suspect I’m not alone in this.
Thus, I took this as an opportunity to conquer my longstanding fear of Unicode. I developed a self-learning course comprised of a series of exercises which add up to this diagram:
Part 1: Understanding Text Encoding
Python has regular strings by default and then it has Unicode strings. The latter are prefixed by the letter ‘u’. This is what ‘ö’ looks like encoded in each type:
>>> 'ö', u'ö'
('\xc3\xb6', u'\xf6')
A large part of my frustration with Unicode comes from Python yelling at me about UnicodeDecodeErrors and an inability to handle the number 0xc3 for some reason. This usually comes up when I’m trying to wrap my head around an unrelated problem and don’t care to get sidetracked by text encoding issues. However, when I studied the above output, I finally understood where the 0xc3 comes from. I just didn’t understand what the encoding represents exactly.
I can see from assorted tables that ‘ö’ is character 0xF6 in various encodings (in Unicode and Latin-1), so u'\xf6' makes sense. But what does '\xc3\xb6' mean? It’s my style to excavate straight down to the lowest levels, and I wanted to understand exactly how characters are represented in memory. The UTF-8 encoding tables inform us that any Unicode code point above 0x7F but less than 0x800 will be encoded with 2 bytes:
110xxxxx 10xxxxxx
Applying this pattern to the \xc3\xb6 encoding:
hex:             0xc3     0xb6
bits:            11000011 10110110
important bits:  ---00011 --110110
assembled:       00011110110
code point:      0xf6
I was elated when I drew that out and made the connection. Maybe I’m the last programmer to figure this stuff out. But I’m still happy that I actually understand those Python errors pertaining to the number 0xc3 and that I won’t have to apply canned solutions without understanding the core problem.
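The same arithmetic can be double-checked from the REPL by masking off the UTF-8 marker bits and reassembling the two payloads (the expressions below behave the same under Python 2 or 3):

>>> b1, b2 = 0xc3, 0xb6
>>> code_point = ((b1 & 0x1f) << 6) | (b2 & 0x3f)   # 5 payload bits + 6 payload bits
>>> hex(code_point)
'0xf6'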
I’m cheating on this part of this exercise just a little bit since the diagram implied that the Unicode text needs to come from a binary file. I’ll return to that in a bit. For now, I’ll just contrive the following Unicode string from the Python REPL:
>>> u = u'Üñìçôđé'
>>> u
u'\xdc\xf1\xec\xe7\xf4\u0111\xe9'
Part 2: From Python To SQLite3
The next step is to see what happens when I use Python’s SQLite3 module to dump the string into a new database. Will the Unicode encoding be preserved on disk? What will UTF-8 look like on disk anyway?
>>> import sqlite3
>>> conn = sqlite3.connect('unicode.db')
>>> conn.execute("CREATE TABLE t (t text)")
>>> conn.execute("INSERT INTO t VALUES (?)", (u, ))
>>> conn.commit()
>>> conn.close()
Next, I manually view the resulting database file (unicode.db) using a hex editor and look for strings. Here we go:
000007F0 02 29 C3 9C C3 B1 C3 AC C3 A7 C3 B4 C4 91 C3 A9
Look at that! It’s just like the \xc3\xb6 encoding we see in the regular Python strings.
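As a sanity check, encoding the string as UTF-8 back in the Python 2 session reproduces exactly the byte run sitting in the file after the record header:

>>> u.encode('utf-8')
'\xc3\x9c\xc3\xb1\xc3\xac\xc3\xa7\xc3\xb4\xc4\x91\xc3\xa9'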
Part 3: From SQLite3 To A Web Page Via PHP
Finally, use PHP (love it or hate it, but it’s what’s most convenient on my hosting provider) to query the string from the database and display it on a web page, completing the outlined processing pipeline:
<?php
$dbh = new PDO("sqlite:unicode.db");
foreach ($dbh->query("SELECT t from t") as $row) {
  $unicode_string = $row['t'];
}
?>

<html>
<head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head>
<body><h1><?= $unicode_string ?></h1></body>
</html>
I tested the foregoing PHP script on 3 separate browsers that I had handy (Firefox, Internet Explorer, and Chrome):
I’d say that counts as success! It’s important to note that the “meta http-equiv” tag is absolutely necessary. Omit it and you’ll see something like this:
Since we know what the UTF-8 stream looks like, it’s pretty obvious how the mapping is operating here: 0xc3 and 0xc4 correspond to ‘Ã’ and ‘Ä’, respectively. This corresponds to an encoding named ISO/IEC 8859-1, a.k.a. Latin-1. Speaking of which…
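The mis-mapping is easy to reproduce from the Python 2 REPL: encode the character as UTF-8 and then (wrongly) reinterpret those bytes as Latin-1, which is effectively what the browser does without the charset declaration:

>>> u'ö'.encode('utf-8').decode('latin-1')
u'\xc3\xb6'
>>> print u'ö'.encode('utf-8').decode('latin-1')
Ã¶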
Part 4: Converting Binary Data To Unicode
At the start of the experiment, I was trying to extract metadata strings from these binary multimedia files and I noticed characters like our friend ‘ö’ from above. In the bytestream, this was represented simply with 0xf6. I mistakenly believed that this was the on-disk representation of UTF-8. Wrong. Turns out it’s Latin-1.
However, I still need to solve the problem of transforming such strings into Unicode to be shoved through the pipeline diagrammed above. For this experiment, I created a 9-byte file with the Latin-1 string ‘Üñìçôdé’ couched by 0’s, to simulate yanking a string out of a binary file. Here’s unicode.file:
00000000 00 DC F1 EC E7 F4 64 E9 00 ......d..
(Aside: this experiment uses a plain ‘d’ since the ‘đ’ with a bar through it doesn’t occur in Latin-1; it shows up all over the place in Vietnamese, at least.)
I’ve been mashing around Python code via the REPL, trying to get this string into a Unicode-friendly format. This is a successful method but it’s probably not the best:
>>> import struct
>>> f = open('unicode.file', 'r').read()
>>> u = u''
>>> for c in struct.unpack("B"*7, f[1:8]):
... u += unichr(c)
...
>>> u
u'\xdc\xf1\xec\xe7\xf4d\xe9'
>>> print u
Üñìçôdé
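For what it’s worth, a more direct route is to let the codec machinery do the conversion: since the bytes are Latin-1, decoding the slice with the 'latin-1' codec yields the same Unicode string without the struct gymnastics (Python 2 shown, to match the session above):

>>> f = open('unicode.file', 'rb').read()
>>> f[1:8].decode('latin-1')
u'\xdc\xf1\xec\xe7\xf4d\xe9'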
Conclusion
Dealing with text encoding matters reminds me of dealing with integer endian-ness concerns. When you’re just dealing with one system, you probably don’t need to think too much about it because the system is usually handling everything consistently underneath the covers.
However, when the data leaves one system and will be interpreted by another system, that’s when a programmer needs to be cognizant of matters such as integer endianness or text encoding.