Recherche avancée

Médias (2)

Mot : - Tags -/documentation

Autres articles (57)

  • Submit bugs and patches

    13 avril 2011

    Unfortunately a software is never perfect.
    If you think you have found a bug, report it using our ticket system. Please to help us to fix it by providing the following information : the browser you are using, including the exact version as precise an explanation as possible of the problem if possible, the steps taken resulting in the problem a link to the site / page in question
    If you think you have solved the bug, fill in a ticket and attach to it a corrective patch.
    You may also (...)

  • Les autorisations surchargées par les plugins

    27 avril 2010, par

    Mediaspip core
    autoriser_auteur_modifier() afin que les visiteurs soient capables de modifier leurs informations sur la page d’auteurs

  • Publier sur MédiaSpip

    13 juin 2013

    Puis-je poster des contenus à partir d’une tablette Ipad ?
    Oui, si votre Médiaspip installé est à la version 0.2 ou supérieure. Contacter au besoin l’administrateur de votre MédiaSpip pour le savoir

Sur d’autres sites (13979)

  • Processing Big Data Problems

    8 janvier 2011, par Multimedia Mike — Big Data

    I’m becoming more interested in big data problems, i.e., extracting useful information out of absurdly sized sets of input data. I know it’s a growing field and there is a lot to read on the subject. But you know how I roll— just think of a problem to solve and dive right in.

    Here’s how my adventure unfolded.

    The Corpus
    I need to run a command line program on a set of files I have collected. This corpus is on the order of 350,000 files. The files range from 7 bytes to 175 MB. Combined, they occupy around 164 GB of storage space.

    Oh, and said storage space resides on an external, USB 2.0-connected hard drive. Stop laughing.

    A file is named according to the SHA-1 hash of its data. The files are organized in a directory hierarchy according to the first 6 hex digits of the SHA-1 hash (e.g., a file named a4d5832f... is stored in a4/d5/83/a4d5832f...). All of this file hash, path, and size information is stored in an SQLite database.

    First Pass
    I wrote a Python script that read all the filenames from the database, fed them into a pool of worker processes using Python’s multiprocessing module, and wrote some resulting data for each file back to the SQLite database. My Eee PC has a single-core, hyperthreaded Atom which presents 2 CPUs to the system. Thus, 2 worker threads crunched the corpus. It took awhile. It took somewhere on the order of 9 or 10 or maybe even 12 hours. It took long enough that I’m in no hurry to re-run the test and get more precise numbers.

    At least I extracted my initial set of data from the corpus. Or did I ?

    Think About The Future

    A few days later, I went back to revisit the data only to notice that the SQLite database was corrupted. To add insult to that bit of injury, the script I had written to process the data was also completely corrupted (overwritten with something unrelated to Python code). BTW, this is was on a RAID brick configured for redundancy. So that’s strike 3 in my personal dealings with RAID technology.

    I moved the corpus to a different external drive and also verified the files after writing (easy to do since I already had the SHA-1 hashes on record).

    The corrupted script was pretty simple to rewrite, even a little better than before. Then I got to re-run it. However, this run was on a faster machine, a hyperthreaded, quad-core beast that exposes 8 CPUs to the system. The reason I wasn’t too concerned about the poor performance with my Eee PC is that I knew I was going to be able to run in on this monster later.

    So I let the rewritten script rip. The script gave me little updates regarding its progress. As it did so, I ran some rough calculations and realized that it wasn’t predicted to finish much sooner than it would have if I were running it on the Eee PC.

    Limiting Factors
    It had been suggested to me that I/O bandwidth of the external USB drive might be a limiting factor. This is when I started to take that idea very seriously.

    The first idea I had was to move the SQLite database to a different drive. The script records data to the database for every file processed, though it only commits once every 100 UPDATEs, so at least it’s not constantly syncing the disc. I ran before and after tests with a small subset of the corpus and noticed a substantial speedup thanks to this policy chance.

    Then I remembered hearing something about "atime" which is access time. Linux filesystems, per default, record the time that a file was last accessed. You can watch this in action by running 'stat <file> ; cat <file> > /dev/null ; stat <file>' and observe that the "Access" field has been updated to NOW(). This also means that every single file that gets read from the external drive still causes an additional write. To avoid this, I started mounting the external drive with '-o noatime' which instructs Linux not to record "last accessed" time for files.

    On the limited subset test, this more than doubled script performance. I then wondered about mounting the external drive as read-only. This had the same performance as noatime. I thought about using both options together but verified that access times are not updated for a read-only filesystem.

    A Note On Profiling
    Once you start accessing files in Linux, those files start getting cached in RAM. Thus, if you profile, say, reading a gigabyte file from a disk and get 31 MB/sec, and then repeat the same test, you’re likely to see the test complete instantaneously. That’s because the file is already sitting in memory, cached. This is useful in general application use, but not if you’re trying to profile disk performance.

    Thus, in between runs, do (as root) 'sync; echo 3 > /proc/sys/vm/drop_caches' in order to wipe caches (explained here).

    Even Better ?
    I re-ran the test using these little improvements. Now it takes somewhere around 5 or 6 hours to run.

    I contrived an artificially large file on the external drive and did some 'dd' tests to measure what the drive could really do. The drive consistently measured a bit over 31 MB/sec. If I could read and process the data at 30 MB/sec, the script would be done in about 95 minutes.

    But it’s probably rather unreasonable to expect that kind of transfer rate for lots of smaller files scattered around a filesystem. However, it can’t be that helpful to have 8 different processes constantly asking the HD for 8 different files at any one time.

    So I wrote a script called stream-corpus.py which simply fetched all the filenames from the database and loaded the contents of each in turn, leaving the data to be garbage-collected at Python’s leisure. This test completed in 174 minutes, just shy of 3 hours. I computed an average read speed of around 17 MB/sec.

    Single-Reader Script
    I began to theorize that if I only have one thread reading, performance should improve greatly. To test this hypothesis without having to do a lot of extra work, I cleared the caches and ran stream-corpus.py until 'top' reported that about half of the real memory had been filled with data. Then I let the main processing script loose on the data. As both scripts were using sorted lists of files, they iterated over the filenames in the same order.

    Result : The processing script tore through the files that had obviously been cached thanks to stream-corpus.py, degrading drastically once it had caught up to the streaming script.

    Thus, I was incented to reorganize the processing script just slightly. Now, there is a reader thread which reads each file and stuffs the name of the file into an IPC queue that one of the worker threads can pick up and process. Note that no file data is exchanged between threads. No need— the operating system is already implicitly holding onto the file data, waiting in case someone asks for it again before something needs that bit of RAM. Technically, this approach accesses each file multiple times. But it makes little practical difference thanks to caching.

    Result : About 183 minutes to process the complete corpus (which works out to a little over 16 MB/sec).

    Why Multiprocess
    Is it even worthwhile to bother multithreading this operation ? Monitoring the whole operation via 'top', most instances of the processing script are barely using any CPU time. Indeed, it’s likely that only one of the worker threads is doing any work most of the time, pulling a file out of the IPC queue as soon the reader thread triggers its load into cache. Right now, the processing is usually pretty quick. There are cases where the processing (external program) might hang (one of the reasons I’m running this project is to find those cases) ; the multiprocessing architecture at least allows other processes to take over until a hanging process is timed out and killed by its monitoring process.

    Further, the processing is pretty simple now but is likely to get more intensive in future iterations. Plus, there’s the possibility that I might move everything onto a more appropriately-connected storage medium which should help alleviate the bottleneck bravely battled in this post.

    There’s also the theoretical possibility that the reader thread could read too far ahead of the processing threads. Obviously, that’s not too much of an issue in the current setup. But to guard against it, the processes could share a variable that tracks the total number of bytes that have been processed. The reader thread adds filesizes to the count while the processing threads subtract file sizes. The reader thread would delay reading more if the number got above a certain threshold.

    Leftovers
    I wondered if the order of accessing the files mattered. I didn’t write them to the drive in any special order. The drive is formatted with Linux ext3. I ran stream-corpus.py on all the filenames sorted by filename (remember the SHA-1 naming convention described above) and also by sorting them randomly.

    Result : It helps immensely for the filenames to be sorted. The sorted variant was a little more than twice as fast as the random variant. Maybe it has to do with accessing all the files in a single directory before moving onto another directory.

    Further, I have long been under the impression that the best read speed you can expect from USB 2.0 was 27 Mbytes/sec (even though 480 Mbit/sec is bandied about in relation to the spec). This comes from profiling I performed with an external enclosure that supports both USB 2.0 and FireWire-400 (and eSata). FW-400 was able to read the same file at nearly 40 Mbytes/sec that USB 2.0 could only read at 27 Mbytes/sec. Other sources I have read corroborate this number. But this test (using different hardware), achieved over 31 Mbytes/sec.

  • VP8 Codec Optimization Update

    16 juin 2010, par noreply@blogger.com (John Luther) — inside webm

    Since WebM launched in May, the team has been working hard to make the VP8 video codec faster. Our community members have contributed improvements, but there’s more work to be done in some interesting areas related to performance (more on those below).


    Encoder


    The VP8 encoder is ripe for speed optimizations. Scott LaVarnway’s efforts in writing an x86 assembly version of the quantizer will help in this goal significantly as the quantizer is called many times while the encoder makes decisions about how much detail from the image will be transmitted.

    For those of you eager to get involved, one piece of low-hanging fruit is writing a SIMD version of the ARNR temporal filtering code. Also, much of the assembly code only makes use of the SSE2 instruction set, and there surely are newer extensions that could be made use of. There are also redundant code removal and other general cleanup to be done ; (Yaowu Xu has submitted some changes for these).

    At a higher level, someone can explore some alternative motion search strategies in the encoder. Eventually the motion search can be decoupled entirely to allow motion fields to be calculated elsewhere (for example, on a graphics processor).

    Decoder


    Decoder optimizations can bring higher resolutions and smoother playback to less powerful hardware.

    Jeff Muizelaar has submitted some changes which combine the IDCT and summation with the predicted block into a single function, helping us avoid storing the intermediate result, thus reducing memory transfers and avoiding cache pollution. This changes the assembly code in a fundamental way, so we will need to sync the other platforms up or switch them to a generic C implementation and accept the performance regression. Johann Koenig is working on implementing this change for ARM processors, and we’ll merge these changes into the mainline soon.

    In addition, Tim Terriberry is attacking a different method of bounds checking on the "bool decoder." The bool decoder is performance-critical, as it is called several times for each bit in the input stream. The current code handles this check with a simple clamp in the innermost loops and a less-frequent copy into a circular buffer. This can be expensive at higher data rates. Tim’s patch removes the circular buffer, but uses a more complex clamp in the innermost loops. These inner loops have historically been troublesome on embedded platforms.

    To contribute in these efforts, I’ve started working on rewriting higher-level parts of the decoder. I believe there is an opportunity to improve performance by paying better attention to data locality and cache layout, and reducing memory bus traffic in general. Another area I plan to explore is improving utilization in the multi-threaded decoder by separating the bitstream decoding from the rest of the image reconstruction, using work units larger than a single macroblock, and not tying functionality to a specific thread. To get involved in these areas, subscribe to the codec-devel mailing list and provide feedback on the code as it’s written.

    Embedded Processors


    We want to optimize multiple platforms, not just desktops. Fritz Koenig has already started looking at the performance of VP8 on the Intel Atom platform. This platform need some attention as we wrote our current x86 assembly code with an out-of-order processor in mind. Since Atom is an in-order processor (much like the original Pentium), the instruction scheduling of all of the x86 assembly code needs to be reexamined. One option we’re looking at is scheduling the code for the Atom processor and seeing if that impacts the performance on other x86 platforms such as the Via C3 and AMD Geode. This is shaping up to be a lot of work, but doing it would provide us with an opportunity to tighten up our assembly code.

    These issues, along with wanting to make better use of the larger register file on x86_64, may reignite every assembly programmer’s (least ?) favorite debate : whether or not to use intrinsics. Yunqing Wang has been experimenting with this a bit, but initial results aren’t promising. If you have experience in dealing with a lot of assembly code across several similar-but-kinda-different platforms, these maintainability issues might be familiar to you. I hope you’ll share your thoughts and experiences on the codec-devel mailing list.

    Optimizing codecs is an iterative (some would say never-ending) process, so stay tuned for more posts on the progress we’re making, and by all means, start hacking yourself.

    It’s exciting to see that we’re starting to get substantial code contributions from developers outside of Google, and I look forward to more as WebM grows into a strong community effort.

    John Koleszar is a software engineer at Google.

  • Anomalie #2036 : nom des feuilles de style calculée

    2 juin 2011, par jluc -

    Le media n’est pas une propriété d’une feuille CSS en général,non, mais dans le cas des feuilles compactées, c’est pourtant le cas puisqu’une css compactée rassemble des utilisations de feuilles ayant le même média. On peut donc utilement documenter une feuille compactée par le media pour lequel elle (...)