
Media (91)
-
MediaSPIP Simple: future default graphic theme?
26 September 2013, by
Updated: October 2013
Language: French
Type: Video
-
with chosen
13 September 2013, by
Updated: September 2013
Language: French
Type: Image
-
without chosen
13 September 2013, by
Updated: September 2013
Language: French
Type: Image
-
config chosen
13 September 2013, by
Updated: September 2013
Language: French
Type: Image
-
SPIP - plugins - embed code - Example
2 September 2013, by
Updated: September 2013
Language: French
Type: Image
-
GetID3 - File information block
9 April 2013, by
Updated: May 2013
Language: French
Type: Image
Other articles (97)
-
Organize by category
17 May 2013, by
In MédiaSPIP, a section ("rubrique") has two names: category and section.
The documents stored in MédiaSPIP can be filed in different categories. You can create a category by clicking "publish a category" in the Publish menu at the top right (after logging in). A category can itself be placed inside another category, so you can build a tree of categories.
The next time a document is published, the newly created category will be offered (...)
-
Write a news item
21 June 2013, by
Present the changes in your MédiaSPIP, or news about your projects, using the news section of your MédiaSPIP.
In spipeo, the default MédiaSPIP theme, news items are displayed at the bottom of the main page, below the editorials.
You can customize the news item creation form.
News item creation form: for a document of type news item, the default fields are: Publication date (customize the publication date) (...)
-
Customizable form
21 June 2013, by
This page presents the fields available in the media publication form and lists the additional fields that can be added.
Media creation form
For a document of type media, the default fields are: Text, Enable/Disable the forum (the comment prompt can be disabled for each article), License, Add/remove authors, Tags.
This form can be modified under:
Administration > Form mask configuration. (...)
On other sites (7469)
-
Fix jQuery loader to use path of itself. Fixes nested demos.
2 March 2013, by jzaefferer
m lib/jquery.js
-
CUDA Memory Management: re-using device memory from C calls (multithreaded, ffmpeg), but failing on cudaMemcpy
4 March 2013, by Nuke Stollak
I'm trying to CUDA-fy my ffmpeg filter, which was taking over 90% of the CPU time according to gprof. I first went from one core to OpenMP on 4 cores and got a 3.8x increase in frames encoded per second, but it's still too slow. CUDA seemed like the next natural step.
I've gotten a modest (20%?) increase by replacing one of my filter's functions with a CUDA kernel call, and just to get things up and running, I was cudaMalloc'ing and cudaMemcpy'ing on each frame. I suspected I would get better results if I weren't doing this each frame, so before I go ahead and move the rest of my code to CUDA, I wanted to fix this by allocating the memory before my filter is called and freeing it afterwards, but the device memory isn't having it. I'm only storing the device memory locations outside of code that knows about CUDA; I'm not trying to use the data there, just save it for the next time I call a CUDA-aware function that needs it.
Here's where I am so far:
Environment: the latest Amazon Linux AMI on EC2's GPU Cluster, with all updates installed. Everything is fairly standard.
My filter is split into two files: vf_myfilter.c (compiled by gcc, like almost every other file in ffmpeg) and vf_myfilter_cu.cu (compiled by nvcc). My Makefile's link step includes -lcudart and both .o files. I build vf_myfilter_cu.o using (as one line):
nvcc -I. -I./ -I/opt/nvidia/cuda/include $(CPPFLAGS) -Xcompiler "$(CFLAGS)" -c -o libavfilter/vf_myfilter_cu.o libavfilter/vf_myfilter_cu.cu
When the variables (set by configure) are expanded, here's what I get, again all in one line but split up here for easier reading. I just noticed the duplicate include path directives, but they shouldn't hurt.
nvcc -I. -I./ -I/opt/nvidia/cuda/include -I. -I./ -D_ISOC99_SOURCE
-D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -D_POSIX_C_SOURCE=200112
-D_XOPEN_SOURCE=600 -DHAVE_AV_CONFIG_H
-Xcompiler "-fopenmp -std=c99 -fomit-frame-pointer -pthread -g
-Wdeclaration-after-statement -Wall -Wno-parentheses
-Wno-switch -Wno-format-zero-length -Wdisabled-optimization
-Wpointer-arith -Wredundant-decls -Wno-pointer-sign
-Wwrite-strings -Wtype-limits -Wundef -Wmissing-prototypes
-Wno-pointer-to-int-cast -Wstrict-prototypes -O3 -fno-math-errno
-fno-signed-zeros -fno-tree-vectorize
-Werror=implicit-function-declaration -Werror=missing-prototypes
-Werror=vla "
-c -o libavfilter/vf_myfilter_cu.o libavfilter/vf_myfilter_cu.cu
vf_myfilter.c calls three functions from vf_myfilter_cu.cu, which handle memory and call the CUDA kernel code. I thought I would be able to save the device pointers from my memory initialization, which runs once per ffmpeg run, and re-use that space each time I called the wrapper for my kernel function. However, when I cudaMemcpy from my host memory to the device memory I stored, it fails with cudaErrorInvalidValue. If I cudaMalloc my device memory on every frame, everything works.
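One thing worth ruling out is a mismatch between the size used for the one-time allocation and the size passed to the per-frame copy. In the code below, mysize is computed in config_input, but filter_frame only describes it in a comment. This is an assumption, not a diagnosis of the failure. The minimal C sketch that follows records the allocation size next to the pointer and checks it on every copy. PersistentBuffer and the buffer_* functions are hypothetical. malloc/memcpy/free stand in for cudaMalloc/cudaMemcpy/cudaFree so that it runs without a GPU.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical handle that keeps the allocation size next to the pointer.
 * malloc/memcpy/free stand in for cudaMalloc/cudaMemcpy/cudaFree so the
 * sketch runs without a GPU. */
typedef struct {
    unsigned char *ptr;
    size_t capacity;
} PersistentBuffer;

/* Called once, e.g. from config_input. Returns 0 on success, -1 on failure. */
int buffer_init(PersistentBuffer *buf, size_t size) {
    assert(size > 0);
    buf->ptr = malloc(size);               /* cudaMalloc in the real filter */
    buf->capacity = buf->ptr ? size : 0;
    return buf->ptr ? 0 : -1;
}

/* Called per frame. Rejects a copy larger than the one-time allocation
 * before it happens, instead of letting the copy itself fail. */
int buffer_upload(PersistentBuffer *buf, const unsigned char *src, size_t size) {
    if (buf->ptr == NULL || size > buf->capacity) {
        fprintf(stderr, "upload of %zu bytes does not fit buffer of %zu bytes\n",
                size, buf->capacity);
        return -1;
    }
    memcpy(buf->ptr, src, size);           /* cudaMemcpy(..., cudaMemcpyHostToDevice) */
    return 0;
}

/* Called once, e.g. from uninit. Safe to call more than once. */
void buffer_uninit(PersistentBuffer *buf) {
    free(buf->ptr);                        /* cudaFree in the real filter */
    buf->ptr = NULL;
    buf->capacity = 0;
}
```

With this pattern, a per-frame size that differs from the size allocated in config_input gets reported by the wrapper itself, with both numbers printed.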
I plan on using pinned host memory, once I have everything up in CUDA code and have minimized the number of times I need to return to the main ffmpeg code.
Steps taken:
First sign of trouble: search the web. I found "Passing a pointer to device memory between classes in CUDA" and printed out the pointers at various places in my execution to ensure that the device memory values were the same everywhere, and they are. FWIW, they seem to start around 0x90010000.
ffmpeg's configure gave me -pthread, so I checked whether my filter was being called from multiple threads, following "How can I tell if pthread_self is the main (first) thread in the process?" and checking syscall(SYS_gettid) == getpid() to ensure that I'm not calling CUDA from different threads. According to those two functions, I'm indeed in the primary thread at every step. I am still using OpenMP later around some for loops in the main .c filter function, but the calls to CUDA don't occur in those loops.
Code Overview:
ffmpeg provides me with a MyFilterContext structure pointer on each frame, as well as in the filter's config_input and uninit routines (called once per file), so I added some *host_var and *dev_var variables (a few of each, float and unsigned char).
There is a whole lot of code I skipped for this post, but most of it has to do with my algorithm and details involved in writing an ffmpeg filter. I'm actually using about 6 host variables and 7 device variables right now, but for demonstration I limited it to one of each.
Here is, broadly, what my vf_myfilter.c looks like.
#include <sys/syscall.h>  // SYS_gettid
#include <unistd.h>       // syscall(), getpid()

// declare my functions from vf_myfilter_cu.cu
extern void cudaMyInit(unsigned char **dev_var, size_t mysize);
extern void cudaMyUninit(unsigned char *dev_var);
extern void cudaMyFunction(unsigned char *host_var, unsigned char *dev_var, size_t mysize);

// part of the MyFilterContext structure, which ffmpeg keeps track of for me.
typedef struct {
    unsigned char *host_var;
    unsigned char *dev_var;
} MyFilterContext;

// ffmpeg calls this function once per file, before any frames are processed.
static int config_input(AVFilterLink *inlink) {
    // how ffmpeg passes me my context, fairly standard.
    MyFilterContext *myContext = inlink->dst->priv;
    // compute the size of one video plane of one frame of video
    size_t mysize = sizeof(unsigned char) * inlink->w * inlink->h;
    // av_mallocz is a malloc wrapper provided and required by ffmpeg
    myContext->host_var = (unsigned char *) av_mallocz(mysize);
    // Here's where I attempt to allocate my device memory.
    cudaMyInit(&myContext->dev_var, mysize);
    return 0;
}

// Called once per frame of video
static int filter_frame(AVFilterLink *inlink, AVFilterBufferRef *frame) {
    MyFilterContext *myContext = inlink->dst->priv;
    // sanity check to make sure that this isn't part of the multithreaded code
    if (syscall(SYS_gettid) != getpid())
        av_log(.... ); // This line never runs, so it's not threaded?
    // ...fill host_var with data from frame,
    // set mysize to the size of the buffer
    // Call my wrapper function defined in the .cu file
    cudaMyFunction(myContext->host_var, myContext->dev_var, mysize);
    // ... take the results from host_var and apply them to frame
    // ... and return the processed frame to ffmpeg
}

// called after everything else has happened: free up the memory.
static av_cold void uninit(AVFilterContext *ctx) {
    MyFilterContext *myContext = ctx->priv;
    // free my host_var
    if (myContext->host_var != NULL) {
        av_free(myContext->host_var);
        myContext->host_var = NULL;
    }
    // free my dev_var
    cudaMyUninit(myContext->dev_var);
}
Here is, broadly, what my vf_myfilter_cu.cu looks like:
#include <stdio.h>  // printf

// my kernel function that does the work.
__global__ void myfunc(unsigned char *dev_var, size_t mysize) {
    // find the offset for this particular GPU thread to process
    // exit this function if the block/thread combo points to somewhere
    // outside the frame
    // make sure we're less than mysize bytes from the beginning of dev_var
    // do things to dev_var[some_offset]
}

// Allocate the device memory
extern "C" void cudaMyInit(unsigned char **dev_var, size_t mysize) {
    if (cudaMalloc((void **) dev_var, mysize) != cudaSuccess) {
        printf("Cannot allocate the memory\n");
    }
}

// Free the device memory.
extern "C" void cudaMyUninit(unsigned char *dev_var) {
    cudaFree(dev_var);
}

// Copy data from the host to the device,
// Call the kernel function, and
// Copy data from the device to the host.
extern "C" void cudaMyFunction(
    unsigned char *host_var,
    unsigned char *dev_var,
    size_t mysize)
{
    cudaError_t cres;
    // dev_works is what I want to get rid of, but
    // to make sure that there's not something more obvious going
    // on, I made sure that my cudaMemcpy works if I'm allocating
    // the device memory in every frame.
    unsigned char *dev_works;
    if (cudaMalloc((void **) &dev_works, mysize) != cudaSuccess) {
        // I don't see this message
        printf("failed at per-frame malloc\n");
    }
    // THIS PART WORKS, copying host_var to dev_works
    cres = cudaMemcpy((void *) dev_works, host_var, mysize, cudaMemcpyHostToDevice);
    if (cres != cudaSuccess) {
        if (cres == cudaErrorInvalidValue) {
            // I don't see this message.
            printf("cudaErrorInvalidValue at per-frame cudaMemcpy\n");
        }
    }
    // THIS PART FAILS, copying host_var to dev_var
    cres = cudaMemcpy((void *) dev_var, host_var, mysize, cudaMemcpyHostToDevice);
    if (cres != cudaSuccess) {
        if (cres == cudaErrorInvalidValue) {
            // this is the error code that prints.
            printf("cudaErrorInvalidValue at per-frame cudaMemcpy\n");
        }
        // I check for other error codes, but they're not being hit.
    }
    // and this works with dev_works (launch configuration not shown)
    myfunc<<<...>>>(dev_works, mysize);
    if (cudaMemcpy(host_var, dev_works, mysize, cudaMemcpyDeviceToHost) != cudaSuccess) {
        // I don't see this message.
        printf("Failed to copy post-kernel func\n");
    }
    cudaFree(dev_works);
}
Any ideas?
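The post does not show the launch configuration of myfunc. A common one-dimensional setup rounds the grid size up so that every byte is covered. It then relies on the in-kernel guard that the comments in myfunc describe. The sketch below is in plain C so it can be checked on the host. The function names and the 256 threads per block in the usage are assumptions. In CUDA, the same per-thread index is written blockIdx.x * blockDim.x + threadIdx.x.

```c
#include <assert.h>
#include <stddef.h>

/* Blocks needed so that blocks * threads_per_block >= n (ceiling division):
 * the first value inside myfunc<<<blocks, threads_per_block>>>. */
size_t blocks_for(size_t n, size_t threads_per_block) {
    assert(threads_per_block > 0);
    return (n + threads_per_block - 1) / threads_per_block;
}

/* Host-side model of the kernel's index:
 * offset = blockIdx.x * blockDim.x + threadIdx.x */
size_t thread_offset(size_t block_idx, size_t block_dim, size_t thread_idx) {
    return block_idx * block_dim + thread_idx;
}

/* The guard from the kernel's comments: a thread in the last block whose
 * offset is not below mysize must return without touching dev_var. */
int offset_in_range(size_t offset, size_t mysize) {
    return offset < mysize;
}
```

For a 640x480 plane, blocks_for(640 * 480, 256) gives 1200 blocks with no spare threads. For a 1000-byte buffer it gives 4 blocks (1024 threads), and the last 24 threads fail offset_in_range and exit.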
-
Zlib vs. XZ on 2SF
I recently released my Game Music Appreciation website. It allows users to play an enormous range of video game music directly in their browsers. To do this, the site has to host the music. And since I’m a compression bore, I have to know how small I can practically make these music files. I already published the results of my effort to see if XZ could beat RAR (RAR won, but only slightly, and I still went with XZ for the project) on the corpus of Super Nintendo chiptune sets. Next is the corpus of Nintendo DS chiptunes.
Repacking Nintendo DS 2SF
The prevailing chiptune format for storing Nintendo DS songs is the .2sf format. This is a subtype of the Portable Sound Format (PSF). The designers had the foresight to build compression directly into the format: much of the payload data in a PSF file is compressed with zlib. Since I had already incorporated Embedded XZ into the player project, I decided to try repacking the PSF payload data from zlib -> xz.
In an effort to not corrupt standards too much, I changed the ’PSF’ file signature (seen in the first 3 bytes of a file) to ’psf’.
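A minimal sketch of reading a PSF header and applying the signature change described above. The field layout follows the PSF format specification: the signature is followed by a version byte at offset 3, then three little-endian 32-bit fields holding the reserved-area size, the compressed-program size and the CRC-32 of the compressed program. Version byte 0x24 denotes 2SF. The struct and function names are hypothetical, and this is not the site's repacking code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* PSF header fields after the 3-byte signature; multi-byte fields are little-endian. */
typedef struct {
    uint8_t  version;        /* byte 3: 0x24 identifies a Nintendo DS (2SF) file */
    uint32_t reserved_size;  /* bytes 4-7: size of the reserved area after the header */
    uint32_t program_size;   /* bytes 8-11: size of the compressed program payload */
    uint32_t program_crc32;  /* bytes 12-15: CRC-32 of the compressed program payload */
} PsfHeader;

static uint32_t read_le32(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Parses the 16-byte header. Returns 1 for the standard "PSF" signature,
 * 2 for the lowercase "psf" marking an xz-repacked file, 0 otherwise. */
int psf_parse_header(const uint8_t *buf, size_t len, PsfHeader *out) {
    int kind;
    if (len < 16) return 0;
    if (memcmp(buf, "PSF", 3) == 0) kind = 1;
    else if (memcmp(buf, "psf", 3) == 0) kind = 2;
    else return 0;
    out->version = buf[3];
    out->reserved_size = read_le32(buf + 4);
    out->program_size = read_le32(buf + 8);
    out->program_crc32 = read_le32(buf + 12);
    return kind;
}

/* Marks a file as carrying an xz payload by lowercasing its signature. */
void psf_mark_xz(uint8_t *buf) {
    memcpy(buf, "psf", 3);
}
```

Because the signature differs, a standard PSF player should reject a repacked file rather than try to inflate an xz payload as zlib. A player that knows about both can branch on the value returned by psf_parse_header.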
Results
There are about 900 Nintendo DS games currently represented in my website’s archive. Total size of the original PSF archive, payloads packed with zlib: 2.992 GB. Total size of the same archive with payloads packed as xz: 2.059 GB.
Using xz vs. zlib saved me nearly a gigabyte of storage. That extra storage doesn’t really impact my hosting plan very much (I have 1/2 TB, which is why I’m so nonchalant about hosting the massive MPlayer Samples Archive). However, smaller individual files translate to a better user experience since the files are faster to download.
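Working from the two totals above, the saving comes to just under a third of the original archive size:

```latex
\begin{aligned}
2.992\ \text{GB} - 2.059\ \text{GB} &= 0.933\ \text{GB} \\
\frac{0.933\ \text{GB}}{2.992\ \text{GB}} &\approx 0.312 \quad (\text{about a } 31\%\ \text{reduction}) \\
\frac{2.059\ \text{GB}}{2.992\ \text{GB}} &\approx 0.688
\end{aligned}
```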
Here is a pretty picture to illustrate the space savings:
The blue occasionally appears to dip below the orange but the data indicates that xz is always more efficient than zlib. Here’s the raw data (comes in vanilla CSV flavor too).
Interface Impact
So the good news for the end user is that the songs are faster to load up front. The downside is that there can be a noticeable delay when changing tracks. Even though all songs are packaged into one file for download, and the entire file is downloaded before playback begins, each song is individually compressed. Thus, changing tracks triggers another decompression operation. I’m toying with the possibility of some sort of background process that decompresses song (n+1) while playing song (n) in order to help compensate for this.
I don’t like the idea of decompressing everything up front because A) it would take even longer to start playing; and B) it would take a huge amount of memory.
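The background-decompression idea can be sketched as a one-slot lookahead cache. While track n plays, track n+1 is decompressed into the slot. On a track change, the player takes the buffer from the slot on a hit and falls back to on-demand decompression on a miss. This caps memory at one extra decompressed track instead of the whole set. It is a minimal sketch that assumes the decompressor and the background worker live elsewhere. LookaheadSlot and the slot_* functions are hypothetical and are not the site's player code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* One-slot lookahead cache holding the decompressed track expected to play next. */
typedef struct {
    int track;            /* index of the cached track, or -1 when the slot is empty */
    unsigned char *data;  /* decompressed samples, owned by the slot while cached */
    size_t len;
} LookaheadSlot;

void slot_init(LookaheadSlot *s) {
    s->track = -1;
    s->data = NULL;
    s->len = 0;
}

/* Which track to decompress in the background while `current` plays; -1 if none. */
int track_to_prefetch(int current, int track_count) {
    return (current + 1 < track_count) ? current + 1 : -1;
}

/* Stores a freshly decompressed track, freeing whatever was cached before. */
void slot_store(LookaheadSlot *s, int track, unsigned char *data, size_t len) {
    free(s->data);
    s->track = track;
    s->data = data;
    s->len = len;
}

/* On a track change: returns the cached buffer if it holds `track` (ownership
 * passes to the caller), or NULL on a miss, in which case the player has to
 * decompress on demand. */
unsigned char *slot_take(LookaheadSlot *s, int track, size_t *len) {
    unsigned char *d;
    if (s->track != track || s->data == NULL) return NULL;
    d = s->data;
    *len = s->len;
    slot_init(s);
    return d;
}

/* Releases any cached track. */
void slot_free(LookaheadSlot *s) {
    free(s->data);
    slot_init(s);
}
```

In a real player, the slot_store call for track_to_prefetch(current, count) would run on a worker thread, so the slot would also need a lock. That synchronization is left out here.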
Corner Case
There was at least one case in which I found zlib to be better than xz. It looks like zlib’s minimum block size is smaller than xz’s: as far as I can tell, xz could not compress a few bytes into a block any smaller than about 60-64 bytes, while zlib got it down into the teens. However, in those cases, it was more efficient to just leave the data uncompressed anyway.
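The corner case amounts to choosing the smallest representation for each payload, with uncompressed storage as the fallback when neither compressor helps. A minimal sketch, assuming the output sizes from both compressors are already known. The Storage enum and pick_storage are hypothetical, and the post does not say how an uncompressed payload would be flagged in the file.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { STORE_RAW, STORE_ZLIB, STORE_XZ } Storage;

/* Picks the smallest representation of one payload, given its raw size and
 * the sizes produced by each compressor. Ties favor the simpler option
 * (raw first, then zlib), so compression is used only when it saves bytes. */
Storage pick_storage(size_t raw_len, size_t zlib_len, size_t xz_len) {
    Storage best = STORE_RAW;
    size_t best_len = raw_len;
    if (zlib_len < best_len) {
        best = STORE_ZLIB;
        best_len = zlib_len;
    }
    if (xz_len < best_len) {
        best = STORE_XZ;
    }
    return best;
}
```

For a tiny payload like the one described, say 10 raw bytes against a zlib result in the teens and an xz result around 62 bytes, pick_storage(10, 15, 62) returns STORE_RAW. For typical large payloads, xz wins.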