
Media (1)
-
Rennes Emotion Map 2010-11
19 October 2011
Updated: July 2013
Language: French
Type: Text
Other articles (87)
-
What is an editorial
21 June 2013
Write your point of view in an article. It will be filed in a section set aside for that purpose.
An editorial is a text-only article. Its purpose is to collect points of view in a dedicated section. Only one editorial is featured on the home page. To read earlier ones, see the dedicated section.
You can customize the editorial creation form.
Editorial creation form: in the case of an editorial-type document, the (...)
-
Help translate it
10 April 2011
You can help us improve the wording used in the software, or translate it into any new language so that it can reach new linguistic communities.
This is done through SPIP's translation interface, where all of MediaSPIP's language modules are available. Simply subscribe to the translators' mailing list to ask for more information.
At present MediaSPIP is only available in French and (...)
-
MediaSPIP v0.2
21 June 2013
MediaSPIP 0.2 is the first stable version of MediaSPIP.
Its official release date is 21 June 2013, and it is announced here.
The zip file provided here contains only the MediaSPIP sources in standalone form.
As with the previous version, all software dependencies must be installed manually on the server.
If you want to use this archive for a farm-mode installation, you will also need to make further changes (...)
On other sites (11750)
-
Revision d7eea782f2 : Extend number of reference buffers to 8. The number of reference buffers is ext
3 May 2013, by Adrian Grange
Changed Paths:
Modify /vp9/common/vp9_entropymode.c
Modify /vp9/common/vp9_onyxc_int.h
Modify /vp9/decoder/vp9_decodframe.c
Modify /vp9/encoder/vp9_bitstream.c
Extend number of reference buffers to 8.
The number of reference buffers is extended to 8 and a reference sign-bias added for the LAST_FRAME.
Whilst the number of reference buffers used by an individual frame remains unchanged at 3, these may now be selected from 8 possible buffers.
Change-Id: I2d247b9c1c2b3a339d6c9fac125e81ba373f75a7
-
CUDA Memory Management: re-using device memory from C calls (multithreaded, ffmpeg), but failing on cudaMemcpy
4 March 2013, by Nuke Stollak
I'm trying to CUDA-fy my ffmpeg filter that was taking over 90% of the CPU time, according to gprof. I first went from one core to OpenMP on 4 cores and got a 3.8x increase in frames encoded per second, but it's still too slow. CUDA seemed like the next natural step.
I've gotten a modest (20%?) increase by replacing one of my filter's functions with a CUDA kernel call, and just to get things up and running, I was cudaMalloc'ing and cudaMemcpy'ing on each frame. I suspected I would get better results if I weren't doing this each frame, so before I go ahead and move the rest of my code to CUDA, I wanted to fix this by allocating the memory before my filter is called and freeing it afterwards, but the device memory isn't having it. I'm only storing the device memory locations outside of code that knows about CUDA; I'm not trying to use the data there, just save it for the next time I call a CUDA-aware function that needs it.
Here's where I am so far:
Environment: the latest AMI Linux on EC2's GPU Cluster, latest updates installed. Everything is fairly standard.
My filter is split into two files: vf_myfilter.c (compiled by gcc, like almost every other file in ffmpeg) and vf_myfilter_cu.cu (compiled by nvcc). My Makefile's link step includes -lcudart and both .o files. I build vf_myfilter_cu.o using (as one line)
nvcc -I. -I./ -I/opt/nvidia/cuda/include $(CPPFLAGS)
-Xcompiler "$(CFLAGS)"
-c -o libavfilter/vf_myfilter_cu.o libavfilter/vf_myfilter_cu.cu
When the variables (set by configure) are expanded, here's what I get, again all in one line but split up here for easier reading. I just noticed the duplicate include path directives, but they shouldn't hurt.
nvcc -I. -I./ -I/opt/nvidia/cuda/include -I. -I./ -D_ISOC99_SOURCE
-D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -D_POSIX_C_SOURCE=200112
-D_XOPEN_SOURCE=600 -DHAVE_AV_CONFIG_H
-Xcompiler "-fopenmp -std=c99 -fomit-frame-pointer -pthread -g
-Wdeclaration-after-statement -Wall -Wno-parentheses
-Wno-switch -Wno-format-zero-length -Wdisabled-optimization
-Wpointer-arith -Wredundant-decls -Wno-pointer-sign
-Wwrite-strings -Wtype-limits -Wundef -Wmissing-prototypes
-Wno-pointer-to-int-cast -Wstrict-prototypes -O3 -fno-math-errno
-fno-signed-zeros -fno-tree-vectorize
-Werror=implicit-function-declaration -Werror=missing-prototypes
-Werror=vla "
-c -o libavfilter/vf_myfilter_cu.o libavfilter/vf_myfilter_cu.cu
vf_myfilter.c calls three functions from the vf_myfilter_cu.cu file, which handle memory and call the CUDA kernel code. I thought I would be able to save the device pointers from my memory initialization, which runs once per ffmpeg run, and re-use that space each time I called the wrapper for my kernel function. But when I cudaMemcpy from my host memory to the device memory that I stored, it fails with cudaErrorInvalidValue. If I cudaMalloc my device memory on every frame, I'm fine.
I plan on using pinned host memory, once I have everything up in CUDA code and have minimized the number of times I need to return to the main ffmpeg code.
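The allocate-once, reuse-per-frame idea described above can be sketched without a GPU. This is a minimal illustration under stated assumptions, not the filter's actual code: `DemoContext`, `demo_init`, `demo_frame`, `demo_uninit`, and `demo_lifecycle` are hypothetical names, and `malloc`/`free` stand in for `cudaMalloc`/`cudaFree` so that the example builds with a plain C compiler. It shows the pointer-to-pointer out-parameter that lets the allocator store the handle in the context, and it counts how many per-frame calls receive the same handle that was stored at init.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the filter context; only the two fields
   mirror the post. The C side stores dev_var and never dereferences it. */
typedef struct {
    unsigned char *host_var;
    unsigned char *dev_var;
} DemoContext;

/* Stand-in for the init wrapper: malloc replaces cudaMalloc (assumption).
   The handle is written through the out-parameter so the caller keeps it.
   Returns 0 on success, -1 on allocation failure. */
static int demo_init(unsigned char **dev_var, size_t size)
{
    assert(dev_var != NULL);
    *dev_var = (unsigned char *)malloc(size);
    return *dev_var != NULL ? 0 : -1;
}

/* Stand-in for the per-frame wrapper: returns 1 when the handle it is
   given is non-NULL and identical to the one saved at init, else 0. */
static int demo_frame(const unsigned char *dev_var, const unsigned char *saved)
{
    return dev_var != NULL && dev_var == saved;
}

/* Stand-in for the uninit wrapper: frees the buffer and clears the handle. */
static void demo_uninit(unsigned char **dev_var)
{
    assert(dev_var != NULL);
    free(*dev_var);
    *dev_var = NULL;
}

/* Runs init once, the per-frame call `frames` times, then uninit once.
   Returns the number of frames that saw the saved handle, or -1 if the
   allocation failed. */
static int demo_lifecycle(size_t size, int frames)
{
    DemoContext ctx = { NULL, NULL };
    const unsigned char *saved;
    int matches = 0;
    int i;

    if (demo_init(&ctx.dev_var, size) != 0)
        return -1;
    saved = ctx.dev_var;
    for (i = 0; i < frames; i++)
        matches += demo_frame(ctx.dev_var, saved);
    demo_uninit(&ctx.dev_var);
    return matches;
}
```

If `demo_init` took `unsigned char *dev_var` by value instead, the assignment would be invisible to the caller and the context would keep an uninitialized handle, which is why the init wrapper in the post takes `unsigned char **`.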
Steps taken:
First sign of trouble: search the web. I found "Passing a pointer to device memory between classes in CUDA" and printed out the pointers at various places in my execution to ensure that the device memory values were the same everywhere, and they are. FWIW, they seem to start around 0x90010000.
ffmpeg's configure gave me -pthread, so I checked whether my filter was being called from multiple threads, following "how can I tell if pthread_self is the main (first) thread in the process?" and checking
syscall(SYS_gettid) == getpid()
to ensure that I'm not calling CUDA from different threads. I'm indeed in the primary thread at every step, according to those two funcs. I am still using OpenMP later around some for loops in the main .c filter function, but the calls to CUDA don't occur in those loops.
Code overview:
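The main-thread test mentioned in the steps above can be tried in isolation. The following is a minimal sketch assuming Linux and glibc; `is_main_thread` is a hypothetical helper name, not something from ffmpeg or the post. On Linux the main thread's kernel thread ID equals the process ID, so the comparison is true on the main thread and false on every other thread.

```c
#define _GNU_SOURCE          /* asks glibc to declare syscall() in <unistd.h> */
#include <assert.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Linux-specific check (assumption: glibc, SYS_gettid available).
   Returns 1 when called from the process's main thread, 0 otherwise. */
static int is_main_thread(void)
{
    pid_t tid = (pid_t)syscall(SYS_gettid);
    assert(tid > 0);         /* thread IDs are positive */
    return tid == getpid();
}
```

Because the comparison is true on the main thread, a diagnostic guarded by `syscall(SYS_gettid) == getpid()` fires when the call is on the main thread, and one guarded by `!=` fires when it is not.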
ffmpeg provides me a MyFilterContext structure pointer on each frame, as well as on the filter's config_input and uninit routines (called once per file), so I added some *host_var and *dev_var variables (a few of each, float and unsigned char).
There is a whole lot of code I skipped for this post, but most of it has to do with my algorithm and details involved in writing an ffmpeg filter. I'm actually using about 6 host variables and 7 device variables right now, but for demonstration I limited it to one of each.
Here is, broadly, what my vf_myfilter.c looks like.
// declare my functions from vf_myfilter_cu.cu
extern void cudaMyInit(unsigned char **dev_var, size_t mysize);
extern void cudaMyUninit(unsigned char *dev_var);
extern void cudaMyFunction(unsigned char *host_var, unsigned char *dev_var, size_t mysize);
// part of the MyFilterContext structure, which ffmpeg keeps track of for me.
typedef struct {
    unsigned char *host_var;
    unsigned char *dev_var;
} MyFilterContext;
// ffmpeg calls this function once per file, before any frames are processed.
static int config_input(AVFilterLink *inlink) {
    // how ffmpeg passes me my context, fairly standard.
    MyFilterContext *myContext = inlink->dst->priv;
    // compute the size of one video plane of one frame of video
    size_t mysize = sizeof(unsigned char) * inlink->w * inlink->h;
    // av_mallocz is a malloc wrapper provided and required by ffmpeg
    myContext->host_var = (unsigned char*) av_mallocz(mysize);
    // Here's where I attempt to allocate my device memory.
    cudaMyInit(&myContext->dev_var, mysize);
    return 0;
}
// Called once per frame of video
static int filter_frame(AVFilterLink *inlink, AVFilterBufferRef *frame) {
    MyFilterContext *myContext = inlink->dst->priv;
    // sanity check to make sure that this isn't part of the multithreaded code
    if ( syscall(SYS_gettid) == getpid() )
        av_log(.... ); // This line never runs, so it's not threaded?
    // ...fill host_var with data from frame,
    // set mysize to the size of the buffer
    // Call my wrapper function defined in the .cu file
    cudaMyFunction(myContext->host_var, myContext->dev_var, mysize);
    // ... take the results from host_var and apply them to frame
    // ... and return the processed frame to ffmpeg
}
// called after everything else has happened: free up the memory.
static av_cold void uninit(AVFilterContext *ctx) {
    MyFilterContext *myContext = ctx->priv;
    // free my host_var
    if (myContext->host_var != NULL) {
        av_free(myContext->host_var);
        myContext->host_var = NULL;
    }
    // free my dev_var
    cudaMyUninit(myContext->dev_var);
}
Here is, broadly, what my vf_myfilter_cu.cu looks like:
// my kernel function that does the work.
__global__ void myfunc(unsigned char *dev_var, size_t mysize) {
    // find the offset for this particular GPU thread to process
    // exit this function if the block/thread combo points to somewhere
    // outside the frame
    // make sure we're less than mysize bytes from the beginning of dev_var
    // do things to dev_var[some_offset]
}
// Allocate the device memory
extern "C" void cudaMyInit(unsigned char **dev_var, size_t mysize) {
    if (cudaMalloc((void **) dev_var, mysize) != cudaSuccess) {
        printf("Cannot allocate the memory\n");
    }
}
// Free the device memory.
extern "C" void cudaMyUninit(unsigned char *dev_var) {
    cudaFree(dev_var);
}
// Copy data from the host to the device,
// Call the kernel function, and
// Copy data from the device to the host.
extern "C" void cudaMyFunction(
    unsigned char *host_var,
    unsigned char *dev_var,
    size_t mysize )
{
    cudaError_t cres;
    // dev_works is what I want to get rid of, but
    // to make sure that there's not something more obvious going
    // on, I made sure that my cudaMemcpy works if I'm allocating
    // the device memory in every frame.
    unsigned char *dev_works;
    if (cudaMalloc((void **) &dev_works, mysize) != cudaSuccess) {
        // I don't see this message
        printf("failed at per-frame malloc\n");
    }
    // THIS PART WORKS, copying host_var to dev_works
    cres = cudaMemcpy((void *) dev_works, host_var, mysize, cudaMemcpyHostToDevice);
    if (cres != cudaSuccess) {
        if (cres == cudaErrorInvalidValue) {
            // I don't see this message.
            printf("cudaErrorInvalidValue at per-frame cudaMemcpy\n");
        }
    }
    // THIS PART FAILS, copying host_var to dev_var
    cres = cudaMemcpy((void *) dev_var, host_var, mysize, cudaMemcpyHostToDevice);
    if (cres != cudaSuccess) {
        if (cres == cudaErrorInvalidValue) {
            // this is the error code that prints.
            printf("cudaErrorInvalidValue at per-frame cudaMemcpy\n");
        }
        // I check for other error codes, but they're not being hit.
    }
    // and this works with dev_works
    // (the launch configuration did not survive in the post; grid and block are placeholder names)
    myfunc<<<grid, block>>>(dev_works, mysize);
    if (cudaMemcpy(host_var, dev_works, mysize, cudaMemcpyDeviceToHost) != cudaSuccess) {
        // I don't see this message.
        printf("Failed to copy post-kernel func\n");
    }
    cudaFree(dev_works);
}
Any ideas?
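The listing above tests each return code by hand and prints a message for only one value. A uniform check that reports every failing call with its source location might look like the sketch below. It is an illustration only: `demo_error_t`, `demo_error_string`, `demo_check`, and `DEMO_CHECK` are hypothetical names, and the enum is a stand-in for `cudaError_t` so that the example builds without the CUDA toolkit.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for cudaError_t; the values are illustrative only. */
typedef enum {
    DEMO_SUCCESS = 0,
    DEMO_INVALID_VALUE,
    DEMO_OTHER
} demo_error_t;

/* Maps a stand-in error code to a short description. */
static const char *demo_error_string(demo_error_t err)
{
    switch (err) {
    case DEMO_SUCCESS:       return "success";
    case DEMO_INVALID_VALUE: return "invalid value";
    default:                 return "other error";
    }
}

/* Reports a failed call on stderr together with the expression text and
   its location. Returns 1 when the call succeeded, 0 otherwise. */
static int demo_check(demo_error_t err, const char *what,
                      const char *file, int line)
{
    assert(what != NULL && file != NULL);
    if (err == DEMO_SUCCESS)
        return 1;
    fprintf(stderr, "%s:%d: %s failed: %s (%d)\n",
            file, line, what, demo_error_string(err), (int)err);
    return 0;
}

/* Wraps any expression yielding a demo_error_t and records where it ran. */
#define DEMO_CHECK(call) demo_check((call), #call, __FILE__, __LINE__)
```

In a `.cu` file the same shape would wrap a `cudaError_t` and could use `cudaGetErrorString` for the message, so that an unexpected code is reported rather than silently skipped.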
-
Revision 22012ee994 : optimize 8x8 fdct rounding for accuracy The commit added a final rounding choic
22 February 2013, by Yaowu Xu
Changed Paths:
Modify /test/fdct8x8_test.cc
Modify /test/test.mk
Modify /vp9/encoder/vp9_dct.c
optimize 8x8 fdct rounding for accuracy
The commit added a final rounding choice for 8x8 forward dct to get rid of a sign bias at DC position and improve the accuracy in terms of round trip error for 8x8 (...)