
Use, discuss, criticize
13 April 2011
Talk to people directly involved in MediaSPIP’s development, or to people around you who could use MediaSPIP to share, enhance or develop their creative projects.
The bigger the community, the more MediaSPIP’s potential will be explored and the faster the software will evolve.
A discussion list is available for all exchanges between users.
Mediabox: opening images in the maximum space available to the user
8 February 2011
The display of images is constrained by the width allowed by the site’s design (which depends on the theme in use), so they are shown at a reduced size. To take advantage of all the space available on the user’s screen, it is possible to add a feature that displays the image in a multimedia box appearing above the rest of the content.
To do this, the "Mediabox" plugin must be installed.
Configuring the multimedia box
As soon as (...)
Announcing the world’s fastest VP8 decoder: ffvp8
Back when I originally reviewed VP8, I noted that the official decoder, libvpx, was rather slow. While there was no particular reason that it should be much faster than a good H.264 decoder, it shouldn’t have been that much slower either! So, I set out with Ronald Bultje and David Conrad to make a better one in FFmpeg. This one would be community-developed and free from the beginning, rather than the proprietary code-dump that was libvpx. A few weeks ago the decoder was complete enough to be bit-exact with libvpx, making it the first independent free implementation of a VP8 decoder. Now, with the first round of optimizations complete, it should be ready for primetime. I’ll go into some detail about the development process, but first, let’s get to the real meat of this post: the benchmarks.
We tested on two 1080p clips: Parkjoy, a live-action 1080p clip, and the Sintel trailer, a CGI 1080p clip. Testing was done using “time ffmpeg -vcodec libvpx or vp8 -i input -vsync 0 -an -f null -”. We all used the latest SVN FFmpeg at the time of this posting; the last revision optimizing the VP8 decoder was r24471.
As these benchmarks show, ffvp8 is clearly much faster than libvpx, particularly on 64-bit. It’s even faster by a large margin on Atom, despite the fact that we haven’t even begun optimizing for it. In many cases, ffvp8’s extra speed can make the difference between a video that plays and one that doesn’t, especially in modern browsers with software compositing engines taking up a lot of CPU time. Want to get faster playback of VP8 videos? The next versions of FFmpeg-based players, like VLC, will include ffvp8. Want to get faster playback of WebM in your browser? Lobby your browser developers to use ffvp8 instead of libvpx. I expect Chrome to switch first, as they already use libavcodec for most of their playback system.
Keep in mind ffvp8 is not “done”: we will continue to improve it and make it faster. We still have a number of optimizations in the pipeline that aren’t committed yet.
Developing ffvp8
The initial challenge, primarily pioneered by David and Ronald, was constructing the core decoder and making it bit-exact to libvpx. This was rather challenging, especially given the lack of a real spec. Many parts of the spec were outright misleading and contradicted libvpx itself. It didn’t help that the suite of official conformance tests didn’t even cover all the features used by the official encoder! We’ve already started adding our own conformance tests to deal with this. But I’ve complained enough in past posts about the lack of a spec; let’s get onto the gritty details.
The next step was adding SIMD assembly for all of the important DSP functions. VP8’s motion compensation and deblocking filter are by far the most CPU-intensive parts, much the same as in H.264. Unlike H.264, the deblocking filter relies on a lot of internal saturation steps, which are free in SIMD but costly in a normal C implementation, making the plain C code even slower. Of course, none of this is a particularly large problem; any sane video decoder has all this stuff in SIMD.
I tutored Ronald in x86 SIMD and wrote most of the motion compensation, intra prediction, and some inverse transforms. Ronald wrote the rest of the inverse transforms and a bit of the motion compensation. He also did the most difficult part: the deblocking filter. Deblocking filters are always a bit difficult because every one is different. Motion compensation, by comparison, is usually very similar regardless of video format; a 6-tap filter is a 6-tap filter, and most of the variation going on is just the choice of numbers to multiply by.
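To make the “a 6-tap filter is a 6-tap filter” point concrete, here is a minimal scalar C sketch. It is not the ffvp8 SIMD code: the function name filter6 and its calling convention are made up for illustration, and the two coefficient sets are the half-pel taps of H.264 and of the VP8 six-tap filter as I understand them from the respective bitstream documents. The filtering routine is the same in both cases; only the taps and the normalization shift change.

```c
#include <assert.h>
#include <stdint.h>

/* Half-pel taps; each set sums to (1 << shift). */
const int h264_halfpel[6] = { 1,  -5, 20, 20,  -5, 1 };  /* shift = 5 */
const int vp8_halfpel[6]  = { 3, -16, 77, 77, -16, 3 };  /* shift = 7 */

/* Hypothetical helper: apply six taps to src[0..5], round, and clamp
 * the result to the 8-bit pixel range. */
uint8_t filter6(const uint8_t *src, const int taps[6], int shift)
{
    assert(shift > 0);
    int sum = 1 << (shift - 1);          /* rounding term */
    for (int i = 0; i < 6; i++)
        sum += taps[i] * src[i];
    if (sum < 0)                         /* clamp before shifting so we */
        return 0;                        /* never shift a negative value */
    sum >>= shift;
    return sum > 255 ? 255 : (uint8_t)sum;
}
```

Calling filter6 on the same six pixels with h264_halfpel and a shift of 5, or with vp8_halfpel and a shift of 7, exercises exactly the same code path, which is why this part of a decoder tends to carry over between formats.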
The biggest challenge in a SIMD deblocking filter is to avoid unpacking, that is, going from 8-bit to 16-bit. Many operations in deblocking filters would naively appear to require more than 8-bit precision. A simple example in the case of x86 is abs(a-b), where a and b are 8-bit unsigned integers. The result of “a-b” requires a 9-bit signed integer (it can be anywhere from -255 to 255), so it can’t fit in 8-bit. But this is quite possible to do without unpacking: (satsub(a,b) | satsub(b,a)), where “satsub” performs a saturating subtract on the two values. If the difference is positive, satsub yields it; if the difference is negative, it yields zero. At most one of the two subtractions is nonzero, so ORing them together yields the desired result. This requires 4 ops on x86; unpacking would probably require at least 10, including the unpack and pack steps.
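The trick can be checked with a scalar model of a single byte lane. The sketch below is plain C rather than SIMD, and the function names are invented for this example; on x86 the saturating subtract would be a PSUBUSB and the OR a POR, applied to 8 or 16 bytes at a time.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of one lane of an unsigned saturating byte subtract:
 * results below zero are clamped to zero instead of wrapping around. */
uint8_t satsub_u8(uint8_t a, uint8_t b)
{
    return a > b ? (uint8_t)(a - b) : 0;
}

/* abs(a - b) computed without ever widening to 16 bits.
 * At most one of the two terms is nonzero, so OR picks it out. */
uint8_t absdiff_u8(uint8_t a, uint8_t b)
{
    return (uint8_t)(satsub_u8(a, b) | satsub_u8(b, a));
}

/* Exhaustive self-check over all 65536 input pairs.
 * Returns the number of pairs checked. */
int check_absdiff_u8(void)
{
    int checked = 0;
    for (int a = 0; a < 256; a++) {
        for (int b = 0; b < 256; b++) {
            int expected = a > b ? a - b : b - a;
            assert(absdiff_u8((uint8_t)a, (uint8_t)b) == expected);
            checked++;
        }
    }
    return checked;
}
```

Because the input space is only 256 × 256 values, the identity can be verified exhaustively rather than argued for.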
After the SIMD came optimizing the C code, which still took a significant portion of the total runtime. One of my biggest optimizations was adding aggressive “smart” prefetching to reduce cache misses. ffvp8 prefetches the reference frames (PREVIOUS, GOLDEN, and ALTREF)… but only the ones which have been used reasonably often this frame. This lets us prefetch everything we need without prefetching things that we probably won’t use. libvpx very often encodes frames that almost never (but not quite never) use GOLDEN or ALTREF, so this optimization greatly reduces time spent prefetching in a lot of real videos. There are of course countless other optimizations we made that are too long to list here as well, such as David’s entropy decoder optimizations. I’d also like to thank Eli Friedman for his invaluable help in benchmarking a lot of these changes.
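The idea behind the “smart” prefetching can be sketched as follows. This is an illustration under stated assumptions, not the code in FFmpeg: the FrameStats structure, the function names, and the 1-in-32 threshold are all hypothetical, and __builtin_prefetch is a GCC/Clang builtin rather than standard C.

```c
#include <assert.h>
#include <stdint.h>

enum { REF_PREVIOUS, REF_GOLDEN, REF_ALTREF, REF_COUNT };

/* Hypothetical per-frame statistics: how many macroblocks decoded so
 * far in this frame used each reference. */
typedef struct {
    int ref_count[REF_COUNT];
    int mbs_decoded;
} FrameStats;

/* Record the reference used by the macroblock that was just decoded. */
void note_ref_use(FrameStats *st, int ref)
{
    assert(ref >= 0 && ref < REF_COUNT);
    st->ref_count[ref]++;
    st->mbs_decoded++;
}

/* Assumed heuristic: a reference is worth prefetching only if more
 * than roughly 1 in 32 macroblocks so far this frame have used it. */
int worth_prefetching(const FrameStats *st, int ref)
{
    return st->ref_count[ref] > (st->mbs_decoded >> 5);
}

/* Issue a prefetch hint for each frequently used reference.
 * ref_ptr[i] points at the pixels the next macroblock would likely
 * read from reference i. */
void prefetch_refs(const FrameStats *st, const uint8_t *ref_ptr[REF_COUNT])
{
    for (int ref = 0; ref < REF_COUNT; ref++) {
        if (!worth_prefetching(st, ref))
            continue;
#if defined(__GNUC__) || defined(__clang__)
        __builtin_prefetch(ref_ptr[ref], 0, 3);  /* read, keep in cache */
#else
        (void)ref_ptr;  /* no portable prefetch; skip the hint */
#endif
    }
}
```

With a counter like this, a frame in which GOLDEN is used by one macroblock out of several hundred stops paying for GOLDEN prefetches almost immediately, while PREVIOUS keeps being prefetched.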
What next? Altivec (PPC) assembly is almost nonexistent, with the only functions being David’s motion compensation code. NEON (ARM) is completely nonexistent: we’ll need that to be fast on mobile devices as well. Of course, all this will come in due time, and as always, patches welcome!
Appendix: the raw numbers
Here’s the raw numbers (in fps) for the graphs at the start of this post, with standard error values:

Core i7 620QM (1.6GHz), Windows 7, 32-bit:
Parkjoy ffvp8: 44.58 ± 0.44
Parkjoy libvpx: 33.06 ± 0.23
Sintel ffvp8: 74.26 ± 1.18
Sintel libvpx: 56.11 ± 0.96

Core i5 520M (2.4GHz), Linux, 64-bit:
Parkjoy ffvp8: 68.29 ± 0.06
Parkjoy libvpx: 41.06 ± 0.04
Sintel ffvp8: 112.38 ± 0.37
Sintel libvpx: 69.64 ± 0.09

Core 2 T9300 (2.5GHz), Mac OS X 10.6.4, 64-bit:
Parkjoy ffvp8: 54.09 ± 0.02
Parkjoy libvpx: 33.68 ± 0.01
Sintel ffvp8: 87.54 ± 0.03
Sintel libvpx: 52.74 ± 0.04

Core Duo (2GHz), Mac OS X 10.6.4, 32-bit:
Parkjoy ffvp8: 21.31 ± 0.02
Parkjoy libvpx: 17.96 ± 0.00
Sintel ffvp8: 41.24 ± 0.01
Sintel libvpx: 29.65 ± 0.02

Atom N270 (1.6GHz), Linux, 32-bit:
Parkjoy ffvp8: 15.29 ± 0.01
Parkjoy libvpx: 12.46 ± 0.01
Sintel ffvp8: 26.87 ± 0.05
Sintel libvpx: 20.41 ± 0.02
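
For readers who want the relative gain rather than raw frame rates, the speedup is simply the ratio of the two fps figures. The tiny helper below is only an illustration (the function name is made up); the inputs are the numbers listed above.

```c
#include <assert.h>

/* Speedup of ffvp8 over libvpx, given both frame rates in fps.
 * A result of 1.5 means ffvp8 decodes 50% more frames per second. */
double speedup(double ffvp8_fps, double libvpx_fps)
{
    assert(libvpx_fps > 0.0);
    return ffvp8_fps / libvpx_fps;
}
```

For Parkjoy, speedup(68.29, 41.06) on the 64-bit Core i5 comes to about 1.66, while speedup(15.29, 12.46) on the 32-bit Atom comes to about 1.23.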
Trouble building ffmpeg with cuda for windows 32 with visual studio 2022
27 July 2022, by manou
Here is my build process.

I open mingw32 from the x64 Native Tools Command Prompt for VS 2022, then run the following in the mingw32 shell:

# cd /
# ./c/Program\ Files/Microsoft\ Visual\ Studio/2022/Community/VC/Auxiliary/Build/vcvars32.bat
# cd ~
# pacman -Sy diffutils git make gcc yasm pkg-config --noconfirm
# git clone --depth 1 https://git.ffmpeg.org/ffmpeg.git ffmpeg 
# git clone https://git.videolan.org/git/ffmpeg/nv-codec-headers.git nv-codec-headers 
# cd nv-codec-headers/
# make PREFIX=/usr/local
# make install PREFIX=/usr/local
# cd ..
# mkdir nv_sdk
# cp -r /c/Program\ Files/NVIDIA\ GPU\ Computing\ Toolkit/CUDA/v11.7/lib/Win32/* nv_sdk
# cp -r /c/Program\ Files/NVIDIA\ GPU\ Computing\ Toolkit/CUDA/v11.7/include/* nv_sdk
# export PATH="/c/Program Files/Microsoft Visual Studio/2022/Community/VC/Tools/MSVC/14.32.31326/bin/Hostx86/x86/":$PATH
# export PATH="/c/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v11.7/bin/":$PATH
# ./configure --disable-everything --enable-decoder=h264 --enable-decoder=hevc --enable-cross-compile --disable-avdevice --disable-swresample --disable-postproc --disable-avfilter --target-os=mingw32 --enable-cuda-nvcc --enable-nonfree --toolchain=msvc --extra-cflags=-I../nv_sdk --extra-ldflags=" -m32 -L../nv_sdk" --enable-shared --shlibdir=SHARED_LIBS --arch=x86_32 --enable-runtime-cpudetect --enable-w32threads
# make -j8
# make install



First, I get a bunch of warnings during the build that look like this:

libavutil/opt.c(1075): warning C4133: 'fonction' : types incompatibles - de 'AVPixelFormat *' à 'int *'


And finally, make install returns:


EXTERN_PREFIX="_" AR="lib.exe" NM="dumpbin.exe -symbols" ./compat/windows/makedef libavutil/libavutil.ver libavutil/adler32.o libavutil/aes.o libavutil/aes_ctr.o libavutil/audio_fifo.o libavutil/avsscanf.o libavutil/avstring.o libavutil/base64.o libavutil/blowfish.o libavutil/bprint.o libavutil/buffer.o libavutil/camellia.o libavutil/cast5.o libavutil/channel_layout.o libavutil/color_utils.o libavutil/cpu.o libavutil/crc.o libavutil/des.o libavutil/detection_bbox.o libavutil/dict.o libavutil/display.o libavutil/dovi_meta.o libavutil/downmix_info.o libavutil/encryption_info.o libavutil/error.o libavutil/eval.o libavutil/fifo.o libavutil/file.o libavutil/file_open.o libavutil/film_grain_params.o libavutil/fixed_dsp.o libavutil/float_dsp.o libavutil/frame.o libavutil/hash.o libavutil/hdr_dynamic_metadata.o libavutil/hdr_dynamic_vivid_metadata.o libavutil/hmac.o libavutil/hwcontext.o libavutil/hwcontext_d3d11va.o libavutil/hwcontext_dxva2.o libavutil/imgutils.o libavutil/integer.o libavutil/intmath.o libavutil/lfg.o libavutil/lls.o libavutil/log.o libavutil/log2_tab.o libavutil/lzo.o libavutil/mastering_display_metadata.o libavutil/mathematics.o libavutil/md5.o libavutil/mem.o libavutil/murmur3.o libavutil/opt.o libavutil/parseutils.o libavutil/pixdesc.o libavutil/pixelutils.o libavutil/random_seed.o libavutil/rational.o libavutil/rc4.o libavutil/reverse.o libavutil/ripemd.o libavutil/samplefmt.o libavutil/sha.o libavutil/sha512.o libavutil/slicethread.o libavutil/spherical.o libavutil/stereo3d.o libavutil/tea.o libavutil/threadmessage.o libavutil/time.o libavutil/timecode.o libavutil/tree.o libavutil/twofish.o libavutil/tx.o libavutil/tx_double.o libavutil/tx_float.o libavutil/tx_int32.o libavutil/utils.o libavutil/version.o libavutil/video_enc_params.o libavutil/x86/cpu.o libavutil/x86/cpuid.o libavutil/x86/fixed_dsp.o libavutil/x86/fixed_dsp_init.o libavutil/x86/float_dsp.o libavutil/x86/float_dsp_init.o libavutil/x86/imgutils.o libavutil/x86/imgutils_init.o 
libavutil/x86/lls.o libavutil/x86/lls_init.o libavutil/x86/tx_float.o libavutil/x86/tx_float_init.o libavutil/xga_font_data.o libavutil/xtea.o > libavutil/avutil-57.def
Could not create temporary library.
make: *** [ffbuild/library.mak:118: libavutil/avutil-57.dll] Error 1



What am I doing wrong?
Should I install other packages with pacman?