
Media (16)
-
#7 Ambience
16 October 2011
Updated: June 2015
Language: English
Type: Audio
-
#6 Teaser Music
16 October 2011
Updated: February 2013
Language: English
Type: Audio
-
#5 End Title
16 October 2011
Updated: February 2013
Language: English
Type: Audio
-
#3 The Safest Place
16 October 2011
Updated: February 2013
Language: English
Type: Audio
-
#4 Emo Creates
15 October 2011
Updated: February 2013
Language: English
Type: Audio
-
#2 Typewriter Dance
15 October 2011
Updated: February 2013
Language: English
Type: Audio
Other articles (77)
-
Accepted formats
28 January 2010
The following commands give information about the formats and codecs handled by the local ffmpeg installation:
ffmpeg -codecs
ffmpeg -formats
Accepted input video formats
This list is not exhaustive; it highlights the main formats in use: h264: H.264 / AVC / MPEG-4 AVC / MPEG-4 part 10; m4v: raw MPEG-4 video format; flv: Flash Video (FLV) / Sorenson Spark / Sorenson H.263; Theora; wmv:
Possible output video formats
At first we (...)
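For illustration only, a small TypeScript (Node.js) sketch built around the ffmpeg -codecs command quoted above; the helper name and the substring check are assumptions for the example, not part of the original article:
// check-codec.ts: hypothetical helper around "ffmpeg -codecs" (untested sketch).
import { execFile } from "child_process";

function hasCodec(codecName: string): Promise<boolean> {
  return new Promise((resolve, reject) => {
    execFile("ffmpeg", ["-codecs"], (err, stdout) => {
      if (err) return reject(err); // ffmpeg missing or exited with an error
      // Recent ffmpeg builds print the codec table on stdout, one codec per line.
      resolve(stdout.split("\n").some((line) => line.includes(" " + codecName + " ")));
    });
  });
}

// Example (hypothetical): hasCodec("h264").then((ok) => console.log("h264 handled:", ok));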
-
Managing object creation and editing rights
8 February 2011
By default, many features are restricted to administrators, but each can be configured independently to change the minimum status required to use it, notably: writing content on the site, adjustable in the form template management; adding notes to articles; adding captions and annotations to images;
-
User profiles
12 April 2011
Each user has a profile page where they can edit their personal information. In the default top-of-page menu, a menu item is created automatically when MediaSPIP is initialised, visible only when the visitor is logged in to the site.
Users can edit their profile from their author page; a "Modifier votre profil" ("Edit your profile") link in the navigation is (...)
On other sites (7023)
-
Problems with frame rate on video conversion using ffmpeg with libx264 [migrated]
29 May 2013, by Lars Schroeter
I have problems transcoding some videos. I ran the simplest possible ffmpeg command, yet it takes a very long time and the output file is about 10 times bigger than the source. If I provide the frame rate parameter -r, it works well (small file, fast transcoding). What is the problem and how can I solve it? I don't want to set a fixed frame rate, because I assume it's better to keep the same rate as the source, isn't it?
Maybe the problem is something else, because I found many examples on the web where the -r option isn't used. Transcoding to a different format, or from a different source, also works well without the -r option (I tried ffmpeg 0.7.15 and also 1.2.1). The videos are uploaded by users of my website and converted automatically to make them suitable for the web, so I need the most general command for automatic conversion.
In the ffmpeg output below you will find these two suspicious messages:
- Frame rate very high for a muxer not effciciently supporting it. Please consider specifiying a lower framerate, a different muxer or -vsync 2
- MB rate (36000000) > level limit (983040)
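(For illustration only: a minimal Node.js/TypeScript sketch of how the -vsync 2 setting suggested by the first warning could be wired into an automated conversion step; the helper name and paths are placeholders, and this is a sketch of the idea rather than a verified fix.)
// convert.ts: hypothetical automated conversion step (untested sketch).
// It reuses the flags visible in the logs below and adds -vsync 2, which is the
// alternative the muxer warning itself proposes instead of forcing a rate with -r.
import { spawn } from "child_process";

function convertForWeb(inputPath: string, outputPath: string): Promise<number> {
  return new Promise((resolve, reject) => {
    const args = [
      "-i", inputPath,
      "-vcodec", "libx264",
      "-vsync", "2", // let ffmpeg drop/duplicate frames as needed rather than forcing -r
      outputPath,
    ];
    const proc = spawn("ffmpeg", args);
    proc.on("error", reject); // e.g. ffmpeg binary not found
    proc.on("close", (code) => resolve(code ?? -1));
  });
}

// Example with placeholder paths:
// convertForWeb("/tmp/standort_aquarium.mp4", "/tmp/output.mp4").then((code) => console.log("ffmpeg exited with", code));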
The ffmpeg command and output (without the -r option):
ffmpeg -i '/tmp/standort_aquarium.mp4' -vcodec libx264 output.mp4
ffmpeg version 0.7.15, Copyright (c) 2000-2013 the FFmpeg developers
built on Feb 22 2013 07:18:58 with gcc 4.4.5
configuration: --enable-libdc1394 --prefix=/usr --extra-cflags='-Wall -g ' --cc='ccache cc' --enable-shared --enable-libmp3lame --enable-gpl --enable-libvorbis --enable-pthreads --enable-libfaac --enable-libxvid --enable-postproc --enable-x11grab --enable-libgsm --enable-libtheora --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libx264 --enable-libspeex --enable-nonfree --disable-stripping --enable-avfilter --enable-libdirac --disable-decoder=libdirac --enable-libfreetype --enable-libschroedinger --disable-encoder=libschroedinger --enable-version3 --enable-libopenjpeg --enable-libvpx --enable-librtmp --extra-libs=-lgcrypt --disable-altivec --disable-armv5te --disable-armv6 --disable-vis
libavutil 50. 43. 0 / 50. 43. 0
libavcodec 52.123. 0 / 52.123. 0
libavformat 52.111. 0 / 52.111. 0
libavdevice 52. 5. 0 / 52. 5. 0
libavfilter 1. 80. 0 / 1. 80. 0
libswscale 0. 14. 1 / 0. 14. 1
libpostproc 51. 2. 0 / 51. 2. 0
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from '/tmp/standort_aquarium.mp4' :
Metadata :
major_brand : mp42
minor_version : 0
compatible_brands : mp423gp4isom
creation_time : 2013-04-19 15:04:05
Duration : 00:00:18.24, start : 0.000000, bitrate : 2095 kb/s
Stream #0.0(und) : Video : mpeg4, yuv420p, 640x480 [PAR 1:1 DAR 4:3], 2001 kb/s, 14.97 fps, 30k tbr, 30k tbn, 30k tbc
Metadata :
creation_time : 2013-04-19 15:04:05
Stream #0.1(und) : Audio : aac, 48000 Hz, mono, s16, 96 kb/s
Metadata :
creation_time : 2013-04-19 15:04:05
File 'output.mp4' already exists. Overwrite ? [y/N] y
[mp4 @ 0x20eed80] Frame rate very high for a muxer not effciciently supporting it.
Please consider specifiying a lower framerate, a different muxer or -vsync 2
[buffer @ 0x20f8820] w:640 h:480 pixfmt:yuv420p tb:1/1000000 sar:1/1 sws_param :
[libx264 @ 0x20efde0] Default settings detected, using medium profile
[libx264 @ 0x20efde0] using SAR=1/1
[libx264 @ 0x20efde0] MB rate (36000000) > level limit (983040)
[libx264 @ 0x20efde0] using cpu capabilities : MMX2 SSE2Fast SSSE3 FastShuffle SSE4.2
[libx264 @ 0x20efde0] profile High, level 5.1
[libx264 @ 0x20efde0] 264 - core 118 - H.264/MPEG-4 AVC codec - Copyleft 2003-2011 - http://www.videolan.org/x264.html - options : cabac=1 ref=3 deblock=1:0:0 analyse=0x3:0x113 me=hex subme=7 psy=1 psy_rd=1.00:0.00 mixed_ref=1 me_range=16 chroma_me=1 trellis=1 8x8dct=1 cqm=0 deadzone=21,11 fast_pskip=1 chroma_qp_offset=-2 threads=1 sliced_threads=0 nr=0 decimate=1 interlaced=0 bluray_compat=0 constrained_intra=0 bframes=3 b_pyramid=2 b_adapt=1 b_bias=0 direct=1 weightb=1 open_gop=0 weightp=2 keyint=250 keyint_min=25 scenecut=40 intra_refresh=0 rc_lookahead=40 rc=crf mbtree=1 crf=23.0 qcomp=0.60 qpmin=0 qpmax=69 qpstep=4 ip_ratio=1.40 aq=1:1.00
Output #0, mp4, to 'output.mp4' :
Metadata :
major_brand : mp42
minor_version : 0
compatible_brands : mp423gp4isom
creation_time : 2013-04-19 15:04:05
encoder : Lavf52.111.0
Stream #0.0(und) : Video : libx264, yuv420p, 640x480 [PAR 1:1 DAR 4:3], q=2-31, 200 kb/s, 30k tbn, 30k tbc
Metadata :
creation_time : 2013-04-19 15:04:05
Stream #0.1(und) : Audio : libfaac, 48000 Hz, mono, s16, 64 kb/s
Metadata :
creation_time : 2013-04-19 15:04:05
Stream mapping :
Stream #0.0 -> #0.0
Stream #0.1 -> #0.1
Press [q] to stop, [?] for help
frame=542630 fps=132 q=33.0 Lsize= 77226kB time=00:00:18.08 bitrate=34976.2kbits/s dup=542358 drop=0
video:68604kB audio:143kB global headers:0kB muxing overhead 12.333275%
frame I:2174 Avg QP:18.72 size : 25040
[libx264 @ 0x20efde0] frame P:136846 Avg QP:25.27 size : 56
[libx264 @ 0x20efde0] frame B:403610 Avg QP:32.99 size : 20
[libx264 @ 0x20efde0] consecutive B-frames : 0.8% 0.0% 0.1% 99.1%
[libx264 @ 0x20efde0] mb I I16..4 : 5.5% 83.3% 11.1%
[libx264 @ 0x20efde0] mb P I16..4 : 0.0% 0.0% 0.0% P16..4 : 0.5% 0.0% 0.0% 0.0% 0.0% skip:99.4%
[libx264 @ 0x20efde0] mb B I16..4 : 0.0% 0.0% 0.0% B16..8 : 0.0% 0.0% 0.0% direct : 0.0% skip:100.0% L0:21.2% L1:78.8% BI : 0.0%
[libx264 @ 0x20efde0] 8x8 transform intra:83.1% inter:85.2%
[libx264 @ 0x20efde0] coded y,uvDC,uvAC intra : 91.2% 95.8% 80.7% inter : 0.0% 0.1% 0.0%
[libx264 @ 0x20efde0] i16 v,h,dc,p : 13% 40% 12% 35%
[libx264 @ 0x20efde0] i8 v,h,dc,ddl,ddr,vr,hd,vl,hu : 19% 34% 15% 4% 4% 5% 6% 7% 8%
[libx264 @ 0x20efde0] i4 v,h,dc,ddl,ddr,vr,hd,vl,hu : 20% 38% 6% 4% 6% 6% 8% 6% 6%
[libx264 @ 0x20efde0] i8c dc,h,v,p : 39% 32% 19% 10%
[libx264 @ 0x20efde0] Weighted P-Frames : Y:0.0% UV:0.0%
[libx264 @ 0x20efde0] ref P L0 : 91.5% 5.2% 2.8% 0.4% 0.0%
[libx264 @ 0x20efde0] ref B L0 : 55.7% 43.5% 0.8%
[libx264 @ 0x20efde0] ref B L1 : 97.9% 2.1%
[libx264 @ 0x20efde0] kb/s:31071.04
The ffmpeg command and output with the -r option (here 30000/1001):
ffmpeg -i '/tmp/standort_aquarium.mp4' -r 30000/1001 -vcodec libx264 output.mp4
ffmpeg version 0.7.15, Copyright (c) 2000-2013 the FFmpeg developers
built on Feb 22 2013 07:18:58 with gcc 4.4.5
configuration: --enable-libdc1394 --prefix=/usr --extra-cflags='-Wall -g ' --cc='ccache cc' --enable-shared --enable-libmp3lame --enable-gpl --enable-libvorbis --enable-pthreads --enable-libfaac --enable-libxvid --enable-postproc --enable-x11grab --enable-libgsm --enable-libtheora --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libx264 --enable-libspeex --enable-nonfree --disable-stripping --enable-avfilter --enable-libdirac --disable-decoder=libdirac --enable-libfreetype --enable-libschroedinger --disable-encoder=libschroedinger --enable-version3 --enable-libopenjpeg --enable-libvpx --enable-librtmp --extra-libs=-lgcrypt --disable-altivec --disable-armv5te --disable-armv6 --disable-vis
libavutil 50. 43. 0 / 50. 43. 0
libavcodec 52.123. 0 / 52.123. 0
libavformat 52.111. 0 / 52.111. 0
libavdevice 52. 5. 0 / 52. 5. 0
libavfilter 1. 80. 0 / 1. 80. 0
libswscale 0. 14. 1 / 0. 14. 1
libpostproc 51. 2. 0 / 51. 2. 0
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from '/tmp/standort_aquarium.mp4' :
Metadata :
major_brand : mp42
minor_version : 0
compatible_brands : mp423gp4isom
creation_time : 2013-04-19 15:04:05
Duration : 00:00:18.24, start : 0.000000, bitrate : 2095 kb/s
Stream #0.0(und) : Video : mpeg4, yuv420p, 640x480 [PAR 1:1 DAR 4:3], 2001 kb/s, 14.97 fps, 30k tbr, 30k tbn, 30k tbc
Metadata :
creation_time : 2013-04-19 15:04:05
Stream #0.1(und) : Audio : aac, 48000 Hz, mono, s16, 96 kb/s
Metadata :
creation_time : 2013-04-19 15:04:05
File 'output.mp4' already exists. Overwrite ? [y/N] y
[buffer @ 0x132e820] w:640 h:480 pixfmt:yuv420p tb:1/1000000 sar:1/1 sws_param :
[libx264 @ 0x1325de0] Default settings detected, using medium profile
[libx264 @ 0x1325de0] using SAR=1/1
[libx264 @ 0x1325de0] using cpu capabilities : MMX2 SSE2Fast SSSE3 FastShuffle SSE4.2
[libx264 @ 0x1325de0] profile High, level 3.0
[libx264 @ 0x1325de0] 264 - core 118 - H.264/MPEG-4 AVC codec - Copyleft 2003-2011 - http://www.videolan.org/x264.html - options : cabac=1 ref=3 deblock=1:0:0 analyse=0x3:0x113 me=hex subme=7 psy=1 psy_rd=1.00:0.00 mixed_ref=1 me_range=16 chroma_me=1 trellis=1 8x8dct=1 cqm=0 deadzone=21,11 fast_pskip=1 chroma_qp_offset=-2 threads=1 sliced_threads=0 nr=0 decimate=1 interlaced=0 bluray_compat=0 constrained_intra=0 bframes=3 b_pyramid=2 b_adapt=1 b_bias=0 direct=1 weightb=1 open_gop=0 weightp=2 keyint=250 keyint_min=25 scenecut=40 intra_refresh=0 rc_lookahead=40 rc=crf mbtree=1 crf=23.0 qcomp=0.60 qpmin=0 qpmax=69 qpstep=4 ip_ratio=1.40 aq=1:1.00
Output #0, mp4, to 'output.mp4' :
Metadata :
major_brand : mp42
minor_version : 0
compatible_brands : mp423gp4isom
creation_time : 2013-04-19 15:04:05
encoder : Lavf52.111.0
Stream #0.0(und) : Video : libx264, yuv420p, 640x480 [PAR 1:1 DAR 4:3], q=2-31, 200 kb/s, 30k tbn, 29.97 tbc
Metadata :
creation_time : 2013-04-19 15:04:05
Stream #0.1(und) : Audio : libfaac, 48000 Hz, mono, s16, 64 kb/s
Metadata :
creation_time : 2013-04-19 15:04:05
Stream mapping :
Stream #0.0 -> #0.0
Stream #0.1 -> #0.1
Press [q] to stop, [?] for help
frame= 542 fps= 36 q=29.0 Lsize= 2059kB time=00:00:18.01 bitrate= 936.3kbits/s dup=270 drop=0
video:1904kB audio:143kB global headers:0kB muxing overhead 0.609224%
frame I:3 Avg QP:22.39 size : 14773
[libx264 @ 0x1325de0] frame P:514 Avg QP:23.98 size : 3675
[libx264 @ 0x1325de0] frame B:25 Avg QP:27.44 size : 643
[libx264 @ 0x1325de0] consecutive B-frames : 93.7% 0.0% 1.1% 5.2%
[libx264 @ 0x1325de0] mb I I16..4 : 16.4% 78.3% 5.3%
[libx264 @ 0x1325de0] mb P I16..4 : 1.6% 6.3% 0.3% P16..4 : 30.8% 8.6% 3.1% 0.0% 0.0% skip:49.4%
[libx264 @ 0x1325de0] mb B I16..4 : 0.4% 0.7% 0.0% B16..8 : 13.2% 1.6% 0.2% direct : 0.3% skip:83.6% L0:50.0% L1:47.1% BI : 2.9%
[libx264 @ 0x1325de0] 8x8 transform intra:77.1% inter:83.1%
[libx264 @ 0x1325de0] coded y,uvDC,uvAC intra : 62.0% 76.4% 24.4% inter : 17.9% 26.3% 2.3%
[libx264 @ 0x1325de0] i16 v,h,dc,p : 14% 60% 13% 13%
[libx264 @ 0x1325de0] i8 v,h,dc,ddl,ddr,vr,hd,vl,hu : 15% 35% 33% 2% 3% 3% 3% 3% 4%
[libx264 @ 0x1325de0] i4 v,h,dc,ddl,ddr,vr,hd,vl,hu : 15% 40% 12% 4% 7% 7% 7% 5% 4%
[libx264 @ 0x1325de0] i8c dc,h,v,p : 46% 34% 16% 4%
[libx264 @ 0x1325de0] Weighted P-Frames : Y:8.0% UV:4.5%
[libx264 @ 0x1325de0] ref P L0 : 65.6% 16.7% 8.8% 7.9% 0.9%
[libx264 @ 0x1325de0] ref B L0 : 85.9% 13.3% 0.8%
[libx264 @ 0x1325de0] ref B L1 : 88.7% 11.3%
[libx264 @ 0x1325de0] kb/s:862.28
The video source is temporarily available at: https://www.dropbox.com/s/4xg147z77u40g87/standort_aquarium.mp4
-
Hung out to dry
31 May 2013, by Mans (Law and liberty)
Outrage was the general reaction when Google recently announced that it was dropping XMPP server-to-server federation from Hangouts, as the search giant's revamped instant messaging platform is henceforth to be known. This outrage is, however, largely unjustified; Google's decision is merely a rational response to issues of a more fundamental nature. To see why, we need to step back and look at the broader instant messaging landscape.
A brief history of IM
The term instant messaging (IM) gained popularity in the mid-1990s along with the rise of chat clients such as ICQ, AOL Instant Messenger, and later MSN Messenger. These all had one thing in common: they were closed systems. Although global in the sense of allowing access from anywhere on the Internet, communication was possible only within each network, and only using the officially sanctioned client software. Contrast this with email, where users are free to choose any service provider as well as client software, with inter-server communication over open protocols delivering messages to their proper destinations.
The email picture has, however, not always been so rosy. During the 1970s and 80s a multitude of incompatible email systems (e.g. UUCP and X.400) were in more or less widespread use on various networks. As these networks gave way to the ARPANET/Internet, so did their mail systems to the SMTP email we all use today. A similar consolidation has yet to occur in the area of instant messaging.
Over the years, a few efforts towards cross-domain instant messaging have been undertaken. One early example is the Zephyr system, created as part of Project Athena at MIT in the late 1980s. While it never saw significant uptake, it is still in use at a few universities. A more successful story is that of XMPP. Conceived under the name Jabber in the late 1990s, XMPP is an open standard specified in a set of IETF RFCs. In addition to being open, a distinguishing feature of XMPP compared to other contemporary IM systems is its decentralised nature, with server-to-server connections allowing communication between users with accounts on different systems. Just like email.
The social network
A more recent emergence on the Internet is the social network. Although not the first of its kind, Facebook was the first to achieve its level of penetration, both geographically and across social groups. A range of messaging options, including email-style as well as instant messaging (chat), are available, all within the same web interface. What it does not allow is communication outside the Facebook network. Other social networks operate in the same spirit.
The popularity of social networks, to the extent that they for many constitute the primary means of communication, has in a sense brought back the fragmented networks of the 1980s. Even though they share infrastructure, up to and including the browser application, the social networks create walled-off regions of the Internet between which little or no exchange is possible.
The house that Google built
In 2005, Google launched Talk, an XMPP-based instant messaging service allowing users to connect using either Google’s official client application or any third-party XMPP client. Soon after, server-to-server federation was activated, enabling anyone with a Google account to exchange instant messages with users of any other federated XMPP service. An in-browser chat interface was also added to Gmail.
It was arguably only with the 2011 introduction of Google+ that Google, despite its previous endeavours with Orkut and Buzz, had a viable contender in the social networking space. Since its inception, Google+ has gone through a number of changes where features have been added or reworked. Instant messaging within Google+ was until recently available only in mobile clients. On the desktop, the sole messaging option was Hangouts which, although featuring text chat, cannot be considered instant messaging in the usual sense.
With a sprawling collection of messaging systems (Talk, Google+ Messenger, Hangouts), some action to consolidate them was a logical step. What we got was a unification under the Hangouts name. A redesigned Google+ now sports in-browser instant messaging similar to the Talk interface already present in Gmail. At the same time, the standalone desktop Talk client is discontinued, as is the Messenger feature in mobile Google+. Altogether, the changes make for a much less confusing user experience.
The sky is falling down
Along with the changes to the messaging platform, one announcement stoked anger on the Internet: Google's intent to discontinue XMPP federation (as of this writing, it is still operational). Google, the (self-described) champions of openness on the Internet, were seen to be closing their doors to the outside world. The effects of the change are, however, not quite so earth-shattering. Of the other major messaging networks that offer XMPP at all (Facebook, Skype, and the defunct Microsoft Messenger), none support federation; a Google user has never been able to chat with a Facebook user.
XMPP federation appears to be in use mainly by non-profit organisations or individuals running their own servers. The number of users on these systems is hard to assess, though it seems fair to assume it is dwarfed by the hundreds of millions using Google or Facebook. As such, the overall impact of cutting off communication with the federated servers is relatively minor, albeit annoying for those affected.
A fragmented world
Rather than chastising Google for making a low-impact, presumably well-founded business decision, we should be asking ourselves why instant messaging is still so fragmented in the first place, whereas email is not. The answer can be found by examining the nature of the entities providing these services.
Ever since the commercialisation of the Internet started in the 1990s, email has been largely seen as part of the Internet itself. Access to email was a major selling point for Internet service providers; indeed, many users still use the email facilities of their ISP. Instant messaging, by contrast, has never come as part of the basic offering, instead being a third-party service running on top of the Internet.
Users wishing to engage in instant messaging have always had to seek out and sign up with a provider of such a service. As the IM networks were isolated, most would choose whichever service their friends were already using, and a small number of networks, each with a sustainable number of users, came to dominate. In the early days, dedicated IM services such as ICQ were popular. Today, social networks have taken their place with Facebook currently in the dominant position. With the new Hangouts, Google offers its users the service they want in the way they have come to expect.
Follow the money
We now have all the pieces necessary to see why inter-domain instant messaging has never taken off, and the answer is simple: the major players have no commercial incentive to open access to their IM networks. In fact, they have good reason to keep the networks closed. Ensuring that a person leaving the network loses contact with his or her friends increases user retention by raising the cost of switching to another service. Monetising users is also easier if they are forced to remain on, say, Facebook's web pages while using its services, rather than accessing them indirectly, perhaps even through a competing (Google, say) frontend. The users do not generally care much, since all their friends are already on the same network as themselves.
While Google Talk was a standalone service, only loosely coupled to other Google products, these aspects were of lesser importance. After all, Google still had access to all the messages passing through the system and could analyse them for advert targeting purposes. Now that messaging is an integrated part of Google+, and thus serves as a direct competitor to the likes of Facebook, the situation has changed. All the reasons for Facebook not to open its network now apply equally to Google as well.
-
Developing A Shader-Based Video Codec
22 June 2013, by Multimedia Mike (Outlandish Brainstorms)
Early last month, this thing called ORBX.js was in the news. It ostensibly has something to do with streaming video and codec technology, which naturally catches my interest. The hype was kicked off by Mozilla honcho Brendan Eich when he posted an article asserting that HD video decoding could be entirely performed in JavaScript. We've seen this kind of thing before with Broadway, an H.264 decoder implemented entirely in JS. But that exposes some very obvious limitations (notably CPU usage).
But this new video codec promises 1080p HD playback directly in JavaScript, which is a lofty claim. How could it possibly do this? I got the impression that performance was achieved using WebGL, an extension which allows JavaScript access to accelerated 3D graphics hardware. Browsing through the conversations surrounding the ORBX.js announcement, I found this confirmation from Eich himself:
You’re right that WebGL does heavy lifting.
As of this writing, ORBX.js remains some kind of private tech demo. If there were a public demo available, it would necessarily be easy to reverse engineer the downloadable JavaScript decoder.
But the announcement was enough to make me wonder how it could be possible to create a video codec which effectively leverages 3D hardware.
Prior Art
In theorizing about this, it continually occurs to me that I can't possibly be the first person to attempt to do this (or the ORBX.js people, for that matter). In googling on the matter, I found various forums and Q&A posts where people asked if it were possible to, e.g., accelerate JPEG decoding and presentation using 3D hardware, with no answers. I also found a blog post which describes a plan to use 3D hardware to accelerate VP8 video decoding. It was a project done under the banner of Google's Summer of Code in 2011, though I'm not sure which open source group mentored the effort. The project did not end up producing the shader-based VP8 codec originally chartered but mentions that "The 'client side' of the VP8 VDPAU implementation is working and is currently being reviewed by the libvdpau maintainers." I'm not sure what that means. Perhaps it includes modifications to the public API that supports VP8, but is waiting for the underlying hardware to actually implement VP8 decoding blocks in hardware.
What's So Hard About This?
Video decoding is a computationally intensive task. GPUs are known to be really awesome at chewing through computationally intensive tasks. So why aren't GPUs a natural fit for decoding video codecs?
Generally, it boils down to parallelism, or the lack of opportunities for it. GPUs are really good at doing the exact same operations over lots of data at once. The problem is that decoding compressed video usually involves multiple phases that have to run one after another, and the individual phases themselves often cannot be parallelized. In strictly mathematical terms, a compressed data stream needs to be decoded by applying a function f(x) over each data element, x_0 .. x_n. However, the function relies on having applied the function to the previous data element, i.e.:
f(x_n) = f(f(x_(n-1)))
What happens when you try to parallelize such an algorithm? Temporal rifts in the space/time continuum, if you're in a Star Trek episode. If you're in the real world, you'll get incorrect, unusable data, as the parallel computation is seeded with a bunch of invalid data at multiple points (which is illustrated in some of the pictures in the aforementioned blog post about accelerated VP8).
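To make that dependence concrete, here is a tiny, purely illustrative TypeScript sketch in the spirit of the DC-prediction case discussed below: every output value needs the previous output value before it can be computed, so the loop cannot simply be split across parallel workers.
// Illustration only: out[n] cannot be computed before out[n-1] exists.
function decodeDeltas(deltas: number[]): number[] {
  const out: number[] = [];
  let previous = 0;
  for (const d of deltas) {
    previous = previous + d; // each step consumes the result of the previous step
    out.push(previous);
  }
  return out;
}

// decodeDeltas([12, -3, 0, 5]) -> [12, 9, 9, 14]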
Example : JPEG
Let's take a very general look at the various stages involved in decoding the ubiquitous JPEG format:
What are the opportunities to parallelize these various phases?
- Huffman decoding (run length decoding and zig-zag reordering are assumed to be rolled into this phase): not many opportunities for parallelizing the various Huffman formats out there, including this one. Decoding most Huffman streams is necessarily a sequential operation. I once hypothesized that it would be possible to engineer a codec to achieve some parallelism during the entropy decoding phase, and later found that On2's VP8 codec employs such a scheme. However, such a scheme is unlikely to break down to the fine-grained level that WebGL would require.
- Reverse DC prediction: JPEG, like many other codecs, doesn't store full DC coefficients. It stores the differences between successive DC coefficients. Reversing this process can't be parallelized. See the discussion in the previous section.
- Dequantize coefficients: This could be heavily parallelized. It should be noted that software decoders often don't dequantize all coefficients. Many coefficients are 0 and it's a waste of a multiplication operation to dequantize them. Thus, this phase is sometimes rolled into the Huffman decoding phase.
- Invert discrete cosine transform: This seems like it could be highly parallelizable. I will be exploring this further in this post.
- Convert YUV -> RGB for final display: This is a well-established use case for 3D acceleration; a small shader sketch follows this list.
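As an illustration of that last point, here is a minimal fragment-shader sketch for the YUV -> RGB stage, carried in a TypeScript string the way a WebGL program would hold it. The sampler and variable names are made up for the example, and the conversion uses the usual textbook BT.601 full-range coefficients; it is not taken from ORBX.js or any real decoder.
// yuv2rgb.ts: hypothetical GLSL ES fragment shader for the YUV -> RGB stage (untested sketch).
const YUV_TO_RGB_FRAGMENT_SHADER = `
precision mediump float;
varying vec2 vTexCoord;   // interpolated texture coordinate from the vertex shader
uniform sampler2D uTexY;  // one plane per texture, uploaded by the JavaScript side
uniform sampler2D uTexU;
uniform sampler2D uTexV;
void main() {
  float y = texture2D(uTexY, vTexCoord).r;
  float u = texture2D(uTexU, vTexCoord).r - 0.5;
  float v = texture2D(uTexV, vTexCoord).r - 0.5;
  // Textbook BT.601 full-range conversion; every pixel runs the same arithmetic, hence the good GPU fit.
  gl_FragColor = vec4(y + 1.402 * v,
                      y - 0.344 * u - 0.714 * v,
                      y + 1.772 * u,
                      1.0);
}
`;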
Crash Course in 3D Shaders and Humility
So I wanted to see if I could accelerate some parts of JPEG decoding using something called shaders. I made an effort to understand 3D programming and its associated math throughout the 1990s, but 3D technology left me behind a very long time ago while I got mixed up in this multimedia stuff. So I plowed through a few books concerning WebGL (thanks to my new Safari Books Online subscription). After I learned enough about WebGL/JS to be dangerous and just enough about shader programming to be absolutely lethal, I set out to try my hand at optimizing IDCT using shaders.
Here's my extremely high level (and probably hopelessly naive) view of the modern GPU shader programming model:
The WebGL program written in JavaScript drives the show. It sends a set of vertices into the WebGL system and each vertex is processed through a vertex shader. Then, each pixel that falls within a set of vertices is sent through a fragment shader to compute the final pixel attributes (R, G, B, and alpha value). Another consideration is textures: this is data that the program uploads to GPU memory and that can be accessed programmatically by the shaders.
These shaders (vertex and fragment) are key to the GPU's programmability. How are they programmed? Using a special C-like shading language. Thought I: "C-like language? I know C! I should be able to master this in short order!" So I charged forward with my assumptions and proceeded to get smacked down repeatedly by the overall programming paradigm. I came to recognize this as a variation of the scientific method: develop a hypothesis (in my case, a mental model of how the system works); develop an experiment (a short program) to prove or disprove the model; realize something fundamental that I was overlooking; formulate a new hypothesis and repeat.
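For reference, here is a minimal sketch of what the JavaScript side of that model looks like, written as TypeScript. The helper is generic boilerplate of my own naming, not anything from ORBX.js; it only shows how the driver program compiles and links a vertex/fragment shader pair before drawing.
// webgl-setup.ts: hypothetical boilerplate for compiling and linking a shader pair (untested sketch).
function buildProgram(gl: WebGLRenderingContext, vsSource: string, fsSource: string): WebGLProgram {
  const compile = (type: number, source: string): WebGLShader => {
    const shader = gl.createShader(type)!;
    gl.shaderSource(shader, source);
    gl.compileShader(shader);
    if (!gl.getShaderParameter(shader, gl.COMPILE_STATUS)) {
      throw new Error(gl.getShaderInfoLog(shader) ?? "shader compile failed");
    }
    return shader;
  };
  const program = gl.createProgram()!;
  gl.attachShader(program, compile(gl.VERTEX_SHADER, vsSource));
  gl.attachShader(program, compile(gl.FRAGMENT_SHADER, fsSource));
  gl.linkProgram(program);
  if (!gl.getProgramParameter(program, gl.LINK_STATUS)) {
    throw new Error(gl.getProgramInfoLog(program) ?? "program link failed");
  }
  gl.useProgram(program);
  return program;
}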
First Approach: Vertex Workhorse
My first pitch goes like this:
- Upload DCT coefficients to GPU memory in the form of textures
- Program a vertex mesh that encapsulates 16×16 macroblocks
- Distribute the IDCT effort among multiple vertex shaders
- Pass transformed Y, U, and V blocks to fragment shader which will convert the samples to RGB
So the idea is that decoding of 16×16 macroblocks is parallelized. A macroblock embodies 6 blocks:
It would be nice to process one of these 6 blocks in each vertex. But that means drawing a square with 6 vertices. How do you do that? I eventually realized that drawing a square with 6 vertices is the recommended method for drawing a square on 3D hardware. Using 2 triangles, each with 3 vertices (0, 1, 2; 3, 4, 5):
A vertex shader knows which (x, y) coordinates it has been assigned, so it could figure out which sections of coefficients it needs to access within the textures. But how would a vertex shader know which of the 6 blocks it should process? Solution: misappropriate the vertex's z coordinate. It's not used for anything else in this case.
So I set all of that up. Then I hit a new roadblock: how to get the reconstructed Y, U, and V samples transported to the fragment shader? I have found that communicating between shaders is quite difficult. Texture memory? WebGL doesn't allow shaders to write back to texture memory; shaders can only read it. The standard way to communicate data from a vertex shader to a fragment shader is to declare variables as "varying". Up until this point, I knew about varying variables but there was something I didn't quite understand about them and it nagged at me: if 3 different executions of a vertex shader set 3 different values to a varying variable, what value is passed to the fragment shader?
It turns out that the varying variable varies, which means that the GPU passes interpolated values to each fragment shader invocation. This completely destroys this idea.
Second Idea: Vertex Workhorse, Take 2
The revised pitch is to work around the interpolation issue by just having each vertex shader invocation perform all 6 block transforms. That seems like a lot of redundant work. However, I figured out that I can draw a square with only 4 vertices by arranging them in an 'N' pattern and asking WebGL to draw a TRIANGLE_STRIP instead of TRIANGLES. Now it's only doing 4x the extra work, not 6x. GPUs are supposed to be great at this type of work, so it shouldn't matter, right?
I wired up an experiment and then ran into a new problem: while I was able to transform a block (or at least pretend to), and load up a varying array (that wouldn't vary since all vertex shaders wrote the same values) to transmit to the fragment shader, the fragment shader can't access specific values within the varying block. To clarify, a WebGL shader can use a constant value (or a value that can be evaluated as a constant at compile time) to index into arrays; a WebGL shader cannot compute an index into an array. Per my reading, this is a WebGL security consideration and the limitation may not be present in other OpenGL(-ES) implementations.
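For completeness, a small TypeScript sketch of that 4-vertex 'N' pattern quad drawn as a TRIANGLE_STRIP; the attribute name is arbitrary and the shader program is assumed to have been built elsewhere.
// quad-strip.ts: hypothetical 4-vertex full-surface quad drawn as a TRIANGLE_STRIP (untested sketch).
function drawQuad(gl: WebGLRenderingContext, program: WebGLProgram): void {
  // Four clip-space corners arranged in an 'N' pattern: bottom-left, bottom-right, top-left, top-right.
  const vertices = new Float32Array([
    -1, -1,
     1, -1,
    -1,  1,
     1,  1,
  ]);
  const buffer = gl.createBuffer()!;
  gl.bindBuffer(gl.ARRAY_BUFFER, buffer);
  gl.bufferData(gl.ARRAY_BUFFER, vertices, gl.STATIC_DRAW);

  const aPosition = gl.getAttribLocation(program, "aPosition"); // attribute name is an assumption
  gl.enableVertexAttribArray(aPosition);
  gl.vertexAttribPointer(aPosition, 2, gl.FLOAT, false, 0, 0);

  gl.drawArrays(gl.TRIANGLE_STRIP, 0, 4); // two triangles, four vertices instead of six
}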
Not Giving Up Yet: Choking The Fragment Shader
You might want to be sitting down for this pitch:
- Vertex shader only interpolates texture coordinates to transmit to fragment shader
- Fragment shader performs IDCT for a single Y sample, U sample, and V sample
- Fragment shader converts YUV -> RGB
Seems straightforward enough. However, that step concerning IDCT for Y, U, and V entails a gargantuan number of operations. When computing the IDCT for an entire block of samples, it's possible to leverage a lot of redundancy in the math, which equates to far fewer overall operations. If you absolutely have to compute each sample individually, for an 8×8 block, that requires 64 multiplication/accumulation (MAC) operations per sample. For 3 color planes, and including a few extra multiplications involved in the RGB conversion, that tallies up to about 200 MACs per pixel. Then there's the fact that this approach means 4x redundant operations on the color planes.
It’s crazy, but I just want to see if it can be done. My approach is to pre-compute a pile of IDCT constants in the JavaScript and transmit them to the fragment shader via uniform variables. For a first order optimization, the IDCT constants are formatted as 4-element vectors. This allows computing 16 dot products rather than 64 individual multiplication/addition operations. Ideally, GPU hardware executes the dot products faster (and there is also the possibility of lining these calculations up as matrices).
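To pin down the arithmetic being described, here is a TypeScript sketch of the constants for one sample position; the shader would receive these as uniforms and sum 16 vec4 dot products. The function name is made up, and it follows the textbook 8×8 IDCT, which I am assuming matches the constants I had in mind.
// idct-constants.ts: hypothetical per-sample IDCT basis table (untested sketch).
// For output sample (x, y) of an 8x8 block: s(x,y) = sum over u,v of basis[u][v] * coef[u][v],
// i.e. the 64 multiply-accumulates per sample mentioned above. Grouping the 64 values into
// 16 vec4 uniforms lets a fragment shader replace them with 16 dot products.
function idctBasisForSample(x: number, y: number): Float32Array {
  const c = (k: number) => (k === 0 ? Math.SQRT1_2 : 1.0); // normalisation factor C(k)
  const basis = new Float32Array(64);
  for (let v = 0; v < 8; v++) {
    for (let u = 0; u < 8; u++) {
      basis[v * 8 + u] =
        0.25 * c(u) * c(v) *
        Math.cos(((2 * x + 1) * u * Math.PI) / 16) *
        Math.cos(((2 * y + 1) * v * Math.PI) / 16);
    }
  }
  return basis; // upload as 16 vec4 uniforms for this one sample position
}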
I can report that I actually got a sample correctly transformed using this approach. Just one sample, though. Then I ran into some new problems:
Problem #1: Computing sample #1 vs. sample #0 requires a different table of 64 IDCT constants. Okay, so create a long table of 64 * 64 IDCT constants. However, this suffers from the same problem as seen in the previous approach: I can't dynamically compute the index into this array. What's the alternative? Maintain 64 separate named arrays and implement 64 branches, when branching of any kind is ill-advised in shader programming to begin with? I started to go down this path until I ran into…
Problem #2: Shaders can only be so large. 64 * 64 floats (4 bytes each) requires 16 kbytes of data and this well exceeds the amount of shader storage that I can assume is allowed. That brings this path of exploration to a screeching halt.
Further Brainstorming
I suppose I could forgo pre-computing the constants and directly compute the IDCT for each sample, which would entail lots more multiplications as well as 128 cosine calculations per sample (384 considering all 3 color planes). I'm a little stuck with the transform idea right now. Maybe there are some other transforms I could try.
Another idea would be vector quantization. What little ORBX.js literature is available indicates that there is a method to allow real-time streaming but that it requires GPU assistance to yield enough horsepower to make it feasible. When I think of such severe asymmetry between compression and decompression, my mind drifts towards VQ algorithms. As I come to understand the benefits and limitations of GPU acceleration, I think I can envision a way that something similar to SVQ1, with its copious, hierarchical vector tables stored as textures, could be implemented using shaders.
So far, this all pertains to intra-coded video frames. What about opportunities for inter-coded frames? The only approach that I can envision here is to use WebGL's readPixels() function to fetch the rasterized frame out of the GPU, and then upload it again as a new texture which a new frame processing pipeline could reference. Whether this idea is plausible would require some profiling.
Using interframes in such a manner seems to imply that the entire codec would need to operate in RGB space and not YUV.
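Here is a sketch of the readPixels() round trip being contemplated, with the caveat that it only illustrates the API calls involved and says nothing about whether the readback is fast enough; the function name and texture handling are simplified assumptions.
// frame-feedback.ts: hypothetical readback and re-upload of the previous frame (untested sketch).
function recycleFrameAsTexture(gl: WebGLRenderingContext, width: number, height: number): WebGLTexture {
  // 1. Pull the rasterized frame back out of the GPU (in RGB(A), as noted above).
  const pixels = new Uint8Array(width * height * 4);
  gl.readPixels(0, 0, width, height, gl.RGBA, gl.UNSIGNED_BYTE, pixels);

  // 2. Push it straight back up as a texture the next frame's pipeline can sample.
  const texture = gl.createTexture()!;
  gl.bindTexture(gl.TEXTURE_2D, texture);
  gl.texImage2D(gl.TEXTURE_2D, 0, gl.RGBA, width, height, 0, gl.RGBA, gl.UNSIGNED_BYTE, pixels);
  gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_MIN_FILTER, gl.NEAREST);
  gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_MAG_FILTER, gl.NEAREST);
  return texture;
}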
Conclusions
The people behind ORBX.js have apparently figured out a way to create a shader-based video codec. I have yet to even begin to reason out a plausible approach. However, I'm glad I did this exercise since I have finally broken through my ignorance regarding modern GPU shader programming. It's nice to have a topic like multimedia that gives me a jumping-off point to explore other areas.