
Media (39)
-
Stereo master soundtrack
17 October 2011, by
Updated: October 2011
Language: English
Type: Audio
-
ED-ME-5 1-DVD
11 October 2011, by
Updated: October 2011
Language: English
Type: Audio
-
1,000,000
27 September 2011, by
Updated: September 2011
Language: English
Type: Audio
-
Demon Seed
26 September 2011, by
Updated: September 2011
Language: English
Type: Audio
-
The Four of Us are Dying
26 September 2011, by
Updated: September 2011
Language: English
Type: Audio
-
Corona Radiata
26 September 2011, by
Updated: September 2011
Language: English
Type: Audio
Other articles (74)
-
Websites made with MediaSPIP
2 May 2011, by
This page lists some websites based on MediaSPIP.
-
Managing creation and editing rights for objects
8 February 2011, by
By default, many features are restricted to administrators, but each can be configured independently to change the minimum status required to use it, notably: writing content on the site, adjustable in the form template management; adding notes to articles; adding captions and annotations to images;
-
Uploading media and themes via FTP
31 May 2013, by
MediaSPIP also handles media transferred via FTP. If you prefer to upload this way, retrieve the access credentials for your MediaSPIP site and use your favourite FTP client.
From the start you will find the following folders in your FTP space: config/: the site's configuration folder; IMG/: media already processed and online on the site; local/: the website's cache directory; themes/: custom themes or style sheets; tmp/: working folder (...)
On other sites (10101)
-
FFmpeg C++ api decode h264 error
29 May 2015, by arms
I'm trying to use the C++ API of FFmpeg (version 20150526) under Windows, using the prebuilt binaries, to decode an H.264 video file (*.ts).
I’ve written a very simple code that automatically detects the required codec from the file itself (and it is AV_CODEC_ID_H264, as expected).
Then I re-open the video file in read-binary mode, read a fixed-size buffer of bytes from it in a while loop, and feed those bytes to the decoder until the end of file. However, when I call avcodec_decode_video2, a large number of errors like the following appear:
[h264 @ 008df020] top block unavailable for requested intra mode at 34 0
[h264 @ 008df020] error while decoding MB 34 0, bytestream 3152
[h264 @ 008df020] decode_slice_header error
Sometimes avcodec_decode_video2 sets got_picture_ptr to 1, so I expect to find a good frame. Instead, even though all the calls succeed, when I view the decoded frame (using OpenCV only for visualization purposes) I see a gray image with some artifacts.
If I employ the same code to decode an *.avi file it works fine.
Reading the examples of FFmpeg I did not find a solution to my problem. I've also implemented the solution proposed in the similar question FFmpeg c++ H264 decoding error, but it did not work.
Does anyone know where the error is ?
Thank you in advance for any reply !
The code is the following [EDIT : code updated including the parser management] :
#include <iostream>
#include <iomanip>
#include <string>
#include <sstream>
#include <opencv2/opencv.hpp>
#ifdef __cplusplus
extern "C"
{
#endif // __cplusplus
#include <libavcodec/avcodec.h>
#include <libavdevice/avdevice.h>
#include <libavfilter/avfilter.h>
#include <libavformat/avformat.h>
#include <libavformat/avio.h>
#include <libavutil/avutil.h>
#include <libpostproc/postprocess.h>
#include <libswresample/swresample.h>
#include <libswscale/swscale.h>
#ifdef __cplusplus
} // end extern "C".
#endif // __cplusplus
#define INBUF_SIZE 4096
int main()
{
AVCodec* l_pCodec;
AVCodecContext* l_pAVCodecContext;
SwsContext* l_pSWSContext;
AVFormatContext* l_pAVFormatContext;
AVFrame* l_pAVFrame;
AVFrame* l_pAVFrameBGR;
AVPacket l_AVPacket;
AVPacket l_AVPacket_out;
AVStream* l_pStream;
AVCodecParserContext* l_pParser;
FILE* l_pFile_in;
FILE* l_pFile_out;
std::string l_sFile;
int l_iResult;
int l_iFrameCount;
int l_iGotFrame;
int l_iBufLength;
int l_iParsedBytes;
int l_iPts;
int l_iDts;
int l_iPos;
int l_iSize;
int l_iDecodedBytes;
uint8_t l_auiInBuf[INBUF_SIZE + FF_INPUT_BUFFER_PADDING_SIZE];
uint8_t* l_pData;
cv::Mat l_cvmImage;
l_pCodec = NULL;
l_pAVCodecContext = NULL;
l_pSWSContext = NULL;
l_pAVFormatContext = NULL;
l_pAVFrame = NULL;
l_pAVFrameBGR = NULL;
l_pParser = NULL;
l_pStream = NULL;
l_pFile_in = NULL;
l_pFile_out = NULL;
l_iPts = 0;
l_iDts = 0;
l_iPos = 0;
l_pData = NULL;
l_sFile = "myvideo.ts";
avdevice_register_all();
avfilter_register_all();
avcodec_register_all();
av_register_all();
avformat_network_init();
l_pAVFormatContext = avformat_alloc_context();
l_iResult = avformat_open_input(&l_pAVFormatContext,
l_sFile.c_str(),
NULL,
NULL);
if (l_iResult >= 0)
{
l_iResult = avformat_find_stream_info(l_pAVFormatContext, NULL);
if (l_iResult >= 0)
{
for (int i = 0; i < l_pAVFormatContext->nb_streams; i++)
{
if (l_pAVFormatContext->streams[i]->codec->codec_type ==
AVMEDIA_TYPE_VIDEO)
{
l_pCodec = avcodec_find_decoder(
l_pAVFormatContext->streams[i]->codec->codec_id);
l_pStream = l_pAVFormatContext->streams[i];
}
}
}
}
av_init_packet(&l_AVPacket);
av_init_packet(&l_AVPacket_out);
memset(l_auiInBuf + INBUF_SIZE, 0, FF_INPUT_BUFFER_PADDING_SIZE);
if (l_pCodec)
{
l_pAVCodecContext = avcodec_alloc_context3(l_pCodec);
l_pParser = av_parser_init(l_pAVCodecContext->codec_id);
if (l_pParser)
{
av_register_codec_parser(l_pParser->parser);
}
if (l_pAVCodecContext)
{
if (l_pCodec->capabilities & CODEC_CAP_TRUNCATED)
{
l_pAVCodecContext->flags |= CODEC_FLAG_TRUNCATED;
}
l_iResult = avcodec_open2(l_pAVCodecContext, l_pCodec, NULL);
if (l_iResult >= 0)
{
l_pFile_in = fopen(l_sFile.c_str(), "rb");
if (l_pFile_in)
{
l_pAVFrame = av_frame_alloc();
l_pAVFrameBGR = av_frame_alloc();
if (l_pAVFrame)
{
l_iFrameCount = 0;
avcodec_get_frame_defaults(l_pAVFrame);
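// Read the raw input in INBUF_SIZE chunks and hand each chunk to the parser/decoder below.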
while (1)
{
l_iBufLength = fread(l_auiInBuf,
1,
INBUF_SIZE,
l_pFile_in);
if (l_iBufLength == 0)
{
break;
}
else
{
l_pData = l_auiInBuf;
l_iSize = l_iBufLength;
while (l_iSize > 0)
{
if (l_pParser)
{
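// The parser reassembles the raw byte stream into complete packets that the decoder can accept.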
l_iParsedBytes = av_parser_parse2(
l_pParser,
l_pAVCodecContext,
&l_AVPacket_out.data,
&l_AVPacket_out.size,
l_pData,
l_iSize,
l_AVPacket.pts,
l_AVPacket.dts,
AV_NOPTS_VALUE);
if (l_iParsedBytes <= 0)
{
break;
}
l_AVPacket.pts = l_AVPacket.dts = AV_NOPTS_VALUE;
l_AVPacket.pos = -1;
}
else
{
l_AVPacket_out.data = l_pData;
l_AVPacket_out.size = l_iSize;
}
l_iDecodedBytes =
avcodec_decode_video2(
l_pAVCodecContext,
l_pAVFrame,
&l_iGotFrame,
&l_AVPacket_out);
if (l_iDecodedBytes >= 0)
{
if (l_iGotFrame)
{
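// A full frame was decoded: convert it to BGR24 so OpenCV can display it.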
l_pSWSContext = sws_getContext(
l_pAVCodecContext->width,
l_pAVCodecContext->height,
l_pAVCodecContext->pix_fmt,
l_pAVCodecContext->width,
l_pAVCodecContext->height,
AV_PIX_FMT_BGR24,
SWS_BICUBIC,
NULL,
NULL,
NULL);
if (l_pSWSContext)
{
l_iResult = avpicture_alloc(
reinterpret_cast<AVPicture*>(l_pAVFrameBGR),
AV_PIX_FMT_BGR24,
l_pAVFrame->width,
l_pAVFrame->height);
l_iResult = sws_scale(
l_pSWSContext,
l_pAVFrame->data,
l_pAVFrame->linesize,
0,
l_pAVCodecContext->height,
l_pAVFrameBGR->data,
l_pAVFrameBGR->linesize);
if (l_iResult > 0)
{
l_cvmImage = cv::Mat(
l_pAVFrame->height,
l_pAVFrame->width,
CV_8UC3,
l_pAVFrameBGR->data[0],
l_pAVFrameBGR->linesize[0]);
if (l_cvmImage.empty() == false)
{
cv::imshow("image", l_cvmImage);
cv::waitKey(10);
}
}
}
l_iFrameCount++;
}
}
else
{
break;
}
l_pData += l_iParsedBytes;
l_iSize -= l_iParsedBytes;
}
}
} // end while(1).
}
fclose(l_pFile_in);
}
}
}
}
}
EDIT : The following is the final code that solves my problem, thanks to the suggestions of Ronald.
#include <iostream>
#include <iomanip>
#include <string>
#include <sstream>
#include <opencv2/opencv.hpp>
#ifdef __cplusplus
extern "C"
{
#endif // __cplusplus
#include <libavcodec/avcodec.h>
#include <libavdevice/avdevice.h>
#include <libavfilter/avfilter.h>
#include <libavformat/avformat.h>
#include <libavformat/avio.h>
#include <libavutil/avutil.h>
#include <libpostproc/postprocess.h>
#include <libswresample/swresample.h>
#include <libswscale/swscale.h>
#ifdef __cplusplus
} // end extern "C".
#endif // __cplusplus
int main()
{
AVCodec* l_pCodec;
AVCodecContext* l_pAVCodecContext;
SwsContext* l_pSWSContext;
AVFormatContext* l_pAVFormatContext;
AVFrame* l_pAVFrame;
AVFrame* l_pAVFrameBGR;
AVPacket l_AVPacket;
std::string l_sFile;
uint8_t* l_puiBuffer;
int l_iResult;
int l_iFrameCount;
int l_iGotFrame;
int l_iDecodedBytes;
int l_iVideoStreamIdx;
int l_iNumBytes;
cv::Mat l_cvmImage;
l_pCodec = NULL;
l_pAVCodecContext = NULL;
l_pSWSContext = NULL;
l_pAVFormatContext = NULL;
l_pAVFrame = NULL;
l_pAVFrameBGR = NULL;
l_puiBuffer = NULL;
l_sFile = "myvideo.ts";
av_register_all();
l_iResult = avformat_open_input(&l_pAVFormatContext,
l_sFile.c_str(),
NULL,
NULL);
if (l_iResult >= 0)
{
l_iResult = avformat_find_stream_info(l_pAVFormatContext, NULL);
if (l_iResult >= 0)
{
for (int i = 0; i < l_pAVFormatContext->nb_streams; i++)
{
if (l_pAVFormatContext->streams[i]->codec->codec_type ==
AVMEDIA_TYPE_VIDEO)
{
l_iVideoStreamIdx = i;
l_pAVCodecContext =
l_pAVFormatContext->streams[l_iVideoStreamIdx]->codec;
if (l_pAVCodecContext)
{
l_pCodec = avcodec_find_decoder(l_pAVCodecContext->codec_id);
}
break;
}
}
}
}
if (l_pCodec && l_pAVCodecContext)
{
l_iResult = avcodec_open2(l_pAVCodecContext, l_pCodec, NULL);
if (l_iResult >= 0)
{
l_pAVFrame = av_frame_alloc();
l_pAVFrameBGR = av_frame_alloc();
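// Allocate a BGR24 picture buffer for the converted frames and attach it to l_pAVFrameBGR.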
l_iNumBytes = avpicture_get_size(PIX_FMT_BGR24,
l_pAVCodecContext->width,
l_pAVCodecContext->height);
l_puiBuffer = (uint8_t *)av_malloc(l_iNumBytes*sizeof(uint8_t));
avpicture_fill((AVPicture *)l_pAVFrameBGR,
l_puiBuffer,
PIX_FMT_BGR24,
l_pAVCodecContext->width,
l_pAVCodecContext->height);
l_pSWSContext = sws_getContext(
l_pAVCodecContext->width,
l_pAVCodecContext->height,
l_pAVCodecContext->pix_fmt,
l_pAVCodecContext->width,
l_pAVCodecContext->height,
AV_PIX_FMT_BGR24,
SWS_BICUBIC,
NULL,
NULL,
NULL);
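// Let libavformat demux the file and decode only packets from the video stream.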
while (av_read_frame(l_pAVFormatContext, &l_AVPacket) >= 0)
{
if (l_AVPacket.stream_index == l_iVideoStreamIdx)
{
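// Decode the demuxed packet; l_iGotFrame is set when a complete picture is available.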
l_iDecodedBytes = avcodec_decode_video2(
l_pAVCodecContext,
l_pAVFrame,
&l_iGotFrame,
&l_AVPacket);
if (l_iGotFrame)
{
if (l_pSWSContext)
{
l_iResult = sws_scale(
l_pSWSContext,
l_pAVFrame->data,
l_pAVFrame->linesize,
0,
l_pAVCodecContext->height,
l_pAVFrameBGR->data,
l_pAVFrameBGR->linesize);
if (l_iResult > 0)
{
l_cvmImage = cv::Mat(
l_pAVFrame->height,
l_pAVFrame->width,
CV_8UC3,
l_pAVFrameBGR->data[0],
l_pAVFrameBGR->linesize[0]);
if (l_cvmImage.empty() == false)
{
cv::imshow("image", l_cvmImage);
cv::waitKey(1);
}
}
}
l_iFrameCount++;
}
}
}
}
}
}
-
Reverse Engineering Italian Literature
1 July 2014, by Multimedia Mike (Reverse Engineering)
Some time ago, Diego “Flameeyes” Pettenò tried his hand at reverse engineering a set of really old CD-ROMs containing even older Italian literature. The goal of this RE endeavor would be to extract the useful literature along with any structural metadata (chapters, etc.) and convert it to a more open format suitable for publication at, e.g., Project Gutenberg or Archive.org.
Unfortunately, the structure of the data thwarted the more simplistic analysis attempts (like inspecting for blocks of textual data). This will require deeper RE techniques. Further frustrating the effort, however, is the fact that the binaries that implement the reading program are written for the now-archaic Windows 3.1 operating system.
In pursuit of this RE goal, I recently thought of a way to glean more intelligence using DOSBox.
Prior Work
There are 6 discs in the full set (distributed along with 6 sequential issues of a print magazine named L’Espresso). Analysis of the contents of the various discs reveals that many of the files are the same on each disc. It was straightforward to identify the set of files which are unique on each disc. These files all end with the extension “LZn”, where n = 1..6 depending on the disc number. Further, the root directory of each disc has a file indicating the sequence number (1..6) of the CD. Obviously, these are the interesting targets.
The LZ file extensions stand out to an individual skilled in the art of compression: could it be a variation of the venerable LZ compression? That’s actually unlikely, because LZ (also seen as LIZ) stands for Letteratura Italiana Zanichelli (Zanichelli’s Italian Literature).
The Unix ‘file’ command was of limited utility, unable to plausibly identify any of the files.
Progress was stalled.
Saying Hello To An Old Frenemy
I have been showing this screenshot to younger coworkers to see if any of them recognize it :
Not a single one has seen it before. Senior computer citizen status : Confirmed.
I recently watched an Ancient DOS Games video about Windows 3.1 games. This episode showed Windows 3.1 running under DOSBox. I had heard this was possible but that it took a little work to get running. I had a hunch that someone else had probably already done the hard stuff, so I took to the BitTorrent networks and quickly found a download that had the goods ready to go: a directory of Windows 3.1 files that just had to be dropped into a DOSBox directory and they would be ready to run.
Aside : Running OS software procured from a BitTorrent network ? Isn’t that an insane security nightmare ? I’m not too worried since it effectively runs under a sandboxed virtual machine, courtesy of DOSBox. I suppose there’s the risk of trojan’d OS software infecting binaries that eventually leave the sandbox.
Using DOSBox Like ‘strace’
strace is a tool available on some Unix systems, including Linux, which is able to monitor the system calls that a program makes. In reverse engineering contexts, it can be useful to monitor an opaque, binary program to see the names of the files it opens, how many bytes it reads, and from which locations. I have written examples of this before (wow, almost 10 years ago to the day; now I feel old for the second time in this post).
Here’s the pitch: make DOSBox perform as strace in order to serve as a platform for reverse engineering Windows 3.1 applications. I formed a mental model about how DOSBox operates (abstracted file system classes with methods for opening and reading files) and then jumped into the source code. Sure enough, the code was exactly as I suspected, and a few strategic print statements gave me the data I was looking for.
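To make the idea concrete, here is a minimal sketch of the kind of instrumentation involved. It is not DOSBox's actual code: the class and method names are hypothetical stand-ins for DOSBox's local-file and ISO-file read paths, and only the fprintf call represents the "strategic print statement".

#include <cstdio>
#include <cstdint>

// Hypothetical stand-in for a DOSBox file-system read method; only the
// logging pattern matters here, the real DOSBox classes and signatures differ.
struct TracedFile
{
    FILE*       handle;
    const char* name;

    size_t Read(uint8_t* buffer, size_t requested)
    {
        long pos = ftell(handle);                          // position before the read
        size_t got = fread(buffer, 1, requested, handle);
        // Strategic print statement: which file, how much, and from where.
        fprintf(stderr, "=== Read('%s') : req %zu bytes ; read %zu bytes from pos 0x%lX\n",
                name, requested, got, (unsigned long)pos);
        return got;
    }
};

int main(int argc, char** argv)
{
    if (argc < 2)
        return 1;
    TracedFile f = { fopen(argv[1], "rb"), argv[1] };
    if (!f.handle)
        return 1;
    uint8_t buf[4096];
    while (f.Read(buf, sizeof(buf)) > 0)
        ;                                                  // every read gets logged, mimicking the trace output shown below
    fclose(f.handle);
    return 0;
}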
Eventually, I even took to running DOSBox under the GNU Debugger (GDB). This hasn’t proven especially useful yet, but it has led to an absurd level of nesting :
The target application runs under Windows 3.1, which is running under DOSBox, which is running under GDB. This led to a crazy situation in which DOSBox had the mouse focus when a GDB breakpoint was triggered. At this point, DOSBox had all desktop input focus and couldn’t surrender it because it wasn’t running. I had no way to interact with the Linux desktop and had to reboot the computer. The next time, I took care to only use the keyboard to navigate the application and trigger the breakpoint and not allow DOSBox to consume the mouse focus.
New Intelligence
By instrumenting the local file class (virtual HD files) and the ISO file class (CD-ROM files), I was able to watch which programs and dynamic libraries are loaded and which data files the code cares about. I was able to narrow down the fact that the most interesting programs are called LEGGENDO.EXE (‘reading’) and LEGGENDA.EXE (‘legend’ ; this has been a great Italian lesson as well as RE puzzle). The first calls the latter, which displays this view of the data we are trying to get at :
When first run, the program takes an interest in a file called DBBIBLIO (‘database library’, I suspect) :
=== Read(’LIZ98\DBBIBLIO.LZ1’) : req 337 bytes ; read 337 bytes from pos 0x0
=== Read(’LIZ98\DBBIBLIO.LZ1’) : req 337 bytes ; read 337 bytes from pos 0x151
=== Read(’LIZ98\DBBIBLIO.LZ1’) : req 337 bytes ; read 337 bytes from pos 0x2A2
[...]
While we were unable to sort out all of the data files in our cursory investigation, a few things were obvious. The structure of this file looked to contain 336-byte records. Turns out I was off by 1: the records are actually 337 bytes each. The count of records read from disc is equal to the number of items shown in the UI.
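As a sanity check on the record hypothesis, one can slice the file into 337-byte records and compare the count against the number of items in the UI. This is a minimal sketch under that assumption; the path comes from the trace above and the record contents are not interpreted at all.

#include <cstdio>
#include <vector>
#include <array>

int main()
{
    constexpr size_t kRecordSize = 337;              // observed record length
    // Path as it appears in the DOSBox trace; adjust to wherever the disc is mounted.
    FILE* f = fopen("LIZ98/DBBIBLIO.LZ1", "rb");
    if (!f)
        return 1;

    std::array<unsigned char, kRecordSize> rec;
    std::vector<std::array<unsigned char, kRecordSize>> records;
    while (fread(rec.data(), 1, kRecordSize, f) == kRecordSize)
        records.push_back(rec);                      // keep each raw record for later inspection
    fclose(f);

    // If the hypothesis holds, this count matches the number of items shown in the UI.
    printf("%zu records of %zu bytes\n", records.size(), kRecordSize);
    return 0;
}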
Next, the program is interested in a few more files :
*** isoFile() : ’DEPOSITO\BLOKCTC.LZ1’, offset 0x27D6000, 2911488 bytes large
=== Read(’DEPOSITO\BLOKCTC.LZ1’) : req 96 bytes ; read 96 bytes from pos 0x0
*** isoFile() : ’DEPOSITO\BLOKCTX0.LZ1’, offset 0x2A9D000, 17152 bytes large
=== Read(’DEPOSITO\BLOKCTX0.LZ1’) : req 128 bytes ; read 128 bytes from pos 0x0
=== Seek(’DEPOSITO\BLOKCTX0.LZ1’) : seek 384 (0x180) bytes, type 0
=== Read(’DEPOSITO\BLOKCTX0.LZ1’) : req 256 bytes ; read 256 bytes from pos 0x180
=== Seek(’DEPOSITO\BLOKCTC.LZ1’) : seek 1152 (0x480) bytes, type 0
=== Read(’DEPOSITO\BLOKCTC.LZ1’) : req 32 bytes ; read 32 bytes from pos 0x480
=== Read(’DEPOSITO\BLOKCTC.LZ1’) : req 1504 bytes ; read 1504 bytes from pos 0x4A0
[...]
Eventually, it becomes obvious that BLOKCTC has the juicy meat. There are 32-byte records followed by variable-length encoded text sections. Since there is no text to be found in these files, the text is either compressed, encrypted, or both. Some rough counting (the program seems to disable copy/paste, which thwarts more precise counting) indicates that the text size is larger than the data chunks being read from disc, so compression seems likely. Encryption isn’t out of the question (especially since the program deems it necessary to disable copy and pasting of this public domain literary data), and if it’s in use, that means the key is being read from one of these files.
Blocked On Disassembly
So I’m a bit blocked right now. I know exactly where the data lives, but it’s clear that I need to reverse engineer some binary code. The big problem is that I have no idea how to disassemble Windows 3.1 binaries. These are NE-type executable files. Disassemblers abound for MZ files (MS-DOS executables) and PE files (executables for Windows 95 and beyond). NE files get no respect. It’s difficult (but not impossible) to even find data about the format anymore, and details are incomplete. It should be noted, however, that the DOSBox-as-strace method described here lends insight into how Windows 3.1 processes NE-type EXEs. You can’t get any more authoritative than that.
So far, I have tried the freeware version of IDA Pro. Unfortunately, I haven’t been able to get the program to work on my Windows machine for a long time. Even if I could, I can’t find any evidence that it actually supports NE files (the free version specifically mentions MZ and PE, but does not mention NE or LE).
I found an old copy of Borland’s beloved Turbo Assembler and Debugger package. It has Turbo Debugger for Windows, both regular and 32-bit versions. Unfortunately, the normal version just hangs Windows 3.1 in DOSBox. The 32-bit Turbo Debugger loads just fine but can’t load the NE file.
I’ve also wondered if DOSBox contains any advanced features for trapping program execution and disassembling. I haven’t looked too deeply into this yet.
Future Work
NE files seem to be the executable format that time forgot. I have a crazy brainstorm about repacking NE files as MZ executables so that they could be taken apart with an MZ disassembler. But this will take some experimenting.
If anyone else has any ideas about ripping open these binaries, I would appreciate hearing them.
And I guess I shouldn’t be too surprised to learn that all the literature in this corpus is already freely available and easily downloadable anyway. But you shouldn’t be too surprised if that doesn’t discourage me from trying to crack the format that’s keeping this particular copy of the data locked up.
-
Open Media Developers Track at OVC 2011
11 October 2011, by silvia
The Open Video Conference that took place on 10-12 September was so overwhelming, I’ve still not been able to catch my breath! It was a dense three days for me, even though I only focused on the technology sessions of the conference and utterly missed out on all the policy and content discussions.
Roughly 60 people participated in the Open Media Software (OMS) developers track. This was an amazing group of people capable and willing to shape the future of video technology on the Web :
- HTML5 video developers from Apple, Google, Opera, and Mozilla (though we missed the NZ folks),
- codec developers from WebM, Xiph, and MPEG,
- Web video developers from YouTube, JWPlayer, Kaltura, VideoJS, PopcornJS, etc.,
- content publishers from Wikipedia, Internet Archive, YouTube, Netflix, etc.,
- open source tool developers from FFmpeg, gstreamer, flumotion, VideoLAN, PiTiVi, etc,
- and many more.
To provide a summary of all the discussions would be impossible, so I just want to share the key take-aways that I had from the main sessions.
WebRTC : Realtime Communications and HTML5
Tim Terriberry (Mozilla), Serge Lachapelle (Google) and Ethan Hugg (CISCO) moderated this session together (slides). There are activities both at the W3C and at IETF – the ones at IETF are supposed to focus on protocols, while the W3C ones on HTML5 extensions.
The current proposal of a PeerConnection API has been implemented in WebKit/Chrome as open source. It is expected that Firefox will have an add-on by Q1 next year. It enables video conferencing, including media capture, media encoding, signal processing (echo cancellation etc), secure transmission, and a data stream exchange.
Current discussions are around the signalling protocol and whether SIP needs to be required by the standard. Further, the codec question is under discussion with a question whether to mandate VP8 and Opus, since transcoding gateways are not desirable. Another question is how to measure the quality of the connection and how to report errors so as to allow adaptation.
What always amazes me around RTC is the sheer number of specialised protocols that seem to be required to implement this. WebRTC does not disappoint : in fact, the question was asked whether there could be a lighter alternative than to re-use dozens of years of protocol development – is it over-engineered ? Can desktop players connect to a WebRTC session ?
We are already in a second or third revision of this part of the HTML5 specification and yet it seems the requirements are still being collected. I’m quietly confident that everything is done to make the lives of the Web developer easier, but it sure looks like a huge task.
The Missing Link : Flash to HTML5
Zohar Babin (Kaltura) and myself moderated this session and I must admit that this session was the biggest eye-opener for me amongst all the sessions. There was a large number of Flash developers present in the room and that was great, because sometimes we just don’t listen enough to lessons learnt in the past.
This session gave me one of those aha-moments, in the form of the Flash appendBytes() API function.
The appendBytes() function allows a Flash developer to take a byteArray out of a connected video resource and do something with it – such as feed it to a video for display. When I heard that Web developers want that functionality for JavaScript and the video element, too, I instinctively rejected the idea wondering why on earth would a Web developer want to touch encoded video bytes – why not leave that to the browser.
But as it turns out, this is actually a really powerful enabler of functionality. For example, you can use it to :
- display mid-roll video ads as part of the same video element,
- sequence playlists of videos into the same video element,
- implement DVR functionality (high-speed seeking),
- do mash-ups,
- do video editing,
- adaptive streaming.
This totally blew my mind and I am now completely supportive of having such a function in HTML5. Together with media fragment URIs you could even leave all the header download management for resources to the Web browser and just request time ranges from a video through an appendBytes() function. This would be easier on the Web developer than having to deal with byte ranges and making sure that appropriate decoding pipelines are set up.
Standards for Video Accessibility
Philip Jagenstedt (Opera) and myself moderated this session. We focused on the HTML5 track element and the WebVTT file format. Many issues were identified that will still require work.
One particular topic was to find a standard means of rendering the UI for caption, subtitle, and description selection. For example, what icons should be used to indicate that subtitles or captions are available. While this is not part of the HTML5 specification, it’s still important to get this right across browsers since otherwise users will get confused with diverging interfaces.
Chaptering was discussed and a particular need to allow URLs to directly point at chapters was expressed. I suggested the use of named Media Fragment URLs.
The use of WebVTT for descriptions for the blind was also discussed. A suggestion was made to use the voice tag <v> to allow for “styling” (i.e. selection) of the screen reader voice.
Finally, multitrack audio or video resources were also discussed and the @mediagroup attribute was explained. A question about how to identify the language used in different alternative dubs was asked. This is an issue because @srclang is not on audio or video, only on text, so it’s a missing feature for the multitrack API.
Beyond this session, there was also a breakout session on WebVTT and the track element. As a consequence, a number of bugs were registered in the W3C bug tracker.
WebM : Testing, Metrics and New features
This session was moderated by John Luther and John Koleszar, both of the WebM Project. They started off with a presentation on current work on WebM, which includes quality testing and improvements, and encoder speed improvement. Then they moved on to questions about how to involve the community more.
The community criticised that communication of what is happening around WebM is very scarce. More sharing of information was requested, including a move to using open Google+ hangouts instead of Google internal video conferences. More use of the public bug tracker can also help include the community better.
Another pain point of the community was that code is introduced and removed without much feedback. It was requested to introduce a peer review process. Also it was requested that example code snippets are published when new features are announced so others can replicate the claims.
This all indicates to me that the WebM project is increasingly more open, but that there is still a lot to learn.
Standards for HTTP Adaptive Streaming
This session was moderated by Frank Galligan and Aaron Colwell (Google), and Mark Watson (Netflix).
Mark started off by giving us an introduction to MPEG DASH, the MPEG file format for HTTP adaptive streaming. MPEG has just finalized the format and he was able to show us some examples. DASH is XML-based and thus rather verbose. It is covering all eventualities of what parameters could be switched during transmissions, which makes it very broad. These include trick modes e.g. for fast forwarding, 3D, multi-view and multitrack content.
MPEG have defined profiles – one for live streaming which requires chunking of the files on the server, and one for on-demand which requires keyframe alignment of the files. There are clear specifications for how to do these with MPEG. Such profiles would need to be created for WebM and Ogg Theora, too, to make DASH universally applicable.
Further, the Web case needs a more restrictive adaptation approach, since the video element’s API is already accounting for some of the features that DASH provides for desktop applications. So, a Web-specific profile of DASH would be required.
Then Aaron introduced us to the MediaSource API and in particular the webkitSourceAppend() extension that he has been experimenting with. It is essentially an implementation of the appendBytes() function of Flash, which the Web developers had been asking for just a few sessions earlier. This was likely the biggest announcement of OVC, alas a quiet and technically-focused one.
Aaron explained that he had been trying to find a way to implement HTTP adaptive streaming into WebKit in a way in which it could be standardised. While doing so, he also came across other requirements around such chunked video handling, in particular around dynamic ad insertion, live streaming, DVR functionality (fast forward), constraint video editing, and mashups. While trying to sort out all these requirements, it became clear that it would be very difficult to implement strategies for stream switching, buffering and delivery of video chunks into the browser when so many different and likely contradictory requirements exist. Also, once an approach is implemented and specified for the browser, it becomes very difficult to innovate on it.
Instead, the easiest way to solve it right now and learn about what would be necessary to implement into the browser would be to actually allow Web developers to queue up a chunk of encoded video into a video element for decoding and display. Thus, the webkitSourceAppend() function was born (specification).
The proposed extension to the HTMLMediaElement is as follows :
partial interface HTMLMediaElement {
  // URL passed to src attribute to enable the media source logic.
  readonly attribute [URL] DOMString webkitMediaSourceURL;

  bool webkitSourceAppend(in Uint8Array data);

  // end of stream status codes.
  const unsigned short EOS_NO_ERROR = 0;
  const unsigned short EOS_NETWORK_ERR = 1;
  const unsigned short EOS_DECODE_ERR = 2;
  void webkitSourceEndOfStream(in unsigned short status);

  // states
  const unsigned short SOURCE_CLOSED = 0;
  const unsigned short SOURCE_OPEN = 1;
  const unsigned short SOURCE_ENDED = 2;
  readonly attribute unsigned short webkitSourceState;
};

The code is already checked into WebKit, but commented out behind a command-line compiler flag.
Frank then stepped forward to show how webkitSourceAppend() can be used to implement HTTP adaptive streaming. His example uses WebM – there are no examples with MPEG or Ogg yet.
The chunks that Frank’s demo used were 150 video frames long (6.25s) and 5s long audio. Stream switching only switched video, since audio data is much lower bandwidth and more important to retain at high quality. Switching was done on multiplexed files.
Every chunk requires an XHR range request – this could be optimised if the connections were kept open per adaptation. Seeking works, too, but since decoding requires download of a whole chunk, seeking latency is determined by the time it takes to download and decode that chunk.
Similar to DASH, when using this approach for live streaming, the server has to produce one file per chunk, since byte range requests are not possible on a continuously growing file.
Frank did not use DASH as the manifest format for his HTTP adaptive streaming demo, but instead used a hacked-up custom XML format. It would be possible to use JSON or any other format, too.
After this session, I was actually completely blown away by the possibilities that such a simple API extension allows. If I wasn’t sold on the idea of a appendBytes() function in the earlier session, this one completely changed my mind. While I still believe we need to standardise a HTTP adaptive streaming file format that all browsers will support for all codecs, and I still believe that a native implementation for support of such a file format is necessary, I also believe that this approach of webkitSourceAppend() is what HTML needs – and maybe it needs it faster than native HTTP adaptive streaming support.
Standards for Browser Video Playback Metrics
This session was moderated by Zachary Ozer and Pablo Schklowsky (JWPlayer). Their motivation for the topic was, in fact, also HTTP adaptive streaming. Once you leave the decisions about when to do stream switching to JavaScript (through a function such as webkitSourceAppend()), you have to expose stream metrics to the JS developer so they can make informed decisions. The other use case is, of course, monitoring of the quality of video delivery for reporting to the provider, who may then decide to change their delivery environment.
The discussion found that we really care about metrics on three different levels :
- measuring the network performance (bandwidth)
- measuring the decoding pipeline performance
- measuring the display quality
In the end, it seemed that work previously done by Steve Lacey on a proposal for video metrics was generally acceptable, except for the playbackJitter metric, which may be too aggregate to mean much.
Device Inputs / A/V in the Browser
I didn’t actually attend this session held by Anant Narayanan (Mozilla), but from what I heard, the discussion focused on how to manage permission of access to video camera, microphone and screen, e.g. when multiple applications (tabs) want access or when the same site wants access in a different session. This may apply to real-time communication with screen sharing, but also to photo sharing, video upload, or canvas access to devices e.g. for time lapse photography.
Open Video Editors
This was another session that I wasn’t able to attend, but I believe the creation of good open source video editing software and similar video creation software is really crucial to giving video a broader user appeal.
Jeff Fortin (PiTiVi) moderated this session and I was fascinated to later see his analysis of the lifecycle of open source video editors. It is shocking to see how many people/projects have tried to create an open source video editor and how many have stopped their project. It is likely that the creation of a video editor is such a complex challenge that it requires a larger and more committed open source project – single people will just run out of steam too quickly. This may be comparable to the creation of a Web browser (see the size of the Mozilla project) or a text processing system (see the size of the OpenOffice project).
Jeff also mentioned the need to create open video editor standards around playlist file formats etc. Possibly the Open Video Alliance could help. In any case, something has to be done in this space – maybe this would be a good topic to focus next year’s OVC on ?
Monday’s Breakout Groups
The conference ended officially on Sunday night, but we had a third day of discussions / hackday at the wonderful New York Lawschool venue. We had collected issues of interest during the two previous days and organised the breakout groups on the morning (Schedule).
In the Content Protection/DRM session, Mark Watson from Netflix explained how their API works and that they believe that all we need in browsers is a secure way to exchange keys and an indicator of which protection scheme is used – the actual protection scheme would not be implemented by the browser, but be provided by the underlying system (media framework/operating system). I think that until somebody actually implements something in a browser fork and shows how this can be done, we won’t have much progress. In my understanding, we may also need to disable part of the video API for encrypted content, because otherwise you can always e.g. grab frames from the video element into canvas and save them from there.
In the Playlists and Gapless Playback session, there was massive brainstorming about what new cool things can be done with the video element in browsers if playback between snippets can be made seamless. Further discussions were about a standard playlist file formats (such as XSPF, MRSS or M3U), media fragment URIs in playlists for mashups, and the need to expose track metadata for HTML5 media elements.
What more can I say ? It was an amazing three days and the complexity of problems that we’re dealing with is a tribute to how far HTML5 and open video has already come and exciting news for the kind of applications that will be possible (both professional and community) once we’ve solved the problems of today. It will be exciting to see what progress we will have made by next year’s conference.
Thanks go to Google for sponsoring my trip to OVC.
UPDATE : We actually have a mailing list for open media developers who are interested in these and similar topics – do join at http://lists.annodex.net/cgi-bin/mailman/listinfo/foms.