Recherche avancée

Médias (1)

Mot : - Tags -/Christian Nold

Autres articles (97)

  • MediaSPIP 0.1 Beta version

    25 avril 2011, par

    MediaSPIP 0.1 beta is the first version of MediaSPIP proclaimed as "usable".
    The zip file provided here only contains the sources of MediaSPIP in its standalone version.
    To get a working installation, you must manually install all-software dependencies on the server.
    If you want to use this archive for an installation in "farm mode", you will also need to proceed to other manual (...)

  • ANNEXE : Les plugins utilisés spécifiquement pour la ferme

    5 mars 2010, par

    Le site central/maître de la ferme a besoin d’utiliser plusieurs plugins supplémentaires vis à vis des canaux pour son bon fonctionnement. le plugin Gestion de la mutualisation ; le plugin inscription3 pour gérer les inscriptions et les demandes de création d’instance de mutualisation dès l’inscription des utilisateurs ; le plugin verifier qui fournit une API de vérification des champs (utilisé par inscription3) ; le plugin champs extras v2 nécessité par inscription3 (...)

  • Publier sur MédiaSpip

    13 juin 2013

    Puis-je poster des contenus à partir d’une tablette Ipad ?
    Oui, si votre Médiaspip installé est à la version 0.2 ou supérieure. Contacter au besoin l’administrateur de votre MédiaSpip pour le savoir

Sur d’autres sites (10309)

  • Developing A Shader-Based Video Codec

    22 juin 2013, par Multimedia Mike — Outlandish Brainstorms

    Early last month, this thing called ORBX.js was in the news. It ostensibly has something to do with streaming video and codec technology, which naturally catches my interest. The hype was kicked off by Mozilla honcho Brendan Eich when he posted an article asserting that HD video decoding could be entirely performed in JavaScript. We’ve seen this kind of thing before using Broadway– an H.264 decoder implemented entirely in JS. But that exposes some very obvious limitations (notably CPU usage).

    But this new video codec promises 1080p HD playback directly in JavaScript which is a lofty claim. How could it possibly do this ? I got the impression that performance was achieved using WebGL, an extension which allows JavaScript access to accelerated 3D graphics hardware. Browsing through the conversations surrounding the ORBX.js announcement, I found this confirmation from Eich himself :

    You’re right that WebGL does heavy lifting.

    As of this writing, ORBX.js remains some kind of private tech demo. If there were a public demo available, it would necessarily be easy to reverse engineer the downloadable JavaScript decoder.

    But the announcement was enough to make me wonder how it could be possible to create a video codec which effectively leverages 3D hardware.

    Prior Art
    In theorizing about this, it continually occurs to me that I can’t possibly be the first person to attempt to do this (or the ORBX.js people, for that matter). In googling on the matter, I found various forums and Q&A posts where people asked if it were possible to, e.g., accelerate JPEG decoding and presentation using 3D hardware, with no answers. I also found a blog post which describes a plan to use 3D hardware to accelerate VP8 video decoding. It was a project done under the banner of Google’s Summer of Code in 2011, though I’m not sure which open source group mentored the effort. The project did not end up producing the shader-based VP8 codec originally chartered but mentions that “The ‘client side’ of the VP8 VDPAU implementation is working and is currently being reviewed by the libvdpau maintainers.” I’m not sure what that means. Perhaps it includes modifications to the public API that supports VP8, but is waiting for the underlying hardware to actually implement VP8 decoding blocks in hardware.

    What’s So Hard About This ?
    Video decoding is a computationally intensive task. GPUs are known to be really awesome at chewing through computationally intensive tasks. So why aren’t GPUs a natural fit for decoding video codecs ?

    Generally, it boils down to parallelism, or lack of opportunities thereof. GPUs are really good at doing the exact same operations over lots of data at once. The problem is that decoding compressed video usually requires multiple phases that cannot be parallelized, and the individual phases often cannot be parallelized. In strictly mathematical terms, a compressed data stream will need to be decoded by applying a function f(x) over each data element, x0 .. xn. However, the function relies on having applied the function to the previous data element, i.e. :

    f(xn) = f(f(xn-1))
    

    What happens when you try to parallelize such an algorithm ? Temporal rifts in the space/time continuum, if you’re in a Star Trek episode. If you’re in the real world, you’ll get incorrect, unusuable data as the parallel computation is seeded with a bunch of invalid data at multiple points (which is illustrated in some of the pictures in the aforementioned blog post about accelerated VP8).

    Example : JPEG
    Let’s take a very general look at the various stages involved in decoding the ubiquitous JPEG format :


    High level JPEG decoding flow

    What are the opportunities to parallelize these various phases ?

    • Huffman decoding (run length decoding and zig-zag reordering is assumed to be rolled into this phase) : not many opportunities for parallelizing the various Huffman formats out there, including this one. Decoding most Huffman streams is necessarily a sequential operation. I once hypothesized that it would be possible to engineer a codec to achieve some parallelism during the entropy decoding phase, and later found that On2′s VP8 codec employs the scheme. However, such a scheme is unlikely to break down to such a fine level that WebGL would require.
    • Reverse DC prediction : JPEG — and many other codecs — doesn’t store full DC coefficients. It stores differences in successive DC coefficients. Reversing this process can’t be parallelized. See the discussion in the previous section.
    • Dequantize coefficients : This could be very parallelized. It should be noted that software decoders often don’t dequantize all coefficients. Many coefficients are 0 and it’s a waste of a multiplication operation to dequantize. Thus, this phase is sometimes rolled into the Huffman decoding phase.
    • Invert discrete cosine transform : This seems like it could be highly parallelizable. I will be exploring this further in this post.
    • Convert YUV -> RGB for final display : This is a well-established use case for 3D acceleration.

    Crash Course in 3D Shaders and Humility
    So I wanted to see if I could accelerate some parts of JPEG decoding using something called shaders. I made an effort to understand 3D programming and its associated math throughout the 1990s but 3D technology left me behind a very long time ago while I got mixed up in this multimedia stuff. So I plowed through a few books concerning WebGL (thanks to my new Safari Books Online subscription). After I learned enough about WebGL/JS to be dangerous and just enough about shader programming to be absolutely lethal, I set out to try my hand at optimizing IDCT using shaders.

    Here’s my extremely high level (and probably hopelessly naive) view of the modern GPU shader programming model :


    Basic WebGL rendering pipeline

    The WebGL program written in JavaScript drives the show. It sends a set of vertices into the WebGL system and each vertex is processed through a vertex shader. Then, each pixel that falls within a set of vertices is sent through a fragment shader to compute the final pixel attributes (R, G, B, and alpha value). Another consideration is textures : This is data that the program uploads to GPU memory which can be accessed programmatically by the shaders).

    These shaders (vertex and fragment) are key to the GPU’s programmability. How are they programmed ? Using a special C-like shading language. Thought I : “C-like language ? I know C ! I should be able to master this in short order !” So I charged forward with my assumptions and proceeded to get smacked down repeatedly by the overall programming paradigm. I came to recognize this as a variation of the scientific method : Develop a hypothesis– in my case, a mental model of how the system works ; develop an experiment (short program) to prove or disprove the model ; realize something fundamental that I was overlooking ; formulate new hypothesis and repeat.

    First Approach : Vertex Workhorse
    My first pitch goes like this :

    • Upload DCT coefficients to GPU memory in the form of textures
    • Program a vertex mesh that encapsulates 16×16 macroblocks
    • Distribute the IDCT effort among multiple vertex shaders
    • Pass transformed Y, U, and V blocks to fragment shader which will convert the samples to RGB

    So the idea is that decoding of 16×16 macroblocks is parallelized. A macroblock embodies 6 blocks :


    JPEG macroblocks

    It would be nice to process one of these 6 blocks in each vertex. But that means drawing a square with 6 vertices. How do you do that ? I eventually realized that drawing a square with 6 vertices is the recommended method for drawing a square on 3D hardware. Using 2 triangles, each with 3 vertices (0, 1, 2 ; 3, 4, 5) :


    2 triangles make a square

    A vertex shader knows which (x, y) coordinates it has been assigned, so it could figure out which sections of coefficients it needs to access within the textures. But how would a vertex shader know which of the 6 blocks it should process ? Solution : Misappropriate the vertex’s z coordinate. It’s not used for anything else in this case.

    So I set all of that up. Then I hit a new roadblock : How to get the reconstructed Y, U, and V samples transported to the fragment shader ? I have found that communicating between shaders is quite difficult. Texture memory ? WebGL doesn’t allow shaders to write back to texture memory ; shaders can only read it. The standard way to communicate data from a vertex shader to a fragment shader is to declare variables as “varying”. Up until this point, I knew about varying variables but there was something I didn’t quite understand about them and it nagged at me : If 3 different executions of a vertex shader set 3 different values to a varying variable, what value is passed to the fragment shader ?

    It turns out that the varying variable varies, which means that the GPU passes interpolated values to each fragment shader invocation. This completely destroys this idea.

    Second Idea : Vertex Workhorse, Take 2
    The revised pitch is to work around the interpolation issue by just having each vertex shader invocation performs all 6 block transforms. That seems like a lot of redundant. However, I figured out that I can draw a square with only 4 vertices by arranging them in an ‘N’ pattern and asking WebGL to draw a TRIANGLE_STRIP instead of TRIANGLES. Now it’s only doing the 4x the extra work, and not 6x. GPUs are supposed to be great at this type of work, so it shouldn’t matter, right ?

    I wired up an experiment and then ran into a new problem : While I was able to transform a block (or at least pretend to), and load up a varying array (that wouldn’t vary since all vertex shaders wrote the same values) to transmit to the fragment shader, the fragment shader can’t access specific values within the varying block. To clarify, a WebGL shader can use a constant value — or a value that can be evaluated as a constant at compile time — to index into arrays ; a WebGL shader can not compute an index into an array. Per my reading, this is a WebGL security consideration and the limitation may not be present in other OpenGL(-ES) implementations.

    Not Giving Up Yet : Choking The Fragment Shader
    You might want to be sitting down for this pitch :

    • Vertex shader only interpolates texture coordinates to transmit to fragment shader
    • Fragment shader performs IDCT for a single Y sample, U sample, and V sample
    • Fragment shader converts YUV -> RGB

    Seems straightforward enough. However, that step concerning IDCT for Y, U, and V entails a gargantuan number of operations. When computing the IDCT for an entire block of samples, it’s possible to leverage a lot of redundancy in the math which equates to far fewer overall operations. If you absolutely have to compute each sample individually, for an 8×8 block, that requires 64 multiplication/accumulation (MAC) operations per sample. For 3 color planes, and including a few extra multiplications involved in the RGB conversion, that tallies up to about 200 MACs per pixel. Then there’s the fact that this approach means a 4x redundant operations on the color planes.

    It’s crazy, but I just want to see if it can be done. My approach is to pre-compute a pile of IDCT constants in the JavaScript and transmit them to the fragment shader via uniform variables. For a first order optimization, the IDCT constants are formatted as 4-element vectors. This allows computing 16 dot products rather than 64 individual multiplication/addition operations. Ideally, GPU hardware executes the dot products faster (and there is also the possibility of lining these calculations up as matrices).

    I can report that I actually got a sample correctly transformed using this approach. Just one sample, through. Then I ran into some new problems :

    Problem #1 : Computing sample #1 vs. sample #0 requires a different table of 64 IDCT constants. Okay, so create a long table of 64 * 64 IDCT constants. However, this suffers from the same problem as seen in the previous approach : I can’t dynamically compute the index into this array. What’s the alternative ? Maintain 64 separate named arrays and implement 64 branches, when branching of any kind is ill-advised in shader programming to begin with ? I started to go down this path until I ran into…

    Problem #2 : Shaders can only be so large. 64 * 64 floats (4 bytes each) requires 16 kbytes of data and this well exceeds the amount of shader storage that I can assume is allowed. That brings this path of exploration to a screeching halt.

    Further Brainstorming
    I suppose I could forgo pre-computing the constants and directly compute the IDCT for each sample which would entail lots more multiplications as well as 128 cosine calculations per sample (384 considering all 3 color planes). I’m a little stuck with the transform idea right now. Maybe there are some other transforms I could try.

    Another idea would be vector quantization. What little ORBX.js literature is available indicates that there is a method to allow real-time streaming but that it requires GPU assistance to yield enough horsepower to make it feasible. When I think of such severe asymmetry between compression and decompression, my mind drifts towards VQ algorithms. As I come to understand the benefits and limitations of GPU acceleration, I think I can envision a way that something similar to SVQ1, with its copious, hierarchical vector tables stored as textures, could be implemented using shaders.

    So far, this all pertains to intra-coded video frames. What about opportunities for inter-coded frames ? The only approach that I can envision here is to use WebGL’s readPixels() function to fetch the rasterized frame out of the GPU, and then upload it again as a new texture which a new frame processing pipeline could reference. Whether this idea is plausible would require some profiling.

    Using interframes in such a manner seems to imply that the entire codec would need to operate in RGB space and not YUV.

    Conclusions
    The people behind ORBX.js have apparently figured out a way to create a shader-based video codec. I have yet to even begin to reason out a plausible approach. However, I’m glad I did this exercise since I have finally broken through my ignorance regarding modern GPU shader programming. It’s nice to have a topic like multimedia that allows me a jumping-off point to explore other areas.

  • Trying to sync audio/visual using FFMpeg and openAL

    22 août 2013, par user1379811

    hI have been studying dranger ffmpeg tutorial which explains how to sync audio and visual once you have the frames displayed and audio playing which is where im at.

    Unfortunately, the tutorial is out of date (Stephen Dranger explaained that himself to me) and also uses sdl which im not doing - this is for Blackberry 10 application.

    I just cannot make the video frames display at the correct speed (they are just playing very fast) and I have been trying for over a week now - seriously !

    I have 3 threads happening - one to read from stream into audio and video queues and then 2 threads for audio and video.

    If somebody could explain whats happening after scanning my relevent code you would be a lifesaver.

    The delay (what I pass to usleep(testDelay) seems to be going up (incrementing) which doesn't seem right to me.

    count = 1;
       MyApp* inst = worker->app;//(VideoUploadFacebook*)arg;
       qDebug() << "\n start loadstream";
       w = new QWaitCondition();
       w2 = new QWaitCondition();
       context = avformat_alloc_context();
       inst->threadStarted = true;
       cout << "start of decoding thread";
       cout.flush();


       av_register_all();
       avcodec_register_all();
       avformat_network_init();
       av_log_set_callback(&log_callback);
       AVInputFormat   *pFormat;
       //const char      device[]     = "/dev/video0";
       const char      formatName[] = "mp4";
       cout << "2start of decoding thread";
       cout.flush();



       if (!(pFormat = av_find_input_format(formatName))) {
           printf("can't find input format %s\n", formatName);
           //return void*;
       }
       //open rtsp
       if(avformat_open_input(&context, inst->capturedUrl.data(), pFormat,NULL) != 0){
           // return ;
           cout << "error opening of decoding thread: " << inst->capturedUrl.data();
           cout.flush();
       }

       cout << "3start of decoding thread";
       cout.flush();
       // av_dump_format(context, 0, inst->capturedUrl.data(), 0);
       /*   if(avformat_find_stream_info(context,NULL) < 0){
           return EXIT_FAILURE;
       }
        */
       //search video stream
       for(int i =0;inb_streams;i++){
           if(context->streams[i]->codec->codec_type == AVMEDIA_TYPE_VIDEO)
               inst->video_stream_index = i;
       }
       cout << "3z start of decoding thread";
       cout.flush();
       AVFormatContext* oc = avformat_alloc_context();
       av_read_play(context);//play RTSP
       AVDictionary *optionsDict = NULL;
       ccontext = context->streams[inst->video_stream_index]->codec;

       inst->audioc = context->streams[1]->codec;

       cout << "4start of decoding thread";
       cout.flush();
       codec = avcodec_find_decoder(ccontext->codec_id);
       ccontext->pix_fmt = PIX_FMT_YUV420P;

       AVCodec* audio_codec = avcodec_find_decoder(inst->audioc->codec_id);
       inst->packet = new AVPacket();
       if (!audio_codec) {
           cout << "audio codec not found\n"; //fflush( stdout );
           exit(1);
       }

       if (avcodec_open2(inst->audioc, audio_codec, NULL) < 0) {
           cout << "could not open codec\n"; //fflush( stdout );
           exit(1);
       }

       if (avcodec_open2(ccontext, codec, &optionsDict) < 0) exit(1);

       cout << "5start of decoding thread";
       cout.flush();
       inst->pic = avcodec_alloc_frame();

       av_init_packet(inst->packet);

       while(av_read_frame(context,inst->packet) >= 0 && &inst->keepGoing)
       {

           if(inst->packet->stream_index == 0){//packet is video

               int check = 0;



               // av_init_packet(inst->packet);
               int result = avcodec_decode_video2(ccontext, inst->pic, &check, inst->packet);

               if(check)
                   break;
           }
       }



       inst->originalVideoWidth = inst->pic->width;
       inst->originalVideoHeight = inst->pic->height;
       float aspect = (float)inst->originalVideoHeight / (float)inst->originalVideoWidth;
       inst->newVideoWidth = inst->originalVideoWidth;
       int newHeight = (int)(inst->newVideoWidth * aspect);
       inst->newVideoHeight = newHeight;//(int)inst->originalVideoHeight / inst->originalVideoWidth * inst->newVideoWidth;// = new height
       int size = avpicture_get_size(PIX_FMT_YUV420P, inst->originalVideoWidth, inst->originalVideoHeight);
       uint8_t* picture_buf = (uint8_t*)(av_malloc(size));
       avpicture_fill((AVPicture *) inst->pic, picture_buf, PIX_FMT_YUV420P, inst->originalVideoWidth, inst->originalVideoHeight);

       picrgb = avcodec_alloc_frame();
       int size2 = avpicture_get_size(PIX_FMT_YUV420P, inst->newVideoWidth, inst->newVideoHeight);
       uint8_t* picture_buf2 = (uint8_t*)(av_malloc(size2));
       avpicture_fill((AVPicture *) picrgb, picture_buf2, PIX_FMT_YUV420P, inst->newVideoWidth, inst->newVideoHeight);



       if(ccontext->pix_fmt != PIX_FMT_YUV420P)
       {
           std::cout << "fmt != 420!!!: " << ccontext->pix_fmt << std::endl;//
           // return (EXIT_SUCCESS);//-1;

       }


       if (inst->createForeignWindow(inst->myForeignWindow->windowGroup(),
               "HelloForeignWindowAppIDqq", 0,
               0, inst->newVideoWidth,
               inst->newVideoHeight)) {

       } else {
           qDebug() << "The ForeginWindow was not properly initialized";
       }




       inst->keepGoing = true;

       inst->img_convert_ctx = sws_getContext(inst->originalVideoWidth, inst->originalVideoHeight, PIX_FMT_YUV420P, inst->newVideoWidth, inst->newVideoHeight,
               PIX_FMT_YUV420P, SWS_BILINEAR, NULL, NULL, NULL);

       is = (VideoState*)av_mallocz(sizeof(VideoState));
       if (!is)
           return NULL;

       is->audioStream = 1;
       is->audio_st = context->streams[1];
       is->audio_buf_size = 0;
       is->audio_buf_index = 0;
       is->videoStream = 0;
       is->video_st = context->streams[0];

       is->frame_timer = (double)av_gettime() / 1000000.0;
       is->frame_last_delay = 40e-3;

       is->av_sync_type = DEFAULT_AV_SYNC_TYPE;
       //av_strlcpy(is->filename, filename, sizeof(is->filename));
       is->iformat = pFormat;
       is->ytop    = 0;
       is->xleft   = 0;

       /* start video display */
       is->pictq_mutex = new QMutex();
       is->pictq_cond  = new QWaitCondition();

       is->subpq_mutex = new QMutex();
       is->subpq_cond  = new QWaitCondition();

       is->video_current_pts_time = av_gettime();


       packet_queue_init(&audioq);

       packet_queue_init(&videoq);
       is->audioq = audioq;
       is->videoq = videoq;
       AVPacket* packet2  = new AVPacket();

       ccontext->get_buffer = our_get_buffer;
       ccontext->release_buffer = our_release_buffer;


       av_init_packet(packet2);
       while(inst->keepGoing)
       {


           if(av_read_frame(context,packet2) < 0 && keepGoing)
           {
               printf("bufferframe Could not read a frame from stream.\n");
               fflush( stdout );


           }else {



               if(packet2->stream_index == 0) {
                   packet_queue_put(&videoq, packet2);
               } else if(packet2->stream_index == 1) {
                   packet_queue_put(&audioq, packet2);
               } else {
                   av_free_packet(packet2);
               }


               if(!videoThreadStarted)
               {
                   videoThreadStarted = true;
                   QThread* thread = new QThread;
                   videoThread = new VideoStreamWorker(this);

                   // Give QThread ownership of Worker Object
                   videoThread->moveToThread(thread);
                   connect(videoThread, SIGNAL(error(QString)), this, SLOT(errorHandler(QString)));
                   QObject::connect(videoThread, SIGNAL(refreshNeeded()), this, SLOT(refreshNeededSlot()));
                   connect(thread, SIGNAL(started()), videoThread, SLOT(doWork()));
                   connect(videoThread, SIGNAL(finished()), thread, SLOT(quit()));
                   connect(videoThread, SIGNAL(finished()), videoThread, SLOT(deleteLater()));
                   connect(thread, SIGNAL(finished()), thread, SLOT(deleteLater()));

                   thread->start();
               }

               if(!audioThreadStarted)
               {
                   audioThreadStarted = true;
                   QThread* thread = new QThread;
                   AudioStreamWorker* videoThread = new AudioStreamWorker(this);

                   // Give QThread ownership of Worker Object
                   videoThread->moveToThread(thread);

                   // Connect videoThread error signal to this errorHandler SLOT.
                   connect(videoThread, SIGNAL(error(QString)), this, SLOT(errorHandler(QString)));

                   // Connects the thread’s started() signal to the process() slot in the videoThread, causing it to start.
                   connect(thread, SIGNAL(started()), videoThread, SLOT(doWork()));
                   connect(videoThread, SIGNAL(finished()), thread, SLOT(quit()));
                   connect(videoThread, SIGNAL(finished()), videoThread, SLOT(deleteLater()));

                   // Make sure the thread object is deleted after execution has finished.
                   connect(thread, SIGNAL(finished()), thread, SLOT(deleteLater()));

                   thread->start();
               }

           }

       } //finished main loop

       int MyApp::video_thread() {
       //VideoState *is = (VideoState *)arg;
       AVPacket pkt1, *packet = &pkt1;
       int len1, frameFinished;

       double pts;
       pic = avcodec_alloc_frame();

       for(;;) {
           if(packet_queue_get(&videoq, packet, 1) < 0) {
               // means we quit getting packets
               break;
           }

           pts = 0;

           global_video_pkt_pts2 = packet->pts;
           // Decode video frame
           len1 =  avcodec_decode_video2(ccontext, pic, &frameFinished, packet);
           if(packet->dts == AV_NOPTS_VALUE
                   && pic->opaque && *(uint64_t*)pic->opaque != AV_NOPTS_VALUE) {
               pts = *(uint64_t *)pic->opaque;
           } else if(packet->dts != AV_NOPTS_VALUE) {
               pts = packet->dts;
           } else {
               pts = 0;
           }
           pts *= av_q2d(is->video_st->time_base);
           // Did we get a video frame?

                   if(frameFinished) {
                       pts = synchronize_video(is, pic, pts);
                       actualPts = pts;
                       refreshSlot();
                   }
                   av_free_packet(packet);
       }
       av_free(pic);
       return 0;
    }


    int MyApp::audio_thread() {
       //VideoState *is = (VideoState *)arg;
       AVPacket pkt1, *packet = &pkt1;
       int len1, frameFinished;
       ALuint source;
       ALenum format = 0;
       //   ALuint frequency;
       ALenum alError;
       ALint val2;
       ALuint buffers[NUM_BUFFERS];
       int dataSize;


       ALCcontext *aContext;
       ALCdevice *device;
       if (!alutInit(NULL, NULL)) {
           // printf(stderr, "init alut error\n");
       }
       device = alcOpenDevice(NULL);
       if (device == NULL) {
           // printf(stderr, "device error\n");
       }

       //Create a context
       aContext = alcCreateContext(device, NULL);
       alcMakeContextCurrent(aContext);
       if(!(aContext)) {
           printf("Could not create the OpenAL context!\n");
           return 0;
       }

       alListener3f(AL_POSITION, 0.0f, 0.0f, 0.0f);









       //ALenum alError;
       if(alGetError() != AL_NO_ERROR) {
           cout << "could not create buffers";
           cout.flush();
           fflush( stdout );
           return 0;
       }
       alGenBuffers(NUM_BUFFERS, buffers);
       alGenSources(1, &source);
       if(alGetError() != AL_NO_ERROR) {
           cout << "after Could not create buffers or the source.\n";
           cout.flush(  );
           return 0;
       }

       int i;
       int indexOfPacket;
       double pts;
       //double pts;
       int n;


       for(i = 0; i < NUM_BUFFERS; i++)
       {
           if(packet_queue_get(&audioq, packet, 1) < 0) {
               // means we quit getting packets
               break;
           }
           cout << "streamindex=audio \n";
           cout.flush(  );
           //printf("before decode  audio\n");
           //fflush( stdout );
           // AVPacket *packet = new AVPacket();//malloc(sizeof(AVPacket*));
           AVFrame *decodedFrame = NULL;
           int gotFrame = 0;
           // AVFrame* decodedFrame;

           if(!decodedFrame) {
               if(!(decodedFrame = avcodec_alloc_frame())) {
                   cout << "Run out of memory, stop the streaming...\n";
                   fflush( stdout );
                   cout.flush();


                   return -2;
               }
           } else {
               avcodec_get_frame_defaults(decodedFrame);
           }

           int  len = avcodec_decode_audio4(audioc, decodedFrame, &gotFrame, packet);
           if(len < 0) {
               cout << "Error while decoding.\n";
               cout.flush(  );

               return -3;
           }
           if(len < 0) {
               /* if error, skip frame */
               is->audio_pkt_size = 0;
               //break;
           }
           is->audio_pkt_data += len;
           is->audio_pkt_size -= len;

           pts = is->audio_clock;
           // *pts_ptr = pts;
           n = 2 * is->audio_st->codec->channels;
           is->audio_clock += (double)packet->size/
                   (double)(n * is->audio_st->codec->sample_rate);
           if(gotFrame) {
               cout << "got audio frame.\n";
               cout.flush(  );
               // We have a buffer ready, send it
               dataSize = av_samples_get_buffer_size(NULL, audioc->channels,
                       decodedFrame->nb_samples, audioc->sample_fmt, 1);

               if(!format) {
                   if(audioc->sample_fmt == AV_SAMPLE_FMT_U8 ||
                           audioc->sample_fmt == AV_SAMPLE_FMT_U8P) {
                       if(audioc->channels == 1) {
                           format = AL_FORMAT_MONO8;
                       } else if(audioc->channels == 2) {
                           format = AL_FORMAT_STEREO8;
                       }
                   } else if(audioc->sample_fmt == AV_SAMPLE_FMT_S16 ||
                           audioc->sample_fmt == AV_SAMPLE_FMT_S16P) {
                       if(audioc->channels == 1) {
                           format = AL_FORMAT_MONO16;
                       } else if(audioc->channels == 2) {
                           format = AL_FORMAT_STEREO16;
                       }
                   }

                   if(!format) {
                       cout << "OpenAL can't open this format of sound.\n";
                       cout.flush(  );

                       return -4;
                   }
               }
               printf("albufferdata audio b4.\n");
               fflush( stdout );
               alBufferData(buffers[i], format, *decodedFrame->data, dataSize, decodedFrame->sample_rate);
               cout << "after albufferdata all buffers \n";
               cout.flush(  );
               av_free_packet(packet);
               //=av_free(packet);
               av_free(decodedFrame);

               if((alError = alGetError()) != AL_NO_ERROR) {
                   printf("Error while buffering.\n");

                   printAlError(alError);
                   return -6;
               }
           }
       }


       cout << "before quoe buffers \n";
       cout.flush();
       alSourceQueueBuffers(source, NUM_BUFFERS, buffers);
       cout << "before play.\n";
       cout.flush();
       alSourcePlay(source);
       cout << "after play.\n";
       cout.flush();
       if((alError = alGetError()) != AL_NO_ERROR) {
           cout << "error strating stream.\n";
           cout.flush();
           printAlError(alError);
           return 0;
       }


       // AVPacket *pkt = &is->audio_pkt;

       while(keepGoing)
       {
           while(packet_queue_get(&audioq, packet, 1)  >= 0) {
               // means we quit getting packets

               do {
                   alGetSourcei(source, AL_BUFFERS_PROCESSED, &val2);
                   usleep(SLEEP_BUFFERING);
               } while(val2 <= 0);
               if(alGetError() != AL_NO_ERROR)
               {
                   fprintf(stderr, "Error gettingsource :(\n");
                   return 1;
               }

               while(val2--)
               {



                   ALuint buffer;
                   alSourceUnqueueBuffers(source, 1, &buffer);
                   if(alGetError() != AL_NO_ERROR)
                   {
                       fprintf(stderr, "Error unqueue buffers :(\n");
                       //  return 1;
                   }
                   AVFrame *decodedFrame = NULL;
                   int gotFrame = 0;
                   // AVFrame* decodedFrame;

                   if(!decodedFrame) {
                       if(!(decodedFrame = avcodec_alloc_frame())) {
                           cout << "Run out of memory, stop the streaming...\n";
                           //fflush( stdout );
                           cout.flush();


                           return -2;
                       }
                   } else {
                       avcodec_get_frame_defaults(decodedFrame);
                   }

                   int  len = avcodec_decode_audio4(audioc, decodedFrame, &gotFrame, packet);
                   if(len < 0) {
                       cout << "Error while decoding.\n";
                       cout.flush(  );
                       is->audio_pkt_size = 0;
                       return -3;
                   }

                   is->audio_pkt_data += len;
                   is->audio_pkt_size -= len;
                   if(packet->size <= 0) {
                       /* No data yet, get more frames */
                       //continue;
                   }


                   if(gotFrame) {
                       pts = is->audio_clock;
                       len = synchronize_audio(is, (int16_t *)is->audio_buf,
                               packet->size, pts);
                       is->audio_buf_size = packet->size;
                       pts = is->audio_clock;
                       // *pts_ptr = pts;
                       n = 2 * is->audio_st->codec->channels;
                       is->audio_clock += (double)packet->size /
                               (double)(n * is->audio_st->codec->sample_rate);
                       if(packet->pts != AV_NOPTS_VALUE) {
                           is->audio_clock = av_q2d(is->audio_st->time_base)*packet->pts;
                       }
                       len = av_samples_get_buffer_size(NULL, audioc->channels,
                               decodedFrame->nb_samples, audioc->sample_fmt, 1);
                       alBufferData(buffer, format, *decodedFrame->data, len, decodedFrame->sample_rate);
                       if(alGetError() != AL_NO_ERROR)
                       {
                           fprintf(stderr, "Error buffering :(\n");
                           return 1;
                       }
                       alSourceQueueBuffers(source, 1, &buffer);
                       if(alGetError() != AL_NO_ERROR)
                       {
                           fprintf(stderr, "Error queueing buffers :(\n");
                           return 1;
                       }
                   }





               }

               alGetSourcei(source, AL_SOURCE_STATE, &val2);
               if(val2 != AL_PLAYING)
                   alSourcePlay(source);

           }


           //pic = avcodec_alloc_frame();
       }
       qDebug() << "end audiothread";
       return 1;
    }

    void MyApp::refreshSlot()
    {


       if(true)
       {

           printf("got frame %d, %d\n", pic->width, ccontext->width);
           fflush( stdout );

           sws_scale(img_convert_ctx, (const uint8_t **)pic->data, pic->linesize,
                   0, originalVideoHeight, &picrgb->data[0], &picrgb->linesize[0]);

           printf("rescaled frame %d, %d\n", newVideoWidth, newVideoHeight);
           fflush( stdout );
           //av_free_packet(packet);
           //av_init_packet(packet);

           qDebug() << "waking audio as video finished";
           ////mutex.unlock();
           //mutex2.lock();
           doingVideoFrame = false;
           //doingAudioFrame = false;
           ////mutex2.unlock();


           //mutex2.unlock();
           //w2->wakeAll();
           //w->wakeAll();
           qDebug() << "now woke audio";

           //pic = picrgb;
           uint8_t *srcy = picrgb->data[0];
           uint8_t *srcu = picrgb->data[1];
           uint8_t *srcv = picrgb->data[2];
           printf("got src yuv frame %d\n", &srcy);
           fflush( stdout );
           unsigned char *ptr = NULL;
           screen_get_buffer_property_pv(mScreenPixelBuffer, SCREEN_PROPERTY_POINTER, (void**) &ptr);
           unsigned char *y = ptr;
           unsigned char *u = y + (newVideoHeight * mStride) ;
           unsigned char *v = u + (newVideoHeight * mStride) / 4;
           int i = 0;
           printf("got buffer  picrgbwidth= %d \n", newVideoWidth);
           fflush( stdout );
           for ( i = 0; i < newVideoHeight; i++)
           {
               int doff = i * mStride;
               int soff = i * picrgb->linesize[0];
               memcpy(&y[doff], &srcy[soff], newVideoWidth);
           }

           for ( i = 0; i < newVideoHeight / 2; i++)
           {
               int doff = i * mStride / 2;
               int soff = i * picrgb->linesize[1];
               memcpy(&u[doff], &srcu[soff], newVideoWidth / 2);
           }

           for ( i = 0; i < newVideoHeight / 2; i++)
           {
               int doff = i * mStride / 2;
               int soff = i * picrgb->linesize[2];
               memcpy(&v[doff], &srcv[soff], newVideoWidth / 2);
           }
           printf("before posttoscreen \n");
           fflush( stdout );

           video_refresh_timer();
           qDebug() << "end refreshslot";

       }
       else
       {

       }





    }

    void  MyApp::refreshNeededSlot2()
       {
           printf("blitting to buffer");
           fflush(stdout);

           screen_buffer_t screen_buffer;
           screen_get_window_property_pv(mScreenWindow, SCREEN_PROPERTY_RENDER_BUFFERS, (void**) &screen_buffer);
           int attribs[] = { SCREEN_BLIT_SOURCE_WIDTH, newVideoWidth, SCREEN_BLIT_SOURCE_HEIGHT, newVideoHeight, SCREEN_BLIT_END };
           int res2 = screen_blit(mScreenCtx, screen_buffer, mScreenPixelBuffer, attribs);
           printf("dirty rectangles");
           fflush(stdout);
           int dirty_rects[] = { 0, 0, newVideoWidth, newVideoHeight };
           screen_post_window(mScreenWindow, screen_buffer, 1, dirty_rects, 0);
           printf("done screneposdtwindow");
           fflush(stdout);

       }

    void MyApp::video_refresh_timer() {
       testDelay = 0;
       //  VideoState *is = ( VideoState* )userdata;
       VideoPicture *vp;
       //double pts = 0    ;
       double actual_delay, delay, sync_threshold, ref_clock, diff;

       if(is->video_st) {
           if(false)////is->pictq_size == 0)
           {
               testDelay = 1;
               schedule_refresh(is, 1);
           } else {
               // vp = &is->pictq[is->pictq_rindex];

               delay = actualPts - is->frame_last_pts; /* the pts from last time */
               if(delay <= 0 || delay >= 1.0) {
                   /* if incorrect delay, use previous one */
                   delay = is->frame_last_delay;
               }
               /* save for next time */
               is->frame_last_delay = delay;
               is->frame_last_pts = actualPts;

               is->video_current_pts = actualPts;
               is->video_current_pts_time = av_gettime();
               /* update delay to sync to audio */
               ref_clock = get_audio_clock(is);
               diff = actualPts - ref_clock;

               /* Skip or repeat the frame. Take delay into account
        FFPlay still doesn't "know if this is the best guess." */
               sync_threshold = (delay > AV_SYNC_THRESHOLD) ? delay : AV_SYNC_THRESHOLD;
               if(fabs(diff) < AV_NOSYNC_THRESHOLD) {
                   if(diff <= -sync_threshold) {
                       delay = 0;
                   } else if(diff >= sync_threshold) {
                       delay = 2 * delay;
                   }
               }
               is->frame_timer += delay;
               /* computer the REAL delay */
               actual_delay = is->frame_timer - (av_gettime() / 1000000.0);
               if(actual_delay < 0.010) {
                   /* Really it should skip the picture instead */
                   actual_delay = 0.010;
               }
               testDelay = (int)(actual_delay * 1000 + 0.5);
               schedule_refresh(is, (int)(actual_delay * 1000 + 0.5));
               /* show the picture! */
               //video_display(is);


               // SDL_CondSignal(is->pictq_cond);
               // SDL_UnlockMutex(is->pictq_mutex);
           }
       } else {
           testDelay = 100;
           schedule_refresh(is, 100);

       }
    }

    void MyApp::schedule_refresh(VideoState *is, int delay) {
       qDebug() << "start schedule refresh timer" << delay;
       typeOfEvent = FF_REFRESH_EVENT2;
       w->wakeAll();
       //  SDL_AddTimer(delay,


    }

    I am currently waiting on data in a loop in the following way

    QMutex mutex;
       mutex.lock();
       while(keepGoing)
       {



           qDebug() << "MAINTHREAD" << testDelay;


           w->wait(&mutex);
           mutex.unlock();
           qDebug() << "MAINTHREAD past wait";

           if(!keepGoing)
           {
               break;
           }
           if(testDelay > 0 && typeOfEvent == FF_REFRESH_EVENT2)
           {
               usleep(testDelay);
               refreshNeededSlot2();
           }
           else   if(testDelay > 0 && typeOfEvent == FF_QUIT_EVENT2)
           {
               keepGoing = false;
               exit(0);
               break;
               // usleep(testDelay);
               // refreshNeededSlot2();
           }
           qDebug() << "MAINTHREADend";
           mutex.lock();

       }
       mutex.unlock();

    Please let me know if I need to provide any more relevent code. I'm sorry my code is untidy - I still learning c++ and have been modifying this code for over a week now as previously mentioned.

    Just added a sample of output I'm seeing from print outs I do to console - I can't get my head around it (it's almost too complicated for my level of expertise) but when you see the frames being played and audio playing it's very difficult to give up especially when it took me a couple of weeks to get to this stage.

    Please someone give me a hand if they spot the problem.

    MAINTHREAD past wait
    pts after syncvideo= 1073394046
    got frame 640, 640
    start video_refresh_timer
    actualpts = 1.66833
    frame lastpts = 1.63497
    start schedule refresh timer need to delay for 123

    pts after syncvideo= 1073429033
    got frame 640, 640
    MAINTHREAD loop delay before refresh = 123
    start video_refresh_timer
    actualpts = 1.7017
    frame lastpts = 1.66833
    start schedule refresh timer need to delay for 115

    MAINTHREAD past wait
    pts after syncvideo= 1073464021
    got frame 640, 640
    start video_refresh_timer
    actualpts = 1.73507
    frame lastpts = 1.7017
    start schedule refresh timer need to delay for 140

    MAINTHREAD loop delay before refresh = 140
    pts after syncvideo= 1073499008
    got frame 640, 640
    start video_refresh_timer
    actualpts = 1.76843
    frame lastpts = 1.73507
    start schedule refresh timer need to delay for 163

    MAINTHREAD past wait
    pts after syncvideo= 1073533996
    got frame 640, 640
    start video_refresh_timer
    actualpts = 1.8018
    frame lastpts = 1.76843
    start schedule refresh timer need to delay for 188

    MAINTHREAD loop delay before refresh = 188
    pts after syncvideo= 1073568983
    got frame 640, 640
    start video_refresh_timer
    actualpts = 1.83517
    frame lastpts = 1.8018
    start schedule refresh timer need to delay for 246

    MAINTHREAD past wait
    pts after syncvideo= 1073603971
    got frame 640, 640
    start video_refresh_timer
    actualpts = 1.86853
    frame lastpts = 1.83517
    start schedule refresh timer need to delay for 299

    MAINTHREAD loop delay before refresh = 299
    pts after syncvideo= 1073638958
    got frame 640, 640
    start video_refresh_timer
    actualpts = 1.9019
    frame lastpts = 1.86853
    start schedule refresh timer need to delay for 358

    MAINTHREAD past wait
    pts after syncvideo= 1073673946
    got frame 640, 640
    start video_refresh_timer
    actualpts = 1.93527
    frame lastpts = 1.9019
    start schedule refresh timer need to delay for 416

    MAINTHREAD loop delay before refresh = 416
    pts after syncvideo= 1073708933
    got frame 640, 640
    start video_refresh_timer
    actualpts = 1.96863
    frame lastpts = 1.93527
    start schedule refresh timer need to delay for 474

    MAINTHREAD past wait
    pts after syncvideo= 1073742872
    got frame 640, 640
    MAINTHREAD loop delay before refresh = 474
    start video_refresh_timer
    actualpts = 2.002
    frame lastpts = 1.96863
    start schedule refresh timer need to delay for 518

    MAINTHREAD past wait
    pts after syncvideo= 1073760366
    got frame 640, 640
    start video_refresh_timer
    actualpts = 2.03537
    frame lastpts = 2.002
    start schedule refresh timer need to delay for 575

  • Ogg objections

    3 mars 2010, par Mans — Multimedia

    The Ogg container format is being promoted by the Xiph Foundation for use with its Vorbis and Theora codecs. Unfortunately, a number of technical shortcomings in the format render it ill-suited to most, if not all, use cases. This article examines the most severe of these flaws.

    Overview of Ogg

    The basic unit in an Ogg stream is the page consisting of a header followed by one or more packets from a single elementary stream. A page can contain up to 255 packets, and a packet can span any number of pages. The following table describes the page header.

    Field Size (bits) Description
    capture_pattern 32 magic number “OggS”
    version 8 always zero
    flags 8
    granule_position 64 abstract timestamp
    bitstream_serial_number 32 elementary stream number
    page_sequence_number 32 incremented by 1 each page
    checksum 32 CRC of entire page
    page_segments 8 length of segment_table
    segment_table variable list of packet sizes

    Elementary stream types are identified by looking at the payload of the first few pages, which contain any setup data required by the decoders. For full details, see the official format specification.

    Generality

    Ogg, legend tells, was designed to be a general-purpose container format. To most multimedia developers, a general-purpose format is one in which encoded data of any type can be encapsulated with a minimum of effort.

    The Ogg format defined by the specification does not fit this description. For every format one wishes to use with Ogg, a complex mapping must first be defined. This mapping defines how to identify a codec, how to extract setup data, and even how timestamps are to be interpreted. All this is done differently for every codec. To correctly parse an Ogg stream, every such mapping ever defined must be known.

    Under this premise, a centralised repository of codec mappings would seem like a sensible idea, but alas, no such thing exists. It is simply impossible to obtain a exhaustive list of defined mappings, which makes the task of creating a complete implementation somewhat daunting.

    One brave soul, Tobias Waldvogel, created a mapping, OGM, capable of storing any Microsoft AVI compatible codec data in Ogg files. This format saw some use in the wild, but was frowned upon by Xiph, and it was eventually displaced by other formats.

    True generality is evidently not to be found with the Ogg format.

    A good example of a general-purpose format is Matroska. This container can trivially accommodate any codec, all it requires is a unique string to identify the codec. For codecs requiring setup data, a standard location for this is provided in the container. Furthermore, an official list of codec identifiers is maintained, meaning all information required to fully support Matroska files is available from one place.

    Matroska also has probably the greatest advantage of all : it is in active, wide-spread use. Historically, standards derived from existing practice have proven more successful than those created by a design committee.

    Overhead

    When designing a container format, one important consideration is that of overhead, i.e. the extra space required in addition to the elementary stream data being combined. For any given container, the overhead can be divided into a fixed part, independent of the total file size, and a variable part growing with increasing file size. The fixed overhead is not of much concern, its relative contribution being negligible for typical file sizes.

    The variable overhead in the Ogg format comes from the page headers, mostly from the segment_table field. This field uses a most peculiar encoding, somewhat reminiscent of Roman numerals. In Roman times, numbers were written as a sequence of symbols, each representing a value, the combined value being the sum of the constituent values.

    The segment_table field lists the sizes of all packets in the page. Each value in the list is coded as a number of bytes equal to 255 followed by a final byte with a smaller value. The packet size is simply the sum of all these bytes. Any strictly additive encoding, such as this, has the distinct drawback of coded length being linearly proportional to the encoded value. A value of 5000, a reasonable packet size for video of moderate bitrate, requires no less than 20 bytes to encode.

    On top of this we have the 27-byte page header which, although paling in comparison to the packet size encoding, is still much larger than necessary. Starting at the top of the list :

    • The version field could be disposed of, a single-bit marker being adequate to separate this first version from hypothetical future versions. One of the unused positions in the flags field could be used for this purpose
    • A 64-bit granule_position is completely overkill. 32 bits would be more than enough for the vast majority of use cases. In extreme cases, a one-bit flag could be used to signal an extended timestamp field.
    • 32-bit elementary stream number ? Are they anticipating files with four billion elementary streams ? An eight-bit field, if not smaller, would seem more appropriate here.
    • The 32-bit page_sequence_number is inexplicable. The intent is to allow detection of page loss due to transmission errors. ISO MPEG-TS uses a 4-bit counter per 188-byte packet for this purpose, and that format is used where packet loss actually happens, unlike any use of Ogg to date.
    • A mandatory 32-bit checksum is nothing but a waste of space when using a reliable storage/transmission medium. Again, a flag could be used to signal the presence of an optional checksum field.

    With the changes suggested above, the page header would shrink from 27 bytes to 12 bytes in size.

    We thus see that in an Ogg file, the packet size fields alone contribute an overhead of 1/255 or approximately 0.4%. This is a hard lower bound on the overhead, not attainable even in theory. In reality the overhead tends to be closer to 1%.

    Contrast this with the ISO MP4 file format, which can easily achieve an overhead of less than 0.05% with a 1 Mbps elementary stream.

    Latency

    In many applications end-to-end latency is an important factor. Examples include video conferencing, telephony, live sports events, interactive gaming, etc. With the codec layer contributing as little as 10 milliseconds of latency, the amount imposed by the container becomes an important factor.

    Latency in an Ogg-based system is introduced at both the sender and the receiver. Since the page header depends on the entire contents of the page (packet sizes and checksum), a full page of packets must be buffered by the sender before a single bit can be transmitted. This sets a lower bound for the sending latency at the duration of a page.

    On the receiving side, playback cannot commence until packets from all elementary streams are available. Hence, with two streams (audio and video) interleaved at the page level, playback is delayed by at least one page duration (two if checksums are verified).

    Taking both send and receive latencies into account, the minimum end-to-end latency for Ogg is thus twice the duration of a page, triple if strict checksum verification is required. If page durations are variable, the maximum value must be used in order to avoid buffer underflows.

    Minimum latency is clearly achieved by minimising the page duration, which in turn implies sending only one packet per page. This is where the size of the page header becomes important. The header for a single-packet page is 27 + packet_size/255 bytes in size. For a 1 Mbps video stream at 25 fps this gives an overhead of approximately 1%. With a typical audio packet size of 400 bytes, the overhead becomes a staggering 7%. The average overhead for a multiplex of these two streams is 1.4%.

    As it stands, the Ogg format is clearly not a good choice for a low-latency application. The key to low latency is small packets and fine-grained interleaving of streams, and although Ogg can provide both of these, by sending a single packet per page, the price in overhead is simply too high.

    ISO MPEG-PS has an overhead of 9 bytes on most packets (a 5-byte timestamp is added a few times per second), and Microsoft’s ASF has a 12-byte packet header. My suggestions for compacting the Ogg page header would bring it in line with these formats.

    Random access

    Any general-purpose container format needs to allow random access for direct seeking to any given position in the file. Despite this goal being explicitly mentioned in the Ogg specification, the format only allows the most crude of random access methods.

    While many container formats include an index allowing a time to be directly translated into an offset into the file, Ogg has nothing of this kind, the stated rationale for the omission being that this would require a two-pass multiplexing, the second pass creating the index. This is obviously not true ; the index could simply be written at the end of the file. Those objecting that this index would be unavailable in a streaming scenario are forgetting that seeking is impossible there regardless.

    The method for seeking suggested by the Ogg documentation is to perform a binary search on the file, after each file-level seek operation scanning for a page header, extracting the timestamp, and comparing it to the desired position. When the elementary stream encoding allows only certain packets as random access points (video key frames), a second search will have to be performed to locate the entry point closest to the desired time. In a large file (sizes upwards of 10 GB are common), 50 seeks might be required to find the correct position.

    A typical hard drive has an average seek time of roughly 10 ms, giving a total time for the seek operation of around 500 ms, an annoyingly long time. On a slow medium, such as an optical disc or files served over a network, the times are orders of magnitude longer.

    A factor further complicating the seeking process is the possibility of header emulation within the elementary stream data. To safeguard against this, one has to read the entire page and verify the checksum. If the storage medium cannot provide data much faster than during normal playback, this provides yet another substantial delay towards finishing the seeking operation. This too applies to both network delivery and optical discs.

    Although optical disc usage is perhaps in decline today, one should bear in mind that the Ogg format was designed at a time when CDs and DVDs were rapidly gaining ground, and network-based storage is most certainly on the rise.

    The final nail in the coffin of seeking is the codec-dependent timestamp format. At each step in the seeking process, the timestamp parsing specified by the codec mapping corresponding the current page must be invoked. If the mapping is not known, the best one can do is skip pages until one with a known mapping is found. This delays the seeking and complicates the implementation, both bad things.

    Timestamps

    A problem old as multimedia itself is that of synchronising multiple elementary streams (e.g. audio and video) during playback ; badly synchronised A/V is highly unpleasant to view. By the time Ogg was invented, solutions to this problem were long since explored and well-understood. The key to proper synchronisation lies in tagging elementary stream packets with timestamps, packets carrying the same timestamp intended for simultaneous presentation. The concept is as simple as it seems, so it is astonishing to see the amount of complexity with which the Ogg designers managed to imbue it. So bizarre is it, that I have devoted an entire article to the topic, and will not cover it further here.

    Complexity

    Video and audio decoding are time-consuming tasks, so containers should be designed to minimise extra processing required. With the data volumes involved, even an act as simple as copying a packet of compressed data can have a significant impact. Once again, however, Ogg lets us down. Despite the brevity of the specification, the format is remarkably complicated to parse properly.

    The unusual and inefficient encoding of the packet sizes limits the page size to somewhat less than 64 kB. To still allow individual packets larger than this limit, it was decided to allow packets spanning multiple pages, a decision with unfortunate implications. A page-spanning packet as it arrives in the Ogg stream will be discontiguous in memory, a situation most decoders are unable to handle, and reassembly, i.e. copying, is required.

    The knowledgeable reader may at this point remark that the MPEG-TS format also splits packets into pieces requiring reassembly before decoding. There is, however, a significant difference there. MPEG-TS was designed for hardware demultiplexing feeding directly into hardware decoders. In such an implementation the fragmentation is not a problem. Rather, the fine-grained interleaving is a feature allowing smaller on-chip buffers.

    Buffering is also an area in which Ogg suffers. To keep the overhead down, pages must be made as large as practically possible, and page size translates directly into demultiplexer buffer size. Playback of a file with two elementary streams thus requires 128 kB of buffer space. On a modern PC this is perhaps nothing to be concerned about, but in a small embedded system, e.g. a portable media player, it can be relevant.

    In addition to the above, a number of other issues, some of them minor, others more severe, make Ogg processing a painful experience. A selection follows :

    • 32-bit random elementary stream identifiers mean a simple table-lookup cannot be used. Instead the list of streams must be searched for a match. While trivial to do in software, it is still annoying, and a hardware demultiplexer would be significantly more complicated than with a smaller identifier.
    • Semantically ambiguous streams are possible. For example, the continuation flag (bit 1) may conflict with continuation (or lack thereof) implied by the segment table on the preceding page. Such invalid files have been spotted in the wild.
    • Concatenating independent Ogg streams forms a valid stream. While finding a use case for this strange feature is difficult, an implementation must of course be prepared to encounter such streams. Detecting and dealing with these adds pointless complexity.
    • Unusual terminology : inventing new terms for well-known concepts is confusing for the developer trying to understand the format in relation to others. A few examples :
      Ogg name Usual name
      logical bitstream elementary stream
      grouping multiplexing
      lacing value packet size (approximately)
      segment imaginary element serving no real purpose
      granule position timestamp

    Final words

    We have found the Ogg format to be a dubious choice in just about every situation. Why then do certain organisations and individuals persist in promoting it with such ferocity ?

    When challenged, three types of reaction are characteristic of the Ogg campaigners.

    On occasion, these people will assume an apologetic tone, explaining how Ogg was only ever designed for simple audio-only streams (ignoring it is as bad for these as for anything), and this is no doubt true. Why then, I ask again, do they continue to tout Ogg as the one-size-fits-all solution they already admitted it is not ?

    More commonly, the Ogg proponents will respond with hand-waving arguments best summarised as Ogg isn’t bad, it’s just different. My reply to this assertion is twofold :

    • Being too different is bad. We live in a world where multimedia files come in many varieties, and a decent media player will need to handle the majority of them. Fortunately, most multimedia file formats share some basic traits, and they can easily be processed in the same general framework, the specifics being taken care of at the input stage. A format deviating too far from the standard model becomes problematic.
    • Ogg is bad. When every angle of examination reveals serious flaws, bad is the only fitting description.

    The third reaction bypasses all technical analysis : Ogg is patent-free, a claim I am not qualified to directly discuss. Assuming it is true, it still does not alter the fact that Ogg is a bad format. Being free from patents does not magically make Ogg a good choice as file format. If all the standard formats are indeed covered by patents, the only proper solution is to design a new, good format which is not, this time hopefully avoiding the old mistakes.