Advanced search

Media (0)

Word: - Tags -/flash

No media matching your criteria is available on this site.

Other articles (80)

  • Customising by adding your logo, banner or background image

    5 September 2013, by

    Some themes support three customisation elements: adding a logo; adding a banner; adding a background image.

  • MediaSPIP v0.2

    21 June 2013, by

    MediaSPIP 0.2 is the first stable version of MediaSPIP.
    Its official release date is 21 June 2013, as announced here.
    The zip file provided here contains only the sources of MediaSPIP in standalone version.
    As with the previous version, all software dependencies must be installed manually on the server.
    If you want to use this archive for an installation in farm mode, you will also need to make other modifications (...)

  • Making files available

    14 April 2011, by

    By default, at initialisation, MediaSPIP does not allow visitors to download files, whether originals or the result of their transformation or encoding. It only allows them to be viewed.
    However, it is possible and easy to allow visitors to access these documents in various forms.
    All this happens in the template configuration page. Go to the channel's administration area and choose in the navigation (...)

On other sites (10707)

  • FFMPEG Presentation Time Stamps (PTS) calculation in RTSP stream

    8 December 2020, by BadaBudaBudu

    Below is a raw example of my code, for a better understanding of what it does. Note that this is an example I updated myself (deprecated methods removed, etc.) from the official FFMPEG documentation, complemented by my encoder.

    


    /// STD
    #include <iostream>
    #include <string>
    #include <vector>     // added: required by std::vector below
    #include <algorithm>  // added: required by std::copy_n
    #include <thread>     // added: required by std::this_thread
    #include <chrono>     // added: required by std::chrono

    /// FFMPEG
    extern "C"
    {
        #include <libavformat/avformat.h>
        #include <libswscale/swscale.h>
        #include <libavutil/imgutils.h>
    }

    /// VideoLib
    #include <tools/multimediaprocessing.h>
    #include   // (header name lost in the original page formatting)
    #include   // (header name lost in the original page formatting)
    #include <enums/codec.h>
    #include <enums/pixelformat.h>

    /// OpenCV
    #include <opencv2/opencv.hpp>

    inline static const char *inputRtspAddress = "rtsp://192.168.0.186:8080/video/h264";

    int main()
    {
        AVFormatContext* formatContext = nullptr;

        AVStream* audioStream = nullptr;
        AVStream* videoStream = nullptr;
        AVCodec* audioCodec = nullptr;
        AVCodec* videoCodec = nullptr;
        AVCodecContext* audioCodecContext = nullptr;
        AVCodecContext* videoCodecContext = nullptr;
        vl::AudioSettings audioSettings;
        vl::VideoSettings videoSettings;

        int audioIndex = -1;
        int videoIndex = -1;

        SwsContext* swsContext = nullptr;
        std::vector<uint8_t> frameBuffer; // element type lost in the original formatting; uint8_t assumed
        AVFrame* frame = av_frame_alloc();
        AVFrame* decoderFrame = av_frame_alloc();

        AVPacket packet;
        cv::Mat mat;

        vl::tools::MultimediaProcessing multimediaProcessing("rtsp://127.0.0.1:8080/stream", vl::configs::rtspStream, 0, vl::enums::EPixelFormat::ABGR);

        // *** OPEN STREAM *** //
        if(avformat_open_input(&formatContext, inputRtspAddress, nullptr, nullptr) < 0)
        {
            std::cout << "Failed to open input." << std::endl;
            return EXIT_FAILURE;
        }

        if(avformat_find_stream_info(formatContext, nullptr) < 0)
        {
            std::cout << "Failed to find stream info." << std::endl;
            return EXIT_FAILURE;
        }

        // *** FIND DECODER FOR BOTH AUDIO AND VIDEO STREAM *** //
        audioCodec = avcodec_find_decoder(AVCodecID::AV_CODEC_ID_AAC);
        videoCodec = avcodec_find_decoder(AVCodecID::AV_CODEC_ID_H264);

        if(audioCodec == nullptr || videoCodec == nullptr)
        {
            std::cout << "No AUDIO or VIDEO in stream." << std::endl;
            return EXIT_FAILURE;
        }

        // *** FIND STREAM FOR BOTH AUDIO AND VIDEO STREAM *** //
        audioIndex = av_find_best_stream(formatContext, AVMEDIA_TYPE_AUDIO, -1, -1, &audioCodec, 0);
        videoIndex = av_find_best_stream(formatContext, AVMEDIA_TYPE_VIDEO, -1, -1, &videoCodec, 0);

        if(audioIndex < 0 || videoIndex < 0)
        {
            std::cout << "Failed to find AUDIO or VIDEO stream." << std::endl;
            return EXIT_FAILURE;
        }

        audioStream = formatContext->streams[audioIndex];
        videoStream = formatContext->streams[videoIndex];

        // *** ALLOC CODEC CONTEXT FOR BOTH AUDIO AND VIDEO STREAM *** //
        audioCodecContext = avcodec_alloc_context3(audioCodec);
        videoCodecContext = avcodec_alloc_context3(videoCodec);

        if(audioCodecContext == nullptr || videoCodecContext == nullptr)
        {
            std::cout << "Can not allocate AUDIO or VIDEO context." << std::endl;
            return EXIT_FAILURE;
        }

        if(avcodec_parameters_to_context(audioCodecContext, audioStream->codecpar) < 0 || avcodec_parameters_to_context(videoCodecContext, videoStream->codecpar) < 0)
        {
            std::cout << "Can not fill AUDIO or VIDEO codec context." << std::endl;
            return EXIT_FAILURE;
        }

        if(avcodec_open2(audioCodecContext, audioCodec, nullptr) < 0 || avcodec_open2(videoCodecContext, videoCodec, nullptr) < 0)
        {
            std::cout << "Failed to open AUDIO or VIDEO codec." << std::endl;
            return EXIT_FAILURE;
        }

        // *** INITIALIZE MULTIMEDIA PROCESSING *** //
        std::vector<unsigned char> extraData(audioStream->codecpar->extradata_size);
        std::copy_n(audioStream->codecpar->extradata, extraData.size(), extraData.begin());

        audioSettings.sampleRate         = audioStream->codecpar->sample_rate;
        audioSettings.bitrate            = audioStream->codecpar->bit_rate;
        audioSettings.codec              = vl::enums::EAudioCodec::AAC;
        audioSettings.channels           = audioStream->codecpar->channels;
        audioSettings.bitsPerCodedSample = audioStream->codecpar->bits_per_coded_sample;
        audioSettings.bitsPerRawSample   = audioStream->codecpar->bits_per_raw_sample;
        audioSettings.blockAlign         = audioStream->codecpar->block_align;
        audioSettings.channelLayout      = audioStream->codecpar->channel_layout;
        audioSettings.format             = audioStream->codecpar->format;
        audioSettings.frameSize          = audioStream->codecpar->frame_size;
        audioSettings.codecExtraData     = std::move(extraData);

        videoSettings.width              = 1920;
        videoSettings.height             = 1080;
        videoSettings.framerate          = 25;
        videoSettings.pixelFormat        = vl::enums::EPixelFormat::ARGB;
        videoSettings.bitrate            = 8000 * 1000;
        videoSettings.codec              = vl::enums::EVideoCodec::H264;

        multimediaProcessing.initEncoder(videoSettings, audioSettings);

        // *** INITIALIZE SWS CONTEXT *** //
        swsContext = sws_getCachedContext(nullptr, videoCodecContext->width, videoCodecContext->height, videoCodecContext->pix_fmt, videoCodecContext->width, videoCodecContext->height, AV_PIX_FMT_RGBA, SWS_FAST_BILINEAR, nullptr, nullptr, nullptr);

        if (const auto inReturn = av_image_get_buffer_size(AV_PIX_FMT_RGBA, videoCodecContext->width, videoCodecContext->height, 1); inReturn > 0)
        {
            frameBuffer.resize(inReturn); // resize, not reserve: the buffer is accessed through data()
        }
        else
        {
            std::cout << "Can not get buffer size." << std::endl;
            return EXIT_FAILURE;
        }

        if (const auto inReturn = av_image_fill_arrays(frame->data, frame->linesize, frameBuffer.data(), AV_PIX_FMT_RGBA, videoCodecContext->width, videoCodecContext->height, 1); inReturn < 0)
        {
            std::cout << "Can not fill buffer arrays." << std::endl;
            return EXIT_FAILURE;
        }

        // *** MAIN LOOP *** //
        while(true)
        {
            // Read the next packet of a stream.
            if(av_read_frame(formatContext, &packet) == 0)
            {
                if(packet.stream_index == videoIndex) // Check if it is a video packet.
                {
                    // Send packet to decoder.
                    if(avcodec_send_packet(videoCodecContext, &packet) == 0)
                    {
                        int returnCode = avcodec_receive_frame(videoCodecContext, decoderFrame); // Get frame from decoder.

                        if (returnCode == 0) // Transform the frame, send it to the encoder and re-stream it.
                        {
                            sws_scale(swsContext, decoderFrame->data, decoderFrame->linesize, 0, decoderFrame->height, frame->data, frame->linesize);

                            mat = cv::Mat(videoCodecContext->height, videoCodecContext->width, CV_8UC4, frameBuffer.data(), frame->linesize[0]);

                            cv::resize(mat, mat, cv::Size(1920, 1080), 0, 0, cv::INTER_NEAREST);

                            multimediaProcessing.encode(mat.data, packet.dts, packet.dts, packet.flags == AV_PKT_FLAG_KEY); // This line sends the cv::Mat to the encoder and re-streams it.

                            av_packet_unref(&packet);
                        }
                        else if(returnCode == AVERROR(EAGAIN))
                        {
                            av_frame_unref(decoderFrame);
                        }
                        else
                        {
                            av_frame_unref(decoderFrame);

                            std::cout << "Error during decoding." << std::endl;
                            return EXIT_FAILURE;
                        }
                    }
                }
                else if(packet.stream_index == audioIndex) // Check if it is an audio packet.
                {
                    std::vector vectorPacket(packet.data, packet.data + packet.size);

                    multimediaProcessing.addAudioPacket(vectorPacket, packet.dts, packet.dts);
                }
                else
                {
                    av_packet_unref(&packet);
                }
            }
            else
            {
                std::cout << "Can not read packet from input." << std::endl;
                std::this_thread::sleep_for(std::chrono::seconds(1));
            }
        }

        return EXIT_SUCCESS;
    }

    What does it do?

    It takes a single RTSP stream, decodes its data so I can, for example, draw something on its frames or whatever, and then streams it under a different address.

    Basically, I open the RTSP stream, check that it contains both audio and video streams, and find a decoder for each. Then I create an encoder which I tell what the output stream should look like, and that's it.

    At this point I create an endless loop in which I read all packets coming from the input stream, decode them, do something with them, then encode and re-stream them.

    What is the issue?

    If you take a closer look, I am sending both video and audio frames to the encoder together with the most recently received PTS and DTS contained in the AVPacket.

    From the point when I receive the first AVPacket, the PTS and DTS look, for example, like this.

    IN AUDIO STREAM:

    -22783, -21759, -20735, -19711, -18687, -17663, -16639, -15615, -14591, -13567, -12543, -11519, -10495, -9471, -8447, -7423, -6399, -5375, -4351, -3327, -2303, -1279, -255, 769, 1793, 2817, 3841, 4865, 5889, 6913, 7937, 8961, 9985, 11009, 12033, 13057, 14081, 15105, 16129, 17153

    As you can see, it is incremented by 1024 each time; that is the number of samples per frame of the audio stream. Quite clear here.

    IN VIDEO STREAM:

    86400, 90000, 93600, 97200, 100800, 104400, 108000, 111600, 115200, 118800, 122400, 126000, 129600, 133200, 136800, 140400, 144000, 147600, 151200, 154800, 158400, 162000, 165600

    As you can see, it is incremented by 3600 each time, but WHY? What does this number actually mean?

    From what I understand, the received PTS and DTS are used as follows:

    DTS tells the decoder when the frame should be decoded, so that frames are processed in the correct order and not jumbled.

    PTS gives the time at which the frame should be played/displayed in the output stream, so that frames appear in the correct order and not jumbled.

    What am I trying to achieve?

    As I said, I need to re-stream an RTSP stream. I cannot use the PTS and DTS that come with the received AVPackets, because at some point the input stream may randomly close and I need to open it again. The problem is that when I do, the PTS and DTS are generated again starting from negative values, just as in the samples above. I CANNOT send those "new" PTS and DTS to the encoder because they are now lower than the encoder/muxer expects.

    I need to continually stream something (both audio and video), even if it is a blank black screen or silent audio, and with each frame the PTS and DTS should rise by a specific amount. I need to figure out how that increment is calculated.

    ----------------------------------

    The final result should look like a mosaic of multiple input streams in a single output stream. A single input stream (main) has both audio and video, and the rest (side) have just video. Some of those streams can randomly close over time, and I need to ensure that they come back once it is possible.

  • Ogg objections

    3 March 2010, by Mans — Multimedia

    The Ogg container format is being promoted by the Xiph Foundation for use with its Vorbis and Theora codecs. Unfortunately, a number of technical shortcomings in the format render it ill-suited to most, if not all, use cases. This article examines the most severe of these flaws.

    Overview of Ogg

    The basic unit in an Ogg stream is the page consisting of a header followed by one or more packets from a single elementary stream. A page can contain up to 255 packets, and a packet can span any number of pages. The following table describes the page header.

    Field                     Size (bits)   Description
    capture_pattern           32            magic number “OggS”
    version                   8             always zero
    flags                     8
    granule_position          64            abstract timestamp
    bitstream_serial_number   32            elementary stream number
    page_sequence_number      32            incremented by 1 each page
    checksum                  32            CRC of entire page
    page_segments             8             length of segment_table
    segment_table             variable      list of packet sizes

    Elementary stream types are identified by looking at the payload of the first few pages, which contain any setup data required by the decoders. For full details, see the official format specification.

    Generality

    Ogg, legend tells, was designed to be a general-purpose container format. To most multimedia developers, a general-purpose format is one in which encoded data of any type can be encapsulated with a minimum of effort.

    The Ogg format defined by the specification does not fit this description. For every format one wishes to use with Ogg, a complex mapping must first be defined. This mapping defines how to identify a codec, how to extract setup data, and even how timestamps are to be interpreted. All this is done differently for every codec. To correctly parse an Ogg stream, every such mapping ever defined must be known.

    Under this premise, a centralised repository of codec mappings would seem like a sensible idea, but alas, no such thing exists. It is simply impossible to obtain an exhaustive list of defined mappings, which makes the task of creating a complete implementation somewhat daunting.

    One brave soul, Tobias Waldvogel, created a mapping, OGM, capable of storing any Microsoft AVI compatible codec data in Ogg files. This format saw some use in the wild, but was frowned upon by Xiph, and it was eventually displaced by other formats.

    True generality is evidently not to be found with the Ogg format.

    A good example of a general-purpose format is Matroska. This container can trivially accommodate any codec, all it requires is a unique string to identify the codec. For codecs requiring setup data, a standard location for this is provided in the container. Furthermore, an official list of codec identifiers is maintained, meaning all information required to fully support Matroska files is available from one place.

    Matroska also has probably the greatest advantage of all : it is in active, wide-spread use. Historically, standards derived from existing practice have proven more successful than those created by a design committee.

    Overhead

    When designing a container format, one important consideration is that of overhead, i.e. the extra space required in addition to the elementary stream data being combined. For any given container, the overhead can be divided into a fixed part, independent of the total file size, and a variable part growing with increasing file size. The fixed overhead is not of much concern, its relative contribution being negligible for typical file sizes.

    The variable overhead in the Ogg format comes from the page headers, mostly from the segment_table field. This field uses a most peculiar encoding, somewhat reminiscent of Roman numerals. In Roman times, numbers were written as a sequence of symbols, each representing a value, the combined value being the sum of the constituent values.

    The segment_table field lists the sizes of all packets in the page. Each value in the list is coded as a number of bytes equal to 255 followed by a final byte with a smaller value. The packet size is simply the sum of all these bytes. Any strictly additive encoding, such as this, has the distinct drawback of coded length being linearly proportional to the encoded value. A value of 5000, a reasonable packet size for video of moderate bitrate, requires no less than 20 bytes to encode.

    On top of this we have the 27-byte page header which, although paling in comparison to the packet size encoding, is still much larger than necessary. Starting at the top of the list:

    • The version field could be disposed of, a single-bit marker being adequate to separate this first version from hypothetical future versions. One of the unused positions in the flags field could be used for this purpose.
    • A 64-bit granule_position is completely overkill. 32 bits would be more than enough for the vast majority of use cases. In extreme cases, a one-bit flag could be used to signal an extended timestamp field.
    • 32-bit elementary stream number? Are they anticipating files with four billion elementary streams? An eight-bit field, if not smaller, would seem more appropriate here.
    • The 32-bit page_sequence_number is inexplicable. The intent is to allow detection of page loss due to transmission errors. ISO MPEG-TS uses a 4-bit counter per 188-byte packet for this purpose, and that format is used where packet loss actually happens, unlike any use of Ogg to date.
    • A mandatory 32-bit checksum is nothing but a waste of space when using a reliable storage/transmission medium. Again, a flag could be used to signal the presence of an optional checksum field.

    With the changes suggested above, the page header would shrink from 27 bytes to 12 bytes in size.

    We thus see that in an Ogg file, the packet size fields alone contribute an overhead of 1/255 or approximately 0.4%. This is a hard lower bound on the overhead, not attainable even in theory. In reality the overhead tends to be closer to 1%.

    Contrast this with the ISO MP4 file format, which can easily achieve an overhead of less than 0.05% with a 1 Mbps elementary stream.

    Latency

    In many applications end-to-end latency is an important factor. Examples include video conferencing, telephony, live sports events, interactive gaming, etc. With the codec layer contributing as little as 10 milliseconds of latency, the amount imposed by the container becomes an important factor.

    Latency in an Ogg-based system is introduced at both the sender and the receiver. Since the page header depends on the entire contents of the page (packet sizes and checksum), a full page of packets must be buffered by the sender before a single bit can be transmitted. This sets a lower bound for the sending latency at the duration of a page.

    On the receiving side, playback cannot commence until packets from all elementary streams are available. Hence, with two streams (audio and video) interleaved at the page level, playback is delayed by at least one page duration (two if checksums are verified).

    Taking both send and receive latencies into account, the minimum end-to-end latency for Ogg is thus twice the duration of a page, triple if strict checksum verification is required. If page durations are variable, the maximum value must be used in order to avoid buffer underflows.

    Minimum latency is clearly achieved by minimising the page duration, which in turn implies sending only one packet per page. This is where the size of the page header becomes important. The header for a single-packet page is 27 + packet_size/255 bytes in size. For a 1 Mbps video stream at 25 fps this gives an overhead of approximately 1%. With a typical audio packet size of 400 bytes, the overhead becomes a staggering 7%. The average overhead for a multiplex of these two streams is 1.4%.

    As it stands, the Ogg format is clearly not a good choice for a low-latency application. The key to low latency is small packets and fine-grained interleaving of streams, and although Ogg can provide both of these, by sending a single packet per page, the price in overhead is simply too high.

    ISO MPEG-PS has an overhead of 9 bytes on most packets (a 5-byte timestamp is added a few times per second), and Microsoft’s ASF has a 12-byte packet header. My suggestions for compacting the Ogg page header would bring it in line with these formats.

    Random access

    Any general-purpose container format needs to allow random access for direct seeking to any given position in the file. Despite this goal being explicitly mentioned in the Ogg specification, the format only allows the most crude of random access methods.

    While many container formats include an index allowing a time to be directly translated into an offset into the file, Ogg has nothing of this kind, the stated rationale for the omission being that this would require two-pass multiplexing, the second pass creating the index. This is obviously not true; the index could simply be written at the end of the file. Those objecting that this index would be unavailable in a streaming scenario are forgetting that seeking is impossible there regardless.

    The method for seeking suggested by the Ogg documentation is to perform a binary search on the file, after each file-level seek operation scanning for a page header, extracting the timestamp, and comparing it to the desired position. When the elementary stream encoding allows only certain packets as random access points (video key frames), a second search will have to be performed to locate the entry point closest to the desired time. In a large file (sizes upwards of 10 GB are common), 50 seeks might be required to find the correct position.

    A typical hard drive has an average seek time of roughly 10 ms, giving a total time for the seek operation of around 500 ms, an annoyingly long time. On a slow medium, such as an optical disc or files served over a network, the times are orders of magnitude longer.

    A factor further complicating the seeking process is the possibility of header emulation within the elementary stream data. To safeguard against this, one has to read the entire page and verify the checksum. If the storage medium cannot provide data much faster than during normal playback, this provides yet another substantial delay towards finishing the seeking operation. This too applies to both network delivery and optical discs.

    Although optical disc usage is perhaps in decline today, one should bear in mind that the Ogg format was designed at a time when CDs and DVDs were rapidly gaining ground, and network-based storage is most certainly on the rise.

    The final nail in the coffin of seeking is the codec-dependent timestamp format. At each step in the seeking process, the timestamp parsing specified by the codec mapping corresponding to the current page must be invoked. If the mapping is not known, the best one can do is skip pages until one with a known mapping is found. This delays the seeking and complicates the implementation, both bad things.

    Timestamps

    A problem as old as multimedia itself is that of synchronising multiple elementary streams (e.g. audio and video) during playback; badly synchronised A/V is highly unpleasant to view. By the time Ogg was invented, solutions to this problem were long since explored and well-understood. The key to proper synchronisation lies in tagging elementary stream packets with timestamps, packets carrying the same timestamp intended for simultaneous presentation. The concept is as simple as it seems, so it is astonishing to see the amount of complexity with which the Ogg designers managed to imbue it. So bizarre is it, that I have devoted an entire article to the topic, and will not cover it further here.

    Complexity

    Video and audio decoding are time-consuming tasks, so containers should be designed to minimise extra processing required. With the data volumes involved, even an act as simple as copying a packet of compressed data can have a significant impact. Once again, however, Ogg lets us down. Despite the brevity of the specification, the format is remarkably complicated to parse properly.

    The unusual and inefficient encoding of the packet sizes limits the page size to somewhat less than 64 kB. To still allow individual packets larger than this limit, it was decided to allow packets spanning multiple pages, a decision with unfortunate implications. A page-spanning packet as it arrives in the Ogg stream will be discontiguous in memory, a situation most decoders are unable to handle, and reassembly, i.e. copying, is required.

    The knowledgeable reader may at this point remark that the MPEG-TS format also splits packets into pieces requiring reassembly before decoding. There is, however, a significant difference there. MPEG-TS was designed for hardware demultiplexing feeding directly into hardware decoders. In such an implementation the fragmentation is not a problem. Rather, the fine-grained interleaving is a feature allowing smaller on-chip buffers.

    Buffering is also an area in which Ogg suffers. To keep the overhead down, pages must be made as large as practically possible, and page size translates directly into demultiplexer buffer size. Playback of a file with two elementary streams thus requires 128 kB of buffer space. On a modern PC this is perhaps nothing to be concerned about, but in a small embedded system, e.g. a portable media player, it can be relevant.

    In addition to the above, a number of other issues, some of them minor, others more severe, make Ogg processing a painful experience. A selection follows:

    • 32-bit random elementary stream identifiers mean a simple table-lookup cannot be used. Instead the list of streams must be searched for a match. While trivial to do in software, it is still annoying, and a hardware demultiplexer would be significantly more complicated than with a smaller identifier.
    • Semantically ambiguous streams are possible. For example, the continuation flag (bit 1) may conflict with continuation (or lack thereof) implied by the segment table on the preceding page. Such invalid files have been spotted in the wild.
    • Concatenating independent Ogg streams forms a valid stream. While finding a use case for this strange feature is difficult, an implementation must of course be prepared to encounter such streams. Detecting and dealing with these adds pointless complexity.
    • Unusual terminology: inventing new terms for well-known concepts is confusing for the developer trying to understand the format in relation to others. A few examples:
      Ogg name            Usual name
      logical bitstream   elementary stream
      grouping            multiplexing
      lacing value        packet size (approximately)
      segment             imaginary element serving no real purpose
      granule position    timestamp

    Final words

    We have found the Ogg format to be a dubious choice in just about every situation. Why then do certain organisations and individuals persist in promoting it with such ferocity?

    When challenged, three types of reaction are characteristic of the Ogg campaigners.

    On occasion, these people will assume an apologetic tone, explaining how Ogg was only ever designed for simple audio-only streams (ignoring that it is as bad for these as for anything else), and this is no doubt true. Why then, I ask again, do they continue to tout Ogg as the one-size-fits-all solution they have already admitted it is not?

    More commonly, the Ogg proponents will respond with hand-waving arguments best summarised as Ogg isn’t bad, it’s just different. My reply to this assertion is twofold:

    • Being too different is bad. We live in a world where multimedia files come in many varieties, and a decent media player will need to handle the majority of them. Fortunately, most multimedia file formats share some basic traits, and they can easily be processed in the same general framework, the specifics being taken care of at the input stage. A format deviating too far from the standard model becomes problematic.
    • Ogg is bad. When every angle of examination reveals serious flaws, bad is the only fitting description.

    The third reaction bypasses all technical analysis : Ogg is patent-free, a claim I am not qualified to directly discuss. Assuming it is true, it still does not alter the fact that Ogg is a bad format. Being free from patents does not magically make Ogg a good choice as file format. If all the standard formats are indeed covered by patents, the only proper solution is to design a new, good format which is not, this time hopefully avoiding the old mistakes.

  • How to reparse video with stable "overall bit rate"? (FFmpeg)

    20 February 2018, by user3360601

    I have the code below:

    #include
    #include
    #include

    extern "C"
    {
    #include <libavcodec/avcodec.h>
    #include <libavformat/avformat.h>
    #include <libavfilter/buffersink.h>
    #include <libavfilter/buffersrc.h>
    #include <libavutil/opt.h>
    #include <libavutil/pixdesc.h>
    }

    static AVFormatContext *ifmt_ctx;
    static AVFormatContext *ofmt_ctx;

    typedef struct StreamContext {
       AVCodecContext *dec_ctx;
       AVCodecContext *enc_ctx;
    } StreamContext;
    static StreamContext *stream_ctx;

    static int open_input_file(const char *filename)
    {
       int ret;
       unsigned int i;

       ifmt_ctx = NULL;
       if ((ret = avformat_open_input(&amp;ifmt_ctx, filename, NULL, NULL)) &lt; 0) {
           av_log(NULL, AV_LOG_ERROR, "Cannot open input file\n");
           return ret;
       }

       if ((ret = avformat_find_stream_info(ifmt_ctx, NULL)) &lt; 0) {
           av_log(NULL, AV_LOG_ERROR, "Cannot find stream information\n");
           return ret;
       }

       stream_ctx = (StreamContext *) av_mallocz_array(ifmt_ctx->nb_streams, sizeof(*stream_ctx));
       if (!stream_ctx)
           return AVERROR(ENOMEM);

       for (i = 0; i &lt; ifmt_ctx->nb_streams; i++) {
           AVStream *stream = ifmt_ctx->streams[i];
           AVCodec *dec = avcodec_find_decoder(stream->codecpar->codec_id);
           AVCodecContext *codec_ctx;
           if (!dec) {
               av_log(NULL, AV_LOG_ERROR, "Failed to find decoder for stream #%u\n", i);
               return AVERROR_DECODER_NOT_FOUND;
           }
           codec_ctx = avcodec_alloc_context3(dec);
           if (!codec_ctx) {
               av_log(NULL, AV_LOG_ERROR, "Failed to allocate the decoder context for stream #%u\n", i);
               return AVERROR(ENOMEM);
           }
           ret = avcodec_parameters_to_context(codec_ctx, stream->codecpar);
           if (ret &lt; 0) {
               av_log(NULL, AV_LOG_ERROR, "Failed to copy decoder parameters to input decoder context "
                   "for stream #%u\n", i);
               return ret;
           }
           /* Reencode video &amp; audio and remux subtitles etc. */
           if (codec_ctx->codec_type == AVMEDIA_TYPE_VIDEO
               || codec_ctx->codec_type == AVMEDIA_TYPE_AUDIO) {
               if (codec_ctx->codec_type == AVMEDIA_TYPE_VIDEO)
                   codec_ctx->framerate = av_guess_frame_rate(ifmt_ctx, stream, NULL);
               /* Open decoder */
               ret = avcodec_open2(codec_ctx, dec, NULL);
               if (ret &lt; 0) {
                   av_log(NULL, AV_LOG_ERROR, "Failed to open decoder for stream #%u\n", i);
                   return ret;
               }
           }
           stream_ctx[i].dec_ctx = codec_ctx;
       }

       av_dump_format(ifmt_ctx, 0, filename, 0);
       return 0;
    }

    static int open_output_file(const char *filename)
    {
       AVStream *out_stream;
       AVStream *in_stream;
       AVCodecContext *dec_ctx, *enc_ctx;
       AVCodec *encoder;
       int ret;
       unsigned int i;

       ofmt_ctx = NULL;
       avformat_alloc_output_context2(&amp;ofmt_ctx, NULL, NULL, filename);
       if (!ofmt_ctx) {
           av_log(NULL, AV_LOG_ERROR, "Could not create output context\n");
           return AVERROR_UNKNOWN;
       }


       for (i = 0; i &lt; ifmt_ctx->nb_streams; i++) {
           out_stream = avformat_new_stream(ofmt_ctx, NULL);
           if (!out_stream) {
               av_log(NULL, AV_LOG_ERROR, "Failed allocating output stream\n");
               return AVERROR_UNKNOWN;
           }

           in_stream = ifmt_ctx->streams[i];
           dec_ctx = stream_ctx[i].dec_ctx;

           //ofmt_ctx->bit_rate = ifmt_ctx->bit_rate;
           ofmt_ctx->duration = ifmt_ctx->duration;

           if (dec_ctx->codec_type == AVMEDIA_TYPE_VIDEO
               || dec_ctx->codec_type == AVMEDIA_TYPE_AUDIO) {
               /* in this example, we choose transcoding to same codec */
               encoder = avcodec_find_encoder(dec_ctx->codec_id);
               if (!encoder) {
                   av_log(NULL, AV_LOG_FATAL, "Necessary encoder not found\n");
                   return AVERROR_INVALIDDATA;
               }
               enc_ctx = avcodec_alloc_context3(encoder);
               if (!enc_ctx) {
                   av_log(NULL, AV_LOG_FATAL, "Failed to allocate the encoder context\n");
                   return AVERROR(ENOMEM);
               }

               /* In this example, we transcode to same properties (picture size,
               * sample rate etc.). These properties can be changed for output
               * streams easily using filters */
               if (dec_ctx->codec_type == AVMEDIA_TYPE_VIDEO) {
                   enc_ctx->gop_size = dec_ctx->gop_size;
                   enc_ctx->bit_rate = dec_ctx->bit_rate;
                   enc_ctx->height = dec_ctx->height;
                   enc_ctx->width = dec_ctx->width;
                   enc_ctx->sample_aspect_ratio = dec_ctx->sample_aspect_ratio;
                   /* take first format from list of supported formats */
                   if (encoder->pix_fmts)
                       enc_ctx->pix_fmt = encoder->pix_fmts[0];
                   else
                       enc_ctx->pix_fmt = dec_ctx->pix_fmt;
                   /* video time_base can be set to whatever is handy and supported by encoder */
                   enc_ctx->time_base = av_inv_q(dec_ctx->framerate);
                   enc_ctx->framerate = av_guess_frame_rate(ifmt_ctx, in_stream, NULL);
               }
               else {
                   enc_ctx->gop_size = dec_ctx->gop_size;
                   enc_ctx->bit_rate = dec_ctx->bit_rate;
                   enc_ctx->sample_rate = dec_ctx->sample_rate;
                   enc_ctx->channel_layout = dec_ctx->channel_layout;
                   enc_ctx->channels = av_get_channel_layout_nb_channels(enc_ctx->channel_layout);
                   /* take first format from list of supported formats */
                   enc_ctx->sample_fmt = encoder->sample_fmts[0];
                   //enc_ctx->time_base = (AVRational){ 1, enc_ctx->sample_rate };
                   enc_ctx->time_base.num = 1;
                   enc_ctx->time_base.den = enc_ctx->sample_rate;

                   enc_ctx->framerate = av_guess_frame_rate(ifmt_ctx, in_stream, NULL);
               }

               /* Third parameter can be used to pass settings to encoder */
               ret = avcodec_open2(enc_ctx, encoder, NULL);
               if (ret &lt; 0) {
                   av_log(NULL, AV_LOG_ERROR, "Cannot open video encoder for stream #%u\n", i);
                   return ret;
               }
               ret = avcodec_parameters_from_context(out_stream->codecpar, enc_ctx);
               if (ret &lt; 0) {
                   av_log(NULL, AV_LOG_ERROR, "Failed to copy encoder parameters to output stream #%u\n", i);
                   return ret;
               }
               if (ofmt_ctx->oformat->flags &amp; AVFMT_GLOBALHEADER)
                   enc_ctx->flags |= AV_CODEC_FLAG_GLOBAL_HEADER;

               out_stream->time_base = enc_ctx->time_base;
               stream_ctx[i].enc_ctx = enc_ctx;
           }
           else if (dec_ctx->codec_type == AVMEDIA_TYPE_UNKNOWN) {
               av_log(NULL, AV_LOG_FATAL, "Elementary stream #%d is of unknown type, cannot proceed\n", i);
               return AVERROR_INVALIDDATA;
           }
           else {
               /* if this stream must be remuxed */
               ret = avcodec_parameters_copy(out_stream->codecpar, in_stream->codecpar);
               if (ret &lt; 0) {
                   av_log(NULL, AV_LOG_ERROR, "Copying parameters for stream #%u failed\n", i);
                   return ret;
               }
               out_stream->time_base = in_stream->time_base;
           }

       }
       av_dump_format(ofmt_ctx, 0, filename, 1);

       if (!(ofmt_ctx->oformat->flags &amp; AVFMT_NOFILE)) {
           ret = avio_open(&amp;ofmt_ctx->pb, filename, AVIO_FLAG_WRITE);
           if (ret &lt; 0) {
               av_log(NULL, AV_LOG_ERROR, "Could not open output file '%s'", filename);
               return ret;
           }
       }

       /* init muxer, write output file header */
       ret = avformat_write_header(ofmt_ctx, NULL);
       if (ret &lt; 0) {
           av_log(NULL, AV_LOG_ERROR, "Error occurred when opening output file\n");
           return ret;
       }

       return 0;
    }

    int main(int argc, char **argv)
    {
       int ret;
       AVPacket packet = {0};
       packet.data = NULL;
       packet.size = 0;
       AVFrame *frame = NULL;
       enum AVMediaType type;
       unsigned int stream_index;
       unsigned int i;
       int got_frame;
       int(*dec_func)(AVCodecContext *, AVFrame *, int *, const AVPacket *);

       if (argc != 3) {
           av_log(NULL, AV_LOG_ERROR, "Usage: %s <input file> <output file>\n", argv[0]);
           return 1;
       }

       av_register_all();
       avfilter_register_all();

       if ((ret = open_input_file(argv[1])) &lt; 0)
           goto end;
       if ((ret = open_output_file(argv[2])) &lt; 0)
           goto end;

       /* read all packets */
       while (1) {
           if ((ret = av_read_frame(ifmt_ctx, &amp;packet)) &lt; 0)
               break;
           stream_index = packet.stream_index;
           type = ifmt_ctx->streams[packet.stream_index]->codecpar->codec_type;
           av_log(NULL, AV_LOG_DEBUG, "Demuxer gave frame of stream_index %u\n", stream_index);

           /* remux this frame without reencoding */
           av_packet_rescale_ts(&amp;packet,
               ifmt_ctx->streams[stream_index]->time_base,
               ofmt_ctx->streams[stream_index]->time_base);

           ret = av_interleaved_write_frame(ofmt_ctx, &amp;packet);
           if (ret &lt; 0)
               goto end;

           av_packet_unref(&amp;packet);
       }

       av_write_trailer(ofmt_ctx);
    end:
       av_packet_unref(&amp;packet);
       av_frame_free(&amp;frame);
       for (i = 0; i &lt; ifmt_ctx->nb_streams; i++) {
           avcodec_free_context(&amp;stream_ctx[i].dec_ctx);
           if (ofmt_ctx &amp;&amp; ofmt_ctx->nb_streams > i &amp;&amp; ofmt_ctx->streams[i] &amp;&amp; stream_ctx[i].enc_ctx)
               avcodec_free_context(&amp;stream_ctx[i].enc_ctx);
       }
       av_free(stream_ctx);
       avformat_close_input(&amp;ifmt_ctx);
       if (ofmt_ctx &amp;&amp; !(ofmt_ctx->oformat->flags &amp; AVFMT_NOFILE))
           avio_closep(&amp;ofmt_ctx->pb);
       avformat_free_context(ofmt_ctx);

       return ret ? 1 : 0;
    }

    This is slightly modified code from the official FFmpeg example transcoding.c.

    I only read packets from one stream and write them to another stream. It works fine.

    Proof below.
    [MediaInfo screenshot]
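    The remuxing path works because av_packet_rescale_ts converts each packet's pts, dts and duration from the input stream's time base to the output stream's. A minimal sketch of the underlying arithmetic, assuming nearest-integer rounding and ignoring AV_NOPTS_VALUE and the overflow guards that the real av_rescale_q has:

    ```c
    #include <assert.h>
    #include <stdint.h>
    #include <stdio.h>

    /* Rescale a timestamp from time base src_num/src_den to dst_num/dst_den,
     * i.e. ts * (src_num/src_den) / (dst_num/dst_den), rounded to nearest.
     * Simplified model of av_rescale_q(): no overflow guards, one rounding mode. */
    static int64_t rescale_ts(int64_t ts,
                              int64_t src_num, int64_t src_den,
                              int64_t dst_num, int64_t dst_den)
    {
        int64_t num = src_num * dst_den;   /* combined numerator   */
        int64_t den = src_den * dst_num;   /* combined denominator */
        return (ts * num + den / 2) / den; /* round to nearest     */
    }

    int main(void)
    {
        /* 100 ms in an FLV-style 1/1000 time base is 9000 ticks at 1/90000. */
        printf("%lld\n", (long long)rescale_ts(100, 1, 1000, 1, 90000));
        assert(rescale_ts(100, 1, 1000, 1, 90000) == 9000);
        /* 40 ms is exactly one frame interval in a 1/25 time base. */
        assert(rescale_ts(40, 1, 1000, 1, 25) == 1);
        return 0;
    }
    ```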

    Then I add a condition to main: if a packet contains a video frame, I decode it, then encode it and write it to the other stream. No other actions are performed on the frame.

    My added code is below:

    static int encode_write_frame(AVFrame *filt_frame, unsigned int stream_index, int *got_frame) {
       int ret;
       int got_frame_local;
       AVPacket enc_pkt;
       int(*enc_func)(AVCodecContext *, AVPacket *, const AVFrame *, int *) = avcodec_encode_video2;

       if (!got_frame)
           got_frame = &amp;got_frame_local;

       av_log(NULL, AV_LOG_INFO, "Encoding frame\n");
       /* encode filtered frame */
       enc_pkt.data = NULL;
       enc_pkt.size = 0;
       av_init_packet(&amp;enc_pkt);
       ret = enc_func(stream_ctx[stream_index].enc_ctx, &amp;enc_pkt,
           filt_frame, got_frame);
       if (ret &lt; 0)
           return ret;
       if (!(*got_frame))
           return 0;

       /* prepare packet for muxing */
       enc_pkt.stream_index = stream_index;
       av_packet_rescale_ts(&amp;enc_pkt,
           stream_ctx[stream_index].enc_ctx->time_base,
           ofmt_ctx->streams[stream_index]->time_base);

       av_log(NULL, AV_LOG_DEBUG, "Muxing frame\n");
       /* mux encoded frame */
       ret = av_interleaved_write_frame(ofmt_ctx, &amp;enc_pkt);
       return ret;
    }

    static int filter_encode_write_frame(AVFrame *frame, unsigned int stream_index)
    {
       int ret;
       av_log(NULL, AV_LOG_INFO, "Pushing decoded frame to filters\n");

       while (1) {
           av_log(NULL, AV_LOG_INFO, "Pulling filtered frame from filters\n");

           ret = encode_write_frame(frame, stream_index, NULL);
           if (ret &lt; 0)
               break;
           break;
       }

       return ret;
    }


    int main(int argc, char **argv)
    {
       int ret;
       AVPacket packet = {0};
       packet.data = NULL;
       packet.size = 0;
       AVFrame *frame = NULL;
       enum AVMediaType type;
       unsigned int stream_index;
       unsigned int i;
       int got_frame;
       int(*dec_func)(AVCodecContext *, AVFrame *, int *, const AVPacket *);

       if (argc != 3) {
           av_log(NULL, AV_LOG_ERROR, "Usage: %s <input file> <output file>\n", argv[0]);
           return 1;
       }

       av_register_all();
       avfilter_register_all();

       if ((ret = open_input_file(argv[1])) &lt; 0)
           goto end;
       if ((ret = open_output_file(argv[2])) &lt; 0)
           goto end;

       /* read all packets */
       while (1) {
           if ((ret = av_read_frame(ifmt_ctx, &amp;packet)) &lt; 0)
               break;
           stream_index = packet.stream_index;
           type = ifmt_ctx->streams[packet.stream_index]->codecpar->codec_type;
           av_log(NULL, AV_LOG_DEBUG, "Demuxer gave frame of stream_index %u\n",
               stream_index);

            if (type == AVMEDIA_TYPE_VIDEO)
            {
                av_log(NULL, AV_LOG_DEBUG, "Going to reencode&amp;filter the frame\n");
                frame = av_frame_alloc();
                if (!frame) {
                    ret = AVERROR(ENOMEM);
                    break;
                }
                av_packet_rescale_ts(&amp;packet,
                    ifmt_ctx->streams[stream_index]->time_base,
                    stream_ctx[stream_index].dec_ctx->time_base);
                dec_func = avcodec_decode_video2;
                ret = dec_func(stream_ctx[stream_index].dec_ctx, frame,
                    &amp;got_frame, &amp;packet);
                if (ret &lt; 0) {
                    av_frame_free(&amp;frame);
                    av_log(NULL, AV_LOG_ERROR, "Decoding failed\n");
                    break;
                }

                if (got_frame) {
                    frame->pts = frame->best_effort_timestamp;
                    ret = filter_encode_write_frame(frame, stream_index);
                    av_frame_free(&amp;frame);
                    if (ret &lt; 0)
                        goto end;
                }
                else {
                    av_frame_free(&amp;frame);
                }
            }
            else
           {
               /* remux this frame without reencoding */
               av_packet_rescale_ts(&amp;packet,
                   ifmt_ctx->streams[stream_index]->time_base,
                   ofmt_ctx->streams[stream_index]->time_base);

               ret = av_interleaved_write_frame(ofmt_ctx, &amp;packet);
               if (ret &lt; 0)
                   goto end;
           }
           av_packet_unref(&amp;packet);
       }

       av_write_trailer(ofmt_ctx);
    end:
       av_packet_unref(&amp;packet);
       av_frame_free(&amp;frame);
       for (i = 0; i &lt; ifmt_ctx->nb_streams; i++) {
           avcodec_free_context(&amp;stream_ctx[i].dec_ctx);
           if (ofmt_ctx &amp;&amp; ofmt_ctx->nb_streams > i &amp;&amp; ofmt_ctx->streams[i] &amp;&amp; stream_ctx[i].enc_ctx)
               avcodec_free_context(&amp;stream_ctx[i].enc_ctx);
       }
       av_free(stream_ctx);
       avformat_close_input(&amp;ifmt_ctx);
       if (ofmt_ctx &amp;&amp; !(ofmt_ctx->oformat->flags &amp; AVFMT_NOFILE))
           avio_closep(&amp;ofmt_ctx->pb);
       avformat_free_context(ofmt_ctx);

       return ret ? 1 : 0;
    }

    And the result is different.

    For a test I took SampleVideo_1280x720_1mb.flv. It has:

    File size : 1.00 MiB
    Overall bit rate : 1 630 kb/s

    After my decode/encode actions the result became:

    File size : 1.23 MiB
    Overall bit rate : 2 005 kb/s

    Other parameters (video bit rate, audio bit rate, etc.) are the same.
    [MediaInfo screenshot]

    What am I doing wrong? How can I control the overall bit rate? I suppose something is wrong with the encoder/decoder, but what?
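    For context: the "Overall bit rate" that MediaInfo reports is not an encoder parameter at all; it is essentially the container file size divided by the duration. So it changes whenever the encoder happens to produce differently sized output, even with nominally identical settings. A small illustration with hypothetical numbers:

    ```c
    #include <assert.h>
    #include <stdio.h>

    /* Overall bit rate in bit/s: total bits in the file over its duration. */
    static double overall_bitrate(long long size_bytes, double duration_s)
    {
        return 8.0 * (double)size_bytes / duration_s;
    }

    int main(void)
    {
        /* Hypothetical: a 1 000 000-byte file lasting 8 s averages 1 Mbit/s. */
        printf("%.0f bit/s\n", overall_bitrate(1000000, 8.0));
        /* The same clip re-encoded to 1 230 000 bytes reports a higher figure. */
        printf("%.0f bit/s\n", overall_bitrate(1230000, 8.0));
        return 0;
    }
    ```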

    UPD:
    I found that when I write the following in open_input_file

    if (codec_ctx->codec_type == AVMEDIA_TYPE_VIDEO)
    {
        codec_ctx->framerate = av_guess_frame_rate(ifmt_ctx, stream, NULL);
    }

    I get the result shown above (bigger size and bit rate).

    And when in this function I write

    if (codec_ctx->codec_type == AVMEDIA_TYPE_VIDEO)
    {
       codec_ctx = ifmt_ctx->streams[i]->codec;
    }

    I get a smaller size and bit rate.

    File size : 900 KiB
    Overall bit rate : 1 429 kb/s

    But how do I get exactly the same size and frame rate as in the original file?
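    A side note on the timestamp handling in the code above: the video path rescales timestamps twice (input stream time base to decoder time base on the way in, encoder time base to output stream time base on the way out). With enc_ctx->time_base set to 1/framerate, not every input timestamp is representable, so the round trip can shift timestamps slightly. A simplified sketch of that rounding (nearest rounding only, no overflow handling, unlike the real av_rescale_q):

    ```c
    #include <assert.h>
    #include <stdint.h>
    #include <stdio.h>

    /* Rescale ts from time base src_num/src_den to dst_num/dst_den, nearest. */
    static int64_t rescale_ts(int64_t ts,
                              int64_t src_num, int64_t src_den,
                              int64_t dst_num, int64_t dst_den)
    {
        int64_t num = src_num * dst_den;
        int64_t den = src_den * dst_num;
        return (ts * num + den / 2) / den;
    }

    int main(void)
    {
        /* 33 ms in an FLV 1/1000 time base, through a 1/25 encoder time base. */
        int64_t ticks = rescale_ts(33, 1, 1000, 1, 25);    /* -> 1 tick      */
        int64_t back  = rescale_ts(ticks, 1, 25, 1, 1000); /* back to ms     */
        /* The round trip snaps 33 ms to the 40 ms frame grid. */
        printf("%lld ms -> %lld tick(s) -> %lld ms\n",
               33LL, (long long)ticks, (long long)back);
        return 0;
    }
    ```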