Recherche avancée

Médias (3)

Mot : - Tags -/image

Autres articles (23)

  • La file d’attente de SPIPmotion

    28 novembre 2010, par

    Une file d’attente stockée dans la base de donnée
    Lors de son installation, SPIPmotion crée une nouvelle table dans la base de donnée intitulée spip_spipmotion_attentes.
    Cette nouvelle table est constituée des champs suivants : id_spipmotion_attente, l’identifiant numérique unique de la tâche à traiter ; id_document, l’identifiant numérique du document original à encoder ; id_objet l’identifiant unique de l’objet auquel le document encodé devra être attaché automatiquement ; objet, le type d’objet auquel (...)

  • Les formats acceptés

    28 janvier 2010, par

    Les commandes suivantes permettent d’avoir des informations sur les formats et codecs gérés par l’installation local de ffmpeg :
    ffmpeg -codecs ffmpeg -formats
    Les format videos acceptés en entrée
    Cette liste est non exhaustive, elle met en exergue les principaux formats utilisés : h264 : H.264 / AVC / MPEG-4 AVC / MPEG-4 part 10 m4v : raw MPEG-4 video format flv : Flash Video (FLV) / Sorenson Spark / Sorenson H.263 Theora wmv :
    Les formats vidéos de sortie possibles
    Dans un premier temps on (...)

  • Contribute to documentation

    13 avril 2011

    Documentation is vital to the development of improved technical capabilities.
    MediaSPIP welcomes documentation by users as well as developers - including : critique of existing features and functions articles contributed by developers, administrators, content producers and editors screenshots to illustrate the above translations of existing documentation into other languages
    To contribute, register to the project users’ mailing (...)

Sur d’autres sites (2786)

  • C++ ffmpeg lib version 7.0 - export audio to different format

    4 septembre 2024, par Chris P

    I want to make a C++ lib named cppdub which will mimic the python module pydub.

    


    One main function is to export the AudioSegment to a file with a specific format (example : mp3).

    


    The code is :

    


    &#xA;AudioSegment AudioSegment::from_file(const std::string&amp; file_path, const std::string&amp; format, const std::string&amp; codec,&#xA;    const std::map&amp; parameters, int start_second, int duration) {&#xA;&#xA;    avformat_network_init();&#xA;    av_log_set_level(AV_LOG_ERROR); // Adjust logging level as needed&#xA;&#xA;    AVFormatContext* format_ctx = nullptr;&#xA;    if (avformat_open_input(&amp;format_ctx, file_path.c_str(), nullptr, nullptr) != 0) {&#xA;        std::cerr &lt;&lt; "Error: Could not open audio file." &lt;&lt; std::endl;&#xA;        return AudioSegment();  // Return an empty AudioSegment on failure&#xA;    }&#xA;&#xA;    if (avformat_find_stream_info(format_ctx, nullptr) &lt; 0) {&#xA;        std::cerr &lt;&lt; "Error: Could not find stream information." &lt;&lt; std::endl;&#xA;        avformat_close_input(&amp;format_ctx);&#xA;        return AudioSegment();&#xA;    }&#xA;&#xA;    int audio_stream_index = -1;&#xA;    for (unsigned int i = 0; i &lt; format_ctx->nb_streams; i&#x2B;&#x2B;) {&#xA;        if (format_ctx->streams[i]->codecpar->codec_type == AVMEDIA_TYPE_AUDIO) {&#xA;            audio_stream_index = i;&#xA;            break;&#xA;        }&#xA;    }&#xA;&#xA;    if (audio_stream_index == -1) {&#xA;        std::cerr &lt;&lt; "Error: Could not find audio stream." &lt;&lt; std::endl;&#xA;        avformat_close_input(&amp;format_ctx);&#xA;        return AudioSegment();&#xA;    }&#xA;&#xA;    AVCodecParameters* codec_par = format_ctx->streams[audio_stream_index]->codecpar;&#xA;    const AVCodec* my_codec = avcodec_find_decoder(codec_par->codec_id);&#xA;    AVCodecContext* codec_ctx = avcodec_alloc_context3(my_codec);&#xA;&#xA;    if (!codec_ctx) {&#xA;        std::cerr &lt;&lt; "Error: Could not allocate codec context." &lt;&lt; std::endl;&#xA;        avformat_close_input(&amp;format_ctx);&#xA;        return AudioSegment();&#xA;    }&#xA;&#xA;    if (avcodec_parameters_to_context(codec_ctx, codec_par) &lt; 0) {&#xA;        std::cerr &lt;&lt; "Error: Could not initialize codec context." &lt;&lt; std::endl;&#xA;        avcodec_free_context(&amp;codec_ctx);&#xA;        avformat_close_input(&amp;format_ctx);&#xA;        return AudioSegment();&#xA;    }&#xA;&#xA;    if (avcodec_open2(codec_ctx, my_codec, nullptr) &lt; 0) {&#xA;        std::cerr &lt;&lt; "Error: Could not open codec." &lt;&lt; std::endl;&#xA;        avcodec_free_context(&amp;codec_ctx);&#xA;        avformat_close_input(&amp;format_ctx);&#xA;        return AudioSegment();&#xA;    }&#xA;&#xA;    SwrContext* swr_ctx = swr_alloc();&#xA;    if (!swr_ctx) {&#xA;        std::cerr &lt;&lt; "Error: Could not allocate SwrContext." &lt;&lt; std::endl;&#xA;        avcodec_free_context(&amp;codec_ctx);&#xA;        avformat_close_input(&amp;format_ctx);&#xA;        return AudioSegment();&#xA;    }&#xA;    codec_ctx->sample_rate = 44100;&#xA;    // Set up resampling context to convert to S16 format with 2 bytes per sample&#xA;    av_opt_set_chlayout(swr_ctx, "in_chlayout", &amp;codec_ctx->ch_layout, 0);&#xA;    av_opt_set_int(swr_ctx, "in_sample_rate", codec_ctx->sample_rate, 0);&#xA;    av_opt_set_sample_fmt(swr_ctx, "in_sample_fmt", codec_ctx->sample_fmt, 0);&#xA;&#xA;    AVChannelLayout dst_ch_layout;&#xA;    av_channel_layout_copy(&amp;dst_ch_layout, &amp;codec_ctx->ch_layout);&#xA;    av_channel_layout_uninit(&amp;dst_ch_layout);&#xA;    av_channel_layout_default(&amp;dst_ch_layout, 2);&#xA;&#xA;    av_opt_set_chlayout(swr_ctx, "out_chlayout", &amp;dst_ch_layout, 0);&#xA;    av_opt_set_int(swr_ctx, "out_sample_rate", codec_ctx->sample_rate, 0);  // Match input sample rate&#xA;    av_opt_set_sample_fmt(swr_ctx, "out_sample_fmt", AV_SAMPLE_FMT_S16, 0);  // Force S16 format&#xA;&#xA;    if (swr_init(swr_ctx) &lt; 0) {&#xA;        std::cerr &lt;&lt; "Error: Failed to initialize the resampling context" &lt;&lt; std::endl;&#xA;        swr_free(&amp;swr_ctx);&#xA;        avcodec_free_context(&amp;codec_ctx);&#xA;        avformat_close_input(&amp;format_ctx);&#xA;        return AudioSegment();&#xA;    }&#xA;&#xA;    AVPacket packet;&#xA;    AVFrame* frame = av_frame_alloc();&#xA;    if (!frame) {&#xA;        std::cerr &lt;&lt; "Error: Could not allocate frame." &lt;&lt; std::endl;&#xA;        swr_free(&amp;swr_ctx);&#xA;        avcodec_free_context(&amp;codec_ctx);&#xA;        avformat_close_input(&amp;format_ctx);&#xA;        return AudioSegment();&#xA;    }&#xA;&#xA;    std::vector<char> output;&#xA;    while (av_read_frame(format_ctx, &amp;packet) >= 0) {&#xA;        if (packet.stream_index == audio_stream_index) {&#xA;            if (avcodec_send_packet(codec_ctx, &amp;packet) == 0) {&#xA;                while (avcodec_receive_frame(codec_ctx, frame) == 0) {&#xA;                    if (frame->pts != AV_NOPTS_VALUE) {&#xA;                        frame->pts = av_rescale_q(frame->pts, codec_ctx->time_base, format_ctx->streams[audio_stream_index]->time_base);&#xA;                    }&#xA;&#xA;                    uint8_t* output_buffer;&#xA;                    int output_samples = av_rescale_rnd(&#xA;                        swr_get_delay(swr_ctx, codec_ctx->sample_rate) &#x2B; frame->nb_samples,&#xA;                        codec_ctx->sample_rate, codec_ctx->sample_rate, AV_ROUND_UP);&#xA;&#xA;                    int output_buffer_size = av_samples_get_buffer_size(&#xA;                        nullptr, 2, output_samples, AV_SAMPLE_FMT_S16, 1);&#xA;&#xA;                    output_buffer = (uint8_t*)av_malloc(output_buffer_size);&#xA;&#xA;                    if (output_buffer) {&#xA;                        memset(output_buffer, 0, output_buffer_size); // Zero padding to avoid random noise&#xA;                        int converted_samples = swr_convert(swr_ctx, &amp;output_buffer, output_samples,&#xA;                            (const uint8_t**)frame->extended_data, frame->nb_samples);&#xA;&#xA;                        if (converted_samples >= 0) {&#xA;                            output.insert(output.end(), output_buffer, output_buffer &#x2B; output_buffer_size);&#xA;                        }&#xA;                        else {&#xA;                            std::cerr &lt;&lt; "Error: Failed to convert audio samples." &lt;&lt; std::endl;&#xA;                        }&#xA;                        // Make sure output_buffer is valid before freeing&#xA;                        if (output_buffer != nullptr) {&#xA;                            av_free(output_buffer);&#xA;                            output_buffer = nullptr; // Prevent double-free&#xA;                        }&#xA;                    }&#xA;                    else {&#xA;                        std::cerr &lt;&lt; "Error: Could not allocate output buffer." &lt;&lt; std::endl;&#xA;                    }&#xA;                }&#xA;            }&#xA;            else {&#xA;                std::cerr &lt;&lt; "Error: Failed to send packet to codec context." &lt;&lt; std::endl;&#xA;            }&#xA;        }&#xA;        av_packet_unref(&amp;packet);&#xA;    }&#xA;&#xA;    int frame_width = av_get_bytes_per_sample(AV_SAMPLE_FMT_S16) * 2;  // Use 2 bytes per sample and 2 channels&#xA;&#xA;    std::map metadata = {&#xA;        {"sample_width", 2},  // S16 format has 2 bytes per sample&#xA;        {"frame_rate", codec_ctx->sample_rate},  // Use the input sample rate&#xA;        {"channels", 2},  // Assuming stereo output&#xA;        {"frame_width", frame_width}&#xA;    };&#xA;&#xA;    av_frame_free(&amp;frame);&#xA;    swr_free(&amp;swr_ctx);&#xA;    avcodec_free_context(&amp;codec_ctx);&#xA;    avformat_close_input(&amp;format_ctx);&#xA;&#xA;    return AudioSegment(static_cast<const>(output.data()), output.size(), metadata);&#xA;}&#xA;&#xA;&#xA;&#xA;&#xA;&#xA;&#xA;&#xA;&#xA;&#xA;std::ofstream AudioSegment::export_segment(const std::string&amp; out_f,&#xA;    const std::string&amp; format,&#xA;    const std::string&amp; codec,&#xA;    const std::string&amp; bitrate,&#xA;    const std::vector&amp; parameters,&#xA;    const std::map&amp; tags,&#xA;    const std::string&amp; id3v2_version,&#xA;    const std::string&amp; cover) {&#xA;&#xA;    av_log_set_level(AV_LOG_DEBUG);&#xA;    AVCodecContext* codec_ctx = nullptr;&#xA;    AVFormatContext* format_ctx = nullptr;&#xA;    AVStream* stream = nullptr;&#xA;    AVFrame* frame = nullptr;&#xA;    AVPacket* pkt = nullptr;&#xA;    SwrContext* swr_ctx = nullptr;&#xA;    int ret;&#xA;&#xA;    // Initialize format context&#xA;    if (avformat_alloc_output_context2(&amp;format_ctx, nullptr, format.c_str(), out_f.c_str()) &lt; 0) {&#xA;        throw std::runtime_error("Could not allocate format context.");&#xA;    }&#xA;&#xA;    // Find encoder&#xA;    const AVCodec* codec_ptr = avcodec_find_encoder_by_name(codec.c_str());&#xA;    if (!codec_ptr) {&#xA;        throw std::runtime_error("Codec not found.");&#xA;    }&#xA;&#xA;    // Add stream&#xA;    stream = avformat_new_stream(format_ctx, codec_ptr);&#xA;    if (!stream) {&#xA;        throw std::runtime_error("Failed to create new stream.");&#xA;    }&#xA;&#xA;    // Allocate codec context&#xA;    codec_ctx = avcodec_alloc_context3(codec_ptr);&#xA;    if (!codec_ctx) {&#xA;        throw std::runtime_error("Could not allocate audio codec context.");&#xA;    }&#xA;&#xA;    // Set codec parameters&#xA;    codec_ctx->bit_rate = std::stoi(bitrate);&#xA;    codec_ctx->sample_rate = this->get_frame_rate(); // Ensure this returns the correct sample rate&#xA;    av_channel_layout_default(&amp;codec_ctx->ch_layout, 2);&#xA;    codec_ctx->sample_fmt = codec_ptr->sample_fmts ? codec_ptr->sample_fmts[0] : AV_SAMPLE_FMT_FLTP;&#xA;&#xA;    // Open codec&#xA;    if (avcodec_open2(codec_ctx, codec_ptr, nullptr) &lt; 0) {&#xA;        throw std::runtime_error("Could not open codec.");&#xA;    }&#xA;&#xA;    // Set codec parameters to the stream&#xA;    if (avcodec_parameters_from_context(stream->codecpar, codec_ctx) &lt; 0) {&#xA;        throw std::runtime_error("Could not initialize stream codec parameters.");&#xA;    }&#xA;&#xA;    // Open output file&#xA;    std::ofstream out_file(out_f, std::ios::binary);&#xA;    if (!out_file) {&#xA;        throw std::runtime_error("Failed to open output file.");&#xA;    }&#xA;&#xA;    if (!(format_ctx->oformat->flags &amp; AVFMT_NOFILE)) {&#xA;        if (avio_open(&amp;format_ctx->pb, out_f.c_str(), AVIO_FLAG_WRITE) &lt; 0) {&#xA;            throw std::runtime_error("Could not open output file.");&#xA;        }&#xA;    }&#xA;&#xA;    // Write file header&#xA;    if (avformat_write_header(format_ctx, nullptr) &lt; 0) {&#xA;        throw std::runtime_error("Error occurred when opening output file.");&#xA;    }&#xA;&#xA;    // Initialize packet&#xA;    pkt = av_packet_alloc();&#xA;    if (!pkt) {&#xA;        throw std::runtime_error("Could not allocate AVPacket.");&#xA;    }&#xA;&#xA;    // Initialize frame&#xA;    frame = av_frame_alloc();&#xA;    if (!frame) {&#xA;        throw std::runtime_error("Could not allocate AVFrame.");&#xA;    }&#xA;    frame->nb_samples = codec_ctx->frame_size;&#xA;    frame->format = codec_ctx->sample_fmt;&#xA;    frame->ch_layout = codec_ctx->ch_layout;&#xA;&#xA;    // Allocate data buffer&#xA;    if (av_frame_get_buffer(frame, 0) &lt; 0) {&#xA;        throw std::runtime_error("Could not allocate audio data buffers.");&#xA;    }&#xA;&#xA;    // Initialize SwrContext for resampling&#xA;    swr_ctx = swr_alloc();&#xA;    if (!swr_ctx) {&#xA;        throw std::runtime_error("Could not allocate SwrContext.");&#xA;    }&#xA;&#xA;    // Set options for resampling&#xA;    av_opt_set_chlayout(swr_ctx, "in_chlayout", &amp;codec_ctx->ch_layout, 0);&#xA;    av_opt_set_chlayout(swr_ctx, "out_chlayout", &amp;codec_ctx->ch_layout, 0);&#xA;    av_opt_set_int(swr_ctx, "in_sample_rate", codec_ctx->sample_rate, 0);&#xA;    av_opt_set_int(swr_ctx, "out_sample_rate", codec_ctx->sample_rate, 0);&#xA;    av_opt_set_sample_fmt(swr_ctx, "in_sample_fmt", AV_SAMPLE_FMT_S16, 0); // Assuming input is S16&#xA;    av_opt_set_sample_fmt(swr_ctx, "out_sample_fmt", codec_ctx->sample_fmt, 0);&#xA;&#xA;    // Initialize the resampling context&#xA;    if (swr_init(swr_ctx) &lt; 0) {&#xA;        throw std::runtime_error("Failed to initialize SwrContext.");&#xA;    }&#xA;&#xA;    int samples_read = 0;&#xA;    int total_samples = data_.size() / (av_get_bytes_per_sample(AV_SAMPLE_FMT_S16) * 2); // Assuming input is stereo&#xA;&#xA;    while (samples_read &lt; total_samples) {&#xA;        if (av_frame_make_writable(frame) &lt; 0) {&#xA;            throw std::runtime_error("Frame not writable.");&#xA;        }&#xA;&#xA;        int num_samples = std::min(codec_ctx->frame_size, total_samples - samples_read);&#xA;&#xA;        // Prepare input data&#xA;        const uint8_t* input_data[2] = { reinterpret_cast<const>(data_.data() &#x2B; samples_read * av_get_bytes_per_sample(AV_SAMPLE_FMT_S16) * 2), nullptr };&#xA;        int output_samples = swr_convert(swr_ctx, frame->data, frame->nb_samples,&#xA;            input_data, num_samples);&#xA;&#xA;        if (output_samples &lt; 0) {&#xA;            throw std::runtime_error("Error converting audio samples.");&#xA;        }&#xA;&#xA;        frame->nb_samples = output_samples;&#xA;&#xA;        // Send the frame for encoding&#xA;        if (avcodec_send_frame(codec_ctx, frame) &lt; 0) {&#xA;            throw std::runtime_error("Error sending frame for encoding.");&#xA;        }&#xA;&#xA;        // Receive and write packets&#xA;        while (avcodec_receive_packet(codec_ctx, pkt) >= 0) {&#xA;            out_file.write(reinterpret_cast(pkt->data), pkt->size);&#xA;            av_packet_unref(pkt);&#xA;        }&#xA;&#xA;        samples_read &#x2B;= num_samples;&#xA;    }&#xA;&#xA;    // Flush the encoder&#xA;    if (avcodec_send_frame(codec_ctx, nullptr) &lt; 0) {&#xA;        throw std::runtime_error("Error flushing the encoder.");&#xA;    }&#xA;&#xA;    while (avcodec_receive_packet(codec_ctx, pkt) >= 0) {&#xA;        out_file.write(reinterpret_cast(pkt->data), pkt->size);&#xA;        av_packet_unref(pkt);&#xA;    }&#xA;&#xA;    // Write file trailer&#xA;    av_write_trailer(format_ctx);&#xA;&#xA;    // Cleanup&#xA;    av_frame_free(&amp;frame);&#xA;    av_packet_free(&amp;pkt);&#xA;    swr_free(&amp;swr_ctx);&#xA;    avcodec_free_context(&amp;codec_ctx);&#xA;&#xA;    if (!(format_ctx->oformat->flags &amp; AVFMT_NOFILE)) {&#xA;        avio_closep(&amp;format_ctx->pb);&#xA;    }&#xA;    avformat_free_context(format_ctx);&#xA;&#xA;    out_file.close();&#xA;    return out_file;&#xA;}&#xA;&#xA;//declaration&#xA;/*&#xA;std::ofstream export_segment(const std::string&amp; out_f,&#xA;    const std::string&amp; format = "mp3",&#xA;    const std::string&amp; codec = "libmp3lame",&#xA;    const std::string&amp; bitrate = "128000",&#xA;    const std::vector&amp; parameters = {},&#xA;    const std::map&amp; tags = {},&#xA;    const std::string&amp; id3v2_version = "4",&#xA;    const std::string&amp; cover = "");&#xA;*/&#xA;</const></const></char>

    &#xA;

    This code only works for mp3 format. I also want to export to aac,ogg,flv,wav and any other popular formats.

    &#xA;

  • fluent-ffmpeg unable to achieve consistent audio bitrate

    20 septembre 2020, par Martin

    I am working on an electron app that runs ffmpeg commands to combine an audio file + image file into a video file. I'm using an fluent-ffmpeg command I wrote, which works, but the outputted video fiels always have a lower audio bit rate, even though I specify 320kbps in my command.

    &#xA;

    I've been running multiple tests with mp3/flac/wav files that I logged below ; using my fluent-ffmpeg command as follows :

    &#xA;

    ffmpeg()&#xA;            .input(imgPath)&#xA;            .loop()&#xA;            .addInputOption(&#x27;-framerate 2&#x27;)&#xA;            .input(audioPath)&#xA;            .videoCodec(&#x27;libx264&#x27;)&#xA;            .audioCodec(&#x27;aac&#x27;)&#xA;            //.audioCodec(&#x27;copy&#x27;)&#xA;            .audioBitrate(&#x27;320k&#x27;)&#xA;            .videoBitrate(&#x27;8000k&#x27;, true)&#xA;            .size(&#x27;1920x1080&#x27;)&#xA;            .outputOptions([&#xA;                &#x27;-preset medium&#x27;,&#xA;                &#x27;-tune stillimage&#x27;,&#xA;                &#x27;-crf 18&#x27;,&#xA;                &#x27;-b:a 320k&#x27;,&#xA;                &#x27;-pix_fmt yuv420p&#x27;,&#xA;                &#x27;-shortest&#x27;&#xA;            ])&#xA;&#xA;            .on(&#x27;progress&#x27;, function (progress) {&#xA;                document.getElementById(updateInfoLocation).innerText = `Generating Video: ${Math.round(progress.percent)}%`&#xA;                console.info(`vid() Processing : ${progress.percent} % done`);&#xA;            })&#xA;            .on(&#x27;codecData&#x27;, function (data) {&#xA;                console.log(&#x27;vid() codecData=&#x27;, data);&#xA;            })&#xA;            .on(&#x27;end&#x27;, function () {&#xA;                document.getElementById(updateInfoLocation).innerText = `Video generated.`&#xA;                console.log(&#x27;vid()  file has been converted succesfully; resolve() promise&#x27;);&#xA;                resolve();&#xA;            })&#xA;            .on(&#x27;error&#x27;, function (err) {&#xA;                document.getElementById(updateInfoLocation).innerText = `Error generating video.`&#xA;                console.log(&#x27;vid() an error happened: &#x27; &#x2B; err.message, &#x27;, reject()&#x27;);&#xA;                reject(err);&#xA;            })&#xA;            .output(vidOutput).run()&#xA;

    &#xA;

    1) 320bkps mp3 file

    &#xA;

      &#xA;
    • imgPath = C :\Users\marti\Documents\projects\mp3 audio\img.jpg
    • &#xA;

    • audioPath = C :\Users\marti\Documents\projects\mp3 audio\08. Strange - Gigi D'Agostino.mp3
    • &#xA;

    • vidOutput = C :\Users\marti\Documents\projects\mp3 audio\08. Strange - Gigi D'Agostino.mp4
    • &#xA;

    &#xA;

    My audio file has a bit rate of 320kbps :&#xA;enter image description here

    &#xA;

    If I use my above ffmpeg function with .audioCodec(&#x27;aac&#x27;), the output video file has an audio bitrate of 300kbps. If I use .audioCodec(&#x27;copy&#x27;) instead, my output video file has an audio bitrate of 317kbps.

    &#xA;

    2) 1411kbps wav file

    &#xA;

      &#xA;
    • audioPath = C :\Users\marti\Documents\projects\wav audio\01 Nem Kaldi.wav
    • &#xA;

    • imgPath = C :\Users\marti\Documents\projects\wav audio\cover.jpg
    • &#xA;

    • vidOutput = C :\Users\marti\Documents\projects\wav audio\01 Nem Kaldi.mp4
    • &#xA;

    &#xA;

    My wav file has an audio bit rate of 1411kbps ; enter image description here&#xA;When I run my ffmpeg command using .audioCodec(&#x27;aac&#x27;), my output video file is again only 303kbps&#xA;enter image description here

    &#xA;

    3) 937kbps flac file

    &#xA;

      &#xA;
    • audioPath = C :\Users\marti\Documents\projects\flac audio\02 - ryokika.flac
    • &#xA;

    • imgPath = C :\Users\marti\Documents\projects\flac audio\R-1042328-1388635895-5022.jpeg.jpg
    • &#xA;

    • vidOutput = C :\Users\marti\Documents\projects\flac audio\02 - ryokika.mp4
    • &#xA;

    &#xA;

    When I run my ffmpeg command with.audioCodec(&#x27;aac&#x27;), the outputted video file is only 304kbps

    &#xA;

    is there any command or additional flag I need to add to ensure my output video files have a high quality audio bitrate ?

    &#xA;

  • The problems with wavelets

    27 février 2010, par Dark Shikari — DCT, Dirac, Snow, psychovisual optimizations, wavelets

    I have periodically noted in this blog and elsewhere various problems with wavelet compression, but many readers have requested that I write a more detailed post about it, so here it is.

    Wavelets have been researched for quite some time as a replacement for the standard discrete cosine transform used in most modern video compression. Their methodology is basically opposite : each coefficient in a DCT represents a constant pattern applied to the whole block, while each coefficient in a wavelet transform represents a single, localized pattern applied to a section of the block. Accordingly, wavelet transforms are usually very large with the intention of taking advantage of large-scale redundancy in an image. DCTs are usually quite small and are intended to cover areas of roughly uniform patterns and complexity.

    Both are complete transforms, offering equally accurate frequency-domain representations of pixel data. I won’t go into the mathematical details of each here ; the real question is whether one offers better compression opportunities for real-world video.

    DCT transforms, though it isn’t mathematically required, are usually found as block transforms, handling a single sharp-edged block of data. Accordingly, they usually need a deblocking filter to smooth the edges between DCT blocks. Wavelet transforms typically overlap, avoiding such a need. But because wavelets don’t cover a sharp-edged block of data, they don’t compress well when the predicted data is in the form of blocks.

    Thus motion compensation is usually performed as overlapped-block motion compensation (OBMC), in which every pixel is calculated by performing the motion compensation of a number of blocks and averaging the result based on the distance of those blocks from the current pixel. Another option, which can be combined with OBMC, is “mesh MC“, where every pixel gets its own motion vector, which is a weighted average of the closest nearby motion vectors. The end result of either is the elimination of sharp edges between blocks and better prediction, at the cost of greatly increased CPU requirements. For an overlap factor of 2, it’s 4 times the amount of motion compensation, plus the averaging step. With mesh MC, it’s even worse, with SIMD optimizations becoming nearly impossible.

    At this point, it would seem wavelets would have pretty big advantages : when used with OBMC, they have better inter prediction, eliminate the need for deblocking, and take advantage of larger-scale correlations. Why then hasn’t everyone switched over to wavelets then ? Dirac and Snow offer modern implementations. Yet despite decades of research, wavelets have consistently disappointed for image and video compression. It turns out there are a lot of serious practical issues with wavelets, many of which are open problems.

    1. No known method exists for efficient intra coding. H.264′s spatial intra prediction is extraordinarily powerful, but relies on knowing the exact decoded pixels to the top and left of the current block. Since there is no such boundary in overlapped-wavelet coding, such prediction is impossible. Newer intra prediction methods, such as markov-chain intra prediction, also seem to require an H.264-like situation with exactly-known neighboring pixels. Intra coding in wavelets is in the same state that DCT intra coding was in 20 years ago : the best known method was to simply transform the block with no prediction at all besides DC. NB : as described by Pengvado in the comments, the switching between inter and intra coding is potentially even more costly than the inefficient intra coding.

    2. Mixing partition sizes has serious practical problems. Because the overlap between two motion partitions depends on the partitions’ size, mixing block sizes becomes quite difficult to define. While in H.264 an smaller partition always gives equal or better compression than a larger one when one ignores the extra overhead, it is actually possible for a larger partition to win when using OBMC due to the larger overlap. All of this makes both the problem of defining the result of mixed block sizes and making decisions about them very difficult.

    Both Snow and Dirac offer variable block size, but the overlap amount is constant ; larger blocks serve only to save bits on motion vectors, not offer better overlap characteristics.

    3. Lack of spatial adaptive quantization. As shown in x264 with VAQ, and correspondingly in HCEnc’s implementation and Theora’s recent implementation, spatial adaptive quantization has staggeringly impressive (before, after) effects on visual quality. Only Dirac seems to have such a feature, and the encoder doesn’t even use it. No other wavelet formats (Snow, JPEG2K, etc) seem to have such a feature. This results in serious blurring problems in areas with subtle texture (as in the comparison below).

    4. Wavelets don’t seem to code visual energy effectively. Remember that a single coefficient in a DCT represents a pattern which applies across an entire block : this makes it very easy to create apparent “detail” with a DCT. Furthermore, the sharp edges of DCT blocks, despite being an apparent weakness, often result in a “fake sharpness” that can actually improve the visual appearance of videos, as was seen with Xvid. Thus wavelet codecs have a tendency to look much blurrier than DCT-based codecs, but since PSNR likes blur, this is often seen as a benefit during video compression research. Some of the consequences of these factors can be seen in this comparison ; somewhat outdated and not general-case, but which very effectively shows the difference in how wavelets handle sharp edges and subtle textures.

    Another problem that periodically crops up is the visual aliasing that tends to be associated with wavelets at lower bitrates. Standard wavelets effectively consist of a recursive function that upscales the coefficients coded by the previous level by a factor of 2 and then adds a new set of coefficients. If the upscaling algorithm is naive — as it often is, for the sake of speed — the result can look quite ugly, as if parts of the image were coded at a lower resolution and then badly scaled up. Of course, it looks like that because they were coded at a lower resolution and then badly scaled up.

    JPEG2000 is a classic example of wavelet failure : despite having more advanced entropy coding, being designed much later than JPEG, being much more computationally intensive, and having much better PSNR, comparisons have consistently shown it to be visually worse than JPEG at sane filesizes. Here’s an example from Wikipedia. By comparison, H.264′s intra coding, when used for still image compression, can beat JPEG by a factor of 2 or more (I’ll make a post on this later). With the various advancements in DCT intra coding since H.264, I suspect that a state-of-the-art DCT compressor could win by an even larger factor.

    Despite the promised benefits of wavelets, a wavelet encoder even close to competitive with x264 has yet to be created. With some tests even showing Dirac losing to Theora in visual comparisons, it’s clear that many problems remain to be solved before wavelets can eliminate the ugliness of block-based transforms once and for all.