Recherche avancée

Médias (0)

Mot : - Tags -/metadatas

Aucun média correspondant à vos critères n’est disponible sur le site.

Autres articles (22)

  • Supporting all media types

    13 avril 2011, par

    Unlike most software and media-sharing platforms, MediaSPIP aims to manage as many different media types as possible. The following are just a few examples from an ever-expanding list of supported formats : images : png, gif, jpg, bmp and more audio : MP3, Ogg, Wav and more video : AVI, MP4, OGV, mpg, mov, wmv and more text, code and other data : OpenOffice, Microsoft Office (Word, PowerPoint, Excel), web (html, CSS), LaTeX, Google Earth and (...)

  • Les formats acceptés

    28 janvier 2010, par

    Les commandes suivantes permettent d’avoir des informations sur les formats et codecs gérés par l’installation local de ffmpeg :
    ffmpeg -codecs ffmpeg -formats
    Les format videos acceptés en entrée
    Cette liste est non exhaustive, elle met en exergue les principaux formats utilisés : h264 : H.264 / AVC / MPEG-4 AVC / MPEG-4 part 10 m4v : raw MPEG-4 video format flv : Flash Video (FLV) / Sorenson Spark / Sorenson H.263 Theora wmv :
    Les formats vidéos de sortie possibles
    Dans un premier temps on (...)

  • Les vidéos

    21 avril 2011, par

    Comme les documents de type "audio", Mediaspip affiche dans la mesure du possible les vidéos grâce à la balise html5 .
    Un des inconvénients de cette balise est qu’elle n’est pas reconnue correctement par certains navigateurs (Internet Explorer pour ne pas le nommer) et que chaque navigateur ne gère en natif que certains formats de vidéos.
    Son avantage principal quant à lui est de bénéficier de la prise en charge native de vidéos dans les navigateur et donc de se passer de l’utilisation de Flash et (...)

Sur d’autres sites (2523)

  • Accented characters are not recognized in python [closed]

    10 avril 2023, par CorAnna

    I have a problem in the python script, my script should put subtitles in a video given a srt file, this srt file is written by another script but in its script it replaces the accents and all the particular characters with a black square symbol with a question mark inside it... the problem I think lies in the writing of this file, what follows and that in overwriting the subtitles I do with ffmpeg the sentences that contain an accented word are not written

    


    def video_audio_file_writer(video_file):

    videos_folder = "Video"
    audios_folder = "Audio"

    video_path = f"{videos_folder}\\{video_file}"

    video_name = Path(video_path).stem
    audio_name = f"{video_name}Audio"

    audio_path = f"{audios_folder}\\{audio_name}.wav"

    video = mp.VideoFileClip(video_path)
    audio = video.audio.write_audiofile(audio_path)

    return video_path, audio_path, video_name

    def audio_file_transcription(audio_path, lang):

    model = whisper.load_model("base")
    tran = gt.Translator()

    audio_file = str(audio_path)

    options = dict(beam_size=5, best_of=5)
    translate = dict(task="translate", **options)
    result = model.transcribe(audio_file, **translate)   

    return result

def audio_subtitles_transcription(result, video_name):

    subtitle_folder = "Content"
    subtitle_name = f"{video_name}Subtitle"
    subtitle_path_form = "srt"

    subtitle_path = f"{subtitle_folder}\\{subtitle_name}.{subtitle_path_form}"

    with open(os.path.join(subtitle_path), "w") as srt:
        # write_vtt(result["segments"], file=vtt)
        write_srt(result["segments"], file=srt)
            
    return subtitle_path

def video_subtitles(video_path, subtitle_path, video_name):

    video_subtitled_folder = "VideoSubtitles"
    video_subtitled_name = f"{video_name}Subtitles"
    video_subtitled_path = f"{video_subtitled_folder}\\{video_subtitled_name}.mp4"

    video_path_b = bytes(video_path, 'utf-8')
    subtitle_path_b = bytes(subtitle_path, 'utf-8')
    video_subtitled_path_b = bytes(video_subtitled_path, 'utf-8')

    path_abs_b = os.getcwdb() + b"\\"

    path_abs_bd = path_abs_b.decode('utf-8')
    video_path_bd= video_path_b.decode('utf-8')
    subtitle_path_bd = subtitle_path_b.decode('utf-8')
    video_subtitled_path_bd = video_subtitled_path_b.decode('utf-8')

    video_path_abs = str(path_abs_bd + video_path_bd)
    subtitle_path_abs = str(path_abs_bd + subtitle_path_bd).replace("\\", "\\\\").replace(":", "\\:")
    video_subtitled_path_abs = str(path_abs_bd + video_subtitled_path_bd)

    time.sleep(3)

    os.system(f"ffmpeg -i {video_path_abs} -vf subtitles='{subtitle_path_abs}' -y {video_subtitled_path_abs}")

    return video_subtitled_path_abs, video_path_abs, subtitle_path_abs

if __name__ == "__main__":

    video_path, audio_path, video_name = video_audio_file_writer(video_file="ChiIng.mp4")
    result = audio_file_transcription(audio_path=audio_path, lang="it")
    subtitle_path = audio_subtitles_transcription(result=result, video_name=video_name)
    video_subtitled_path_abs, video_path_abs, subtitle_path_abs = video_subtitles(video_path=video_path, subtitle_path=subtitle_path, video_name=video_name)
    
    print("Video Subtitled")


    


    Windows 11
Python 3.10

    


  • Converting uint8_t data to AVFrame with FFmpeg

    30 octobre 2017, par J.Lefebvre

    I am currently working in C++ with the Autodesk 3DStudio Max 2014 SDK (toolset 100) and the Ffmpeg library in Visual Studio 2015 and trying to convert a DIB (Device Independent Bitmap) to uint8_t pointer array and then convert these data to an AVFrame.

    I don’t have any errors, but my video is still black and without meta data.
    (no time display, etc)

    I made approximatively the same with a Visual Studio Console application to convert jpeg image sequence from disk and this is working fine.
    (The only difference is that instead of converting jpeg to AVFrame with the Ffmpeg library, I try to convert raw data to an AVFrame.)

    So I think the problem could be either on the DIB conversion to the uint8_t data or the uint8_t data to the AVFrame.
    (The second is more plausible, because I used the SFML library to display a window with my rgb uint8_t* data for debuging and it is working fine.)

    I first initialize the ffmpeg library :

    This function is called once at the beginning.

    int Converter::Initialize(AVCodecID codec_id, int width, int height, int fps, const char *filename)
    {
       avcodec_register_all();
       av_register_all();

       AVCodec *codec;
       inputFrame = NULL;
       codecContext = NULL;
       pkt = NULL;
       file = NULL;
       outputFilename = new char[strlen(filename)]();
       *outputFilename = '\0';
       strcpy(outputFilename, filename);

       int ret;

       //Initializing AVCodecContext and getting PixelFormat supported by encoder
       codec = avcodec_find_encoder(codec_id);
       if (!codec)
           return 1;

       AVPixelFormat pixFormat = codec->pix_fmts[0];
       codecContext = avcodec_alloc_context3(codec);
       if (!codecContext)
           return 1;

       codecContext->bit_rate = 400000;
       codecContext->width = width;
       codecContext->height = height;
       codecContext->time_base.num = 1;
       codecContext->time_base.den = fps;
       codecContext->gop_size = 10;
       codecContext->max_b_frames = 1;
       codecContext->pix_fmt = pixFormat;

       if (codec_id == AV_CODEC_ID_H264)
           av_opt_set(codecContext->priv_data, "preset", "slow", 0);

       //Actually opening the encoder
       if (avcodec_open2(codecContext, codec, NULL) < 0)
           return 1;

       file = fopen(outputFilename, "wb");
       if (!file)
           return 1;

       inputFrame = av_frame_alloc();
       inputFrame->format = codecContext->pix_fmt;
       inputFrame->width = codecContext->width;
       inputFrame->height = codecContext->height;

       ret = av_image_alloc(inputFrame->data, inputFrame->linesize, codecContext->width, codecContext->height, codecContext->pix_fmt, 32);

       if (ret < 0)
           return 1;

       return 0;
    }

    Then for each frame, I get the DIB and convert to a uint8_t* it with this function :

    uint8_t* Util::ToUint8_t(RGBQUAD *data, int width, int height)
    {
       uint8_t* buf = (uint8_t*)data;

       int imageSize = width * height;
       size_t rgbquad_size = sizeof(RGBQUAD);
       size_t total_bytes = imageSize * rgbquad_size;
       uint8_t * pCopyBuffer = new uint8_t[total_bytes];

       for (int x = 0; x < width; x++)
       {
           for (int y = 0; y < height; y++)
           {
               int index = (x + width * y) * rgbquad_size;
               int invertIndex = (x + width* (height - y - 1)) * rgbquad_size;

               //BGRA to RGBA
               pCopyBuffer[index] = buf[invertIndex + 2];
               pCopyBuffer[index + 1] = buf[invertIndex + 1];
               pCopyBuffer[index + 2] = buf[invertIndex];
               pCopyBuffer[index + 3] = 0xFF;
           }
       }

       return pCopyBuffer;
    }

    void GetDIBBuffer(Interface* ip, BITMAPINFO *bmi, uint8_t** outBuffer)
    {
       int size;

       ViewExp& view = ip->GetActiveViewExp();

       view.getGW()->getDIB(NULL, &size);

       bmi = (BITMAPINFO *)malloc(size);
       BITMAPINFOHEADER *bmih = (BITMAPINFOHEADER *)bmi;
       view.getGW()->getDIB(bmi, &size);

       uint8_t * pCopyBuffer = Util::ToUint8_t(bmi->bmiColors, bmih->biWidth, bmih->biHeight);

       *outBuffer = pCopyBuffer;
    }

    This function is used to get the DIB :

    void GetViewportDIB(Interface* ip, BITMAPINFO *bmi, BITMAPINFOHEADER *bmih, BitmapInfo biFile, Bitmap *map)
    {
       int size;

       if (!biFile.Name()[0])
           return;

       ViewExp& view = ip->GetActiveViewExp();

       view.getGW()->getDIB(NULL, &size);

       bmi = (BITMAPINFO *)malloc(size);
       bmih = (BITMAPINFOHEADER *)bmi;

       view.getGW()->getDIB(bmi, &size);

       biFile.SetWidth((WORD)bmih->biWidth);
       biFile.SetHeight((WORD)bmih->biHeight);
       biFile.SetType(BMM_TRUE_32);

       map = TheManager->Create(&biFile);
       map->OpenOutput(&biFile);
       map->FromDib(bmi);
       map->Write(&biFile);
       map->Close(&biFile);
    }

    And after the conversion to AVFrame and video encoding :

    The EncodeFromMem function is call each frame.

    int Converter::EncodeFromMem(const char *outputDir, int frameNumber, uint8_t* data)
    {
       int ret;

       inputFrame->pts = frameNumber;
       EncodeFrame(data, codecContext, inputFrame, &pkt, file);

       return 0;
    }

    static void RgbToYuv(uint8_t *rgb, AVCodecContext *c, AVFrame *frame)
    {
       struct SwsContext *swsCtx = NULL;
       const int in_linesize[1] = { 3 * c->width };// RGB stride
       swsCtx = sws_getCachedContext(swsCtx, c->width, c->height, AV_PIX_FMT_RGB24, c->width, c->height, AV_PIX_FMT_YUV420P, 0, 0, 0, 0);
       sws_scale(swsCtx, (const uint8_t * const *)&rgb, in_linesize, 0, c->height, frame->data, frame->linesize);
    }

    static void EncodeFrame(uint8_t *rgb, AVCodecContext *c, AVFrame *frame, AVPacket **pkt, FILE *file)
    {
       int ret, got_output;

       RgbToYuv(rgb, c, frame);

       *pkt = av_packet_alloc();
       av_init_packet(*pkt);
       (*pkt)->data = NULL;
       (*pkt)->size = 0;

       ret = avcodec_encode_video2(c, *pkt, frame, &got_output);
       if (ret < 0)
       {
           fprintf(stderr, "Error encoding frame/n");
           exit(1);
       }
       if (got_output)
       {
           fwrite((*pkt)->data, 1, (*pkt)->size, file);
           av_packet_unref(*pkt);
       }
    }

    To finish I have a function that write the packets and free the memory :
    This function is called once at the end of the time range.

    int Converter::Finalize()
    {
       int ret, got_output;
       uint8_t endcode[] = { 0, 0, 1, 0xb7 };

       /* get the delayed frames */
       do
       {
           fflush(stdout);
           ret = avcodec_encode_video2(codecContext, pkt, NULL, &got_output);
           if (ret < 0)
           {
               fprintf(stderr, "Error encoding frame/n");
               return 1;
           }
           if (got_output)
           {
               fwrite(pkt->data, 1, pkt->size, file);
               av_packet_unref(pkt);
           }
       } while (got_output);

       fwrite(endcode, 1, sizeof(endcode), file);
       fclose(file);

       avcodec_close(codecContext);
       av_free(codecContext);

       av_frame_unref(inputFrame);
       av_frame_free(&inputFrame);
       //av_freep(&inputFrame->data[0]); //Crash

       delete outputFilename;
       outputFilename = 0;

       return 0;
    }

    EDIT :

    I modify my RgbToYuv function and create another one to convert back the yuv frame to an rgb one.

    This not really solve the problem, but maybe focus the problem on the conversion from YuvToRgb.

    This is the result of the conversion from YUV to RGB :

     ![YuvToRgb result] : https://img42.com/kHqpt+

    static void YuvToRgb(AVCodecContext *c, AVFrame *frame)
    {
       struct SwsContext *img_convert_ctx = sws_getContext(c->width, c->height, AV_PIX_FMT_YUV420P, c->width, c->height, AV_PIX_FMT_RGB24, SWS_BICUBIC, NULL, NULL, NULL);
       AVFrame * rgbPictInfo = av_frame_alloc();
       avpicture_fill((AVPicture*)rgbPictInfo, *(frame)->data, AV_PIX_FMT_RGB24, c->width, c->height);
       sws_scale(img_convert_ctx, frame->data, frame->linesize, 0, c->height, rgbPictInfo->data, rgbPictInfo->linesize);

       Util::DebugWindow(c->width, c->height, rgbPictInfo->data[0]);
    }
    static void RgbToYuv(uint8_t *rgb, AVCodecContext *c, AVFrame *frame)
    {
       AVFrame * rgbPictInfo = av_frame_alloc();
       avpicture_fill((AVPicture*)rgbPictInfo, rgb, AV_PIX_FMT_RGBA, c->width, c->height);

       struct SwsContext *swsCtx = sws_getContext(c->width, c->height, AV_PIX_FMT_RGBA, c->width, c->height, AV_PIX_FMT_YUV420P, SWS_BICUBIC, NULL, NULL, NULL);
       avpicture_fill((AVPicture*)frame, rgb, AV_PIX_FMT_YUV420P, c->width, c->height);    
       sws_scale(swsCtx, rgbPictInfo->data, rgbPictInfo->linesize, 0, c->height, frame->data, frame->linesize);

       YuvToRgb(c, frame);
    }
  • Greed is Good ; Greed Works

    25 novembre 2010, par Multimedia Mike — VP8

    Greed, for lack of a better word, is good ; Greed works. Well, most of the time. Maybe.

    Picking Prediction Modes
    VP8 uses one of 4 prediction modes to predict a 16x16 luma block or 8x8 chroma block before processing it (for luma, a block can also be broken into 16 4x4 blocks for individual prediction using even more modes).

    So, how to pick the best predictor mode ? I had no idea when I started writing my VP8 encoder. I did not read any literature on the matter ; I just sat down and thought of a brute-force approach. According to the comments in my code :

    // naive, greedy algorithm :
    //   residual = source - predictor
    //   mean = mean(residual)
    //   residual -= mean
    //   find the max diff between the mean and the residual
    // the thinking is that, post-prediction, the best block will
    // be comprised of similar samples
    

    After removing the predictor from the macroblock, individual 4x4 subblocks are put through a forward DCT and quantized. Optimal compression in this scenario results when all samples are the same since only the DC coefficient will be non-zero. Failing that, when the input samples are at least similar to each other, few of the AC coefficients will be non-zero, which helps compression. When the samples are all over the scale, there aren’t a whole lot of non-zero coefficients unless you crank up the quantizer, which results in poor quality in the reconstructed subblocks.

    Thus, my goal was to pick a prediction mode that, when applied to the input block, resulted in a residual in which each element would feature the least deviation from the mean of the residual (relative to other prediction choices).

    Greedy Approach
    I realized that this algorithm falls into the broad general category of "greedy" algorithms— one that makes locally optimal decisions at each stage. There are most likely smarter algorithms. But this one was good enough for making an encoder that just barely works.

    Compression Results
    I checked the total file compression size on my usual 640x360 Big Buck Bunny logo image while forcing prediction modes vs. using my greedy prediction picking algorithm. In this very simple test, DC-only actually resulted in slightly better compression than the greedy algorithm (which says nothing about overall quality).

    prediction mode quantizer index = 0 (minimum) quantizer index = 10
    greedy 286260 98028
    DC 280593 95378
    vertical 297206 105316
    horizontal 295357 104185
    TrueMotion 311660 113480

    As another data point, in both quantizer cases, my greedy algorithm selected a healthy mix of prediction modes :

    • quantizer index 0 : DC = 521, VERT = 151, HORIZ = 183, TM = 65
    • quantizer index 10 : DC = 486, VERT = 167, HORIZ = 190, TM = 77

    Size vs. Quality
    Again, note that this ad-hoc test only measures one property (a highly objective one)— compression size. It did not account for quality which is a far more controversial topic that I have yet to wade into.