Advanced search

Media (0)

Word: - Tags -/clipboard

No media matching your criteria is available on this site.

Other articles (64)

  • Support for all media types

    10 April 2011

    Unlike many modern document-sharing programs and platforms, MediaSPIP aims to handle as many different document formats as possible, whether they are: images (png, gif, jpg, bmp and others...); audio (MP3, Ogg, Wav and others...); video (Avi, MP4, Ogv, mpg, mov, wmv and others...); textual content, code or other (OpenOffice, Microsoft Office (spreadsheet, presentation), web (HTML, CSS), LaTeX, Google Earth) (...)

  • Publishing on MédiaSpip

    13 June 2013

    Can I post content from an iPad tablet?
    Yes, if your installed MédiaSpip is at version 0.2 or higher. If needed, contact the administrator of your MédiaSpip to find out.

  • Request for the creation of a channel

    12 March 2010, by

    Depending on how the platform is configured, the user may have two different ways of requesting the creation of a channel. The first is at the moment of registration; the second, after registration, by filling in a request form.
    Both methods ask for the same information and work in much the same way: the prospective user must fill in a series of form fields that first of all give the administrators information about (...)

On other sites (7148)

  • Evolution #2050: empty site at the start

    10 May 2011, by cedric -

    see also #2056

  • Google Speech API + Go - Transcribing Audio Stream of Unknown Length

    14 February 2018, by Josh

    I have an RTMP stream of a video call and I want to transcribe it. I have created two services in Go and I’m getting results, but the transcription is not very accurate and a lot of data seems to get lost.

    Let me explain.

    I have a transcode service: I use ffmpeg to transcode the video to LINEAR16 audio and place the output bytes onto a PubSub queue for a transcribe service to handle. Obviously there is a limit to the size of a PubSub message, and I want to start transcribing before the end of the video call. So I chunk the transcoded data into roughly 3-second clips (not a fixed length, it just seems about right) and put them onto the queue.

    The data is transcoded quite simply:

    // url, ctx and topic are defined elsewhere; stdout collects raw PCM.
    var stdout bytes.Buffer

    // Decode the RTMP input to 16 kHz mono LINEAR16 PCM on stdout.
    cmd := exec.Command("ffmpeg", "-i", url, "-f", "s16le", "-acodec", "pcm_s16le", "-ar", "16000", "-ac", "1", "-")
    cmd.Stdout = &stdout

    if err := cmd.Start(); err != nil {
        log.Fatal(err)
    }

    // Flush whatever ffmpeg has produced so far every 3 seconds.
    ticker := time.NewTicker(3 * time.Second)

    for range ticker.C {
        bytesConverted := stdout.Len()
        log.Infof("Converted %d bytes", bytesConverted)

        // Copy the buffer before publishing: Publish is asynchronous,
        // and stdout.Bytes() aliases storage that is reused after Reset.
        data := make([]byte, bytesConverted)
        copy(data, stdout.Bytes())

        // Send the data we converted, even if there are no bytes.
        topic.Publish(ctx, &pubsub.Message{
            Data: data,
        })

        stdout.Reset()
    }

    The transcribe service pulls messages from the queue at a rate of one every 3 seconds, which keeps the audio being processed at roughly the same rate as it’s being created. There are limits on the Speech API stream: it can’t be longer than 60 seconds, so I stop the old stream and start a new one every 30 seconds, and we never hit the limit no matter how long the video call lasts.
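
    The snippet below also relies on a prepareNewStream helper that isn’t shown. As a rough sketch of what it plausibly does (assuming a *speech.Client from cloud.google.com/go/speech/apiv1 stored in a hypothetical speechClient variable, not the poster’s actual code), it would open a recognition stream and send the mandatory config request, which must precede any audio:

    // Hypothetical sketch: open a new recognition stream and send the
    // config request first. speechClient (*speech.Client) is assumed
    // to be created elsewhere.
    func prepareNewStream() speechpb.Speech_StreamingRecognizeClient {
        stream, err := speechClient.StreamingRecognize(context.Background())
        if err != nil {
            log.Fatalf("Could not open stream: %v", err)
        }

        // The first request on a stream carries the config, not audio.
        if err := stream.Send(&speechpb.StreamingRecognizeRequest{
            StreamingRequest: &speechpb.StreamingRecognizeRequest_StreamingConfig{
                StreamingConfig: &speechpb.StreamingRecognitionConfig{
                    Config: &speechpb.RecognitionConfig{
                        Encoding:        speechpb.RecognitionConfig_LINEAR16,
                        SampleRateHertz: 16000,
                        LanguageCode:    "en-US",
                    },
                },
            },
        }); err != nil {
            log.Fatalf("Could not send config: %v", err)
        }

        return stream
    }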

    This is how I’m transcribing it:

    stream := prepareNewStream()
    clipLengthTicker := time.NewTicker(30 * time.Second)
    chunkLengthTicker := time.NewTicker(3 * time.Second)

    // cancel can be called later to stop receiving.
    cctx, cancel := context.WithCancel(context.TODO())
    err := subscription.Receive(cctx, func(ctx context.Context, msg *pubsub.Message) {

        // Block on whichever ticker fires first. Note that in the
        // clip-length branch the message is neither sent nor acked.
        select {
        case <-clipLengthTicker.C:
            log.Infof("Clip length reached.")
            log.Infof("Closing stream and starting over")

            err := stream.CloseSend()
            if err != nil {
                log.Fatalf("Could not close stream: %v", err)
            }

            // Collect the final results of the old stream asynchronously.
            go getResult(stream)
            stream = prepareNewStream()

        case <-chunkLengthTicker.C:
            log.Infof("Chunk length reached.")

            bytesConverted := len(msg.Data)
            log.Infof("Received %d bytes\n", bytesConverted)

            if bytesConverted > 0 {
                if err := stream.Send(&speechpb.StreamingRecognizeRequest{
                    StreamingRequest: &speechpb.StreamingRecognizeRequest_AudioContent{
                        // Was transcodedChunk.Data, which is undefined here.
                        AudioContent: msg.Data,
                    },
                }); err != nil {
                    resp, _ := stream.Recv()
                    log.Errorf("Could not send audio: %v", resp.GetError())
                }
            }

            msg.Ack()
        }
    })
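
    getResult isn’t shown either. A minimal sketch under the same assumptions (and assuming the io package is imported): after CloseSend, it would drain the remaining responses until EOF and log the transcripts.

    // Hypothetical sketch: read the remaining responses of a closed
    // stream and log each transcript alternative the server returns.
    func getResult(stream speechpb.Speech_StreamingRecognizeClient) {
        for {
            resp, err := stream.Recv()
            if err == io.EOF {
                return // the server has delivered all results
            }
            if err != nil {
                log.Errorf("Could not receive results: %v", err)
                return
            }
            for _, result := range resp.Results {
                for _, alt := range result.Alternatives {
                    log.Infof("Transcript: %q (confidence %.2f)", alt.Transcript, alt.Confidence)
                }
            }
        }
    }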

    I think the problem is that my 3-second chunks don’t necessarily line up with the starts and ends of phrases or sentences. I suspect the Speech API is a recurrent neural network that has been trained on full sentences rather than individual words, so starting a clip in the middle of a sentence loses some data: it can’t figure out the first few words up to the natural end of the phrase. I also lose some data, and some context, when changing from an old stream to a new one. I guess overlapping clips might help with this (see the sketch below).
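
    To make the overlap idea concrete, here is a hedged sketch of how the publishing loop in the transcode service could keep the tail of each chunk and prepend it to the next one, so a phrase cut off at a boundary still appears whole in one of the clips. The one-second overlap is an arbitrary choice; overlapBytes assumes 16 kHz mono 16-bit audio:

    // One second of LINEAR16 audio at 16 kHz mono (2 bytes per sample).
    const overlapBytes = 16000 * 2

    var tail []byte

    for range ticker.C {
        // Prepend the tail of the previous chunk before publishing.
        chunk := append(append([]byte{}, tail...), stdout.Bytes()...)

        topic.Publish(ctx, &pubsub.Message{Data: chunk})

        // Remember the end of this chunk for the next iteration.
        if len(chunk) > overlapBytes {
            tail = append([]byte{}, chunk[len(chunk)-overlapBytes:]...)
        } else {
            tail = append([]byte{}, chunk...)
        }

        stdout.Reset()
    }

    The transcribe side would then hear the overlapped second twice, so duplicated words in adjacent results would need to be reconciled.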

    I have a couple of questions:

    1) Does this architecture seem appropriate for my constraints (unknown length of audio stream, etc.)?

    2) What can I do to improve accuracy and minimise lost data?

    (Note: I’ve simplified the examples for readability. Point out if anything doesn’t make sense because I’ve been heavy-handed in cutting the examples down.)

  • avcodec/codec_internal: Remove FF_CODEC_CAP_ALLOCATE_PROGRESS

    18 September 2023, by Andreas Rheinhardt
    avcodec/codec_internal: Remove FF_CODEC_CAP_ALLOCATE_PROGRESS
    

    Before commit f025b8e110b36c1cdb4fb56c4cd57aeca1767b5b,
    every frame-threaded decoder used ThreadFrames, even those
    without any inter-frame dependencies at all. In order to
    distinguish the decoders that need the AVBuffer for progress
    communication from those that do not (and to avoid the
    allocation for the latter), the former were marked with the
    FF_CODEC_CAP_ALLOCATE_PROGRESS internal codec cap.

    Yet distinguishing these two can be done in a more natural way:
    Don't use ThreadFrames when not needed and split ff_thread_get_buffer()
    into a core function that calls the user's get_buffer2 callback
    and a wrapper around it that also allocates the progress AVBuffer.
    This has been done in 02220b88fc38ef9dd4f2d519f5d3e4151258b60c
    and since that commit the ALLOCATE_PROGRESS cap was nearly redundant.

    The only exceptions were WebP and VP8. WebP can contain VP8
    and uses the VP8 decoder directly (i.e. they share the same
    AVCodecContext). Both decoders are frame-threaded, and VP8
    has inter-frame dependencies (in general, though not in valid
    WebP) and therefore had the ALLOCATE_PROGRESS cap. To avoid
    allocating progress for a frame-threaded WebP decoder, the
    cap and the check for it were kept in place.

    Yet now the VP8 decoder has been switched to use ProgressFrames,
    so there is no longer any reason for this check and the cap.
    This commit therefore removes both.

    Also change the value of FF_CODEC_CAP_USES_PROGRESSFRAMES
    to leave no gaps.

    Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>

    • [DH] doc/multithreading.txt
    • [DH] libavcodec/codec_internal.h
    • [DH] libavcodec/ffv1dec.c
    • [DH] libavcodec/h264dec.c
    • [DH] libavcodec/hevcdec.c
    • [DH] libavcodec/mpeg4videodec.c
    • [DH] libavcodec/pngdec.c
    • [DH] libavcodec/pthread_frame.c
    • [DH] libavcodec/rv30.c
    • [DH] libavcodec/rv40.c
    • [DH] libavcodec/tests/avcodec.c