Media (91)

Richard Stallman et le logiciel libre

19 October 2011, by kent1

Updated: May 2013

Language: French

Type: Text

Tags: opensource, stallman, biographie, livre, framasoft

Stereo master soundtrack

17 October 2011, by kent1

Updated: October 2011

Language: English

Type: Audio

Tags: creative commons, audio, Elephant dreams, soundtrack, flac

Elephants Dream - Cover of the soundtrack

17 October 2011, by kent1

Updated: October 2011

Language: English

Type: Image

Tags: image, Elephant dreams, soundtrack

#7 Ambience

16 October 2011, by kent1

Updated: June 2015

Language: English

Type: Audio

Tags: creative commons, Musique, mp3, Elephant dreams, soundtrack

#6 Teaser Music

16 October 2011, by kent1

Updated: February 2013

Language: English

Type: Audio

Tags: creative commons, Musique, mp3, Elephant dreams, soundtrack

#5 End Title

16 October 2011, by kent1

Updated: February 2013

Language: English

Type: Audio

Tags: creative commons, Musique, mp3, Elephant dreams, soundtrack

Other articles (61)

Websites made with MediaSPIP

2 May 2011, by kent1

This page lists some websites based on MediaSPIP.

Farm deployment option

12 April 2011, by kent1

MediaSPIP can be installed as a farm, with a single "core" hosted on a dedicated server and used by a multitude of different sites.
This makes it possible, for example, to share setup costs among several projects or individuals; to quickly deploy a multitude of unique sites; and to avoid having to put all creations into a digital catch-all, as is the case with the large general-public platforms scattered across the (...)

Adding user-specific information and other author-related behavior changes

12 April 2011, by kent1

The simplest way to add information to authors is to install the Inscription3 plugin. It also lets you modify certain user-related behaviors (refer to its documentation for more information).
It is also possible to add fields to authors by installing the plugins champs extras 2 and Interface pour champs extras.

On other sites (11639)

Google Speech API + Go - Transcribing Audio Stream of Unknown Length

14 February 2018, by Josh

I have an rtmp stream of a video call and I want to transcribe it. I have created 2 services in Go and I’m getting results but it’s not very accurate and a lot of data seems to get lost.

Let me explain.

I have a transcode service, I use ffmpeg to transcode the video to Linear16 audio and place the output bytes onto a PubSub queue for a transcribe service to handle. Obviously there is a limit to the size of the PubSub message, and I want to start transcribing before the end of the video call. So, I chunk the transcoded data into 3 second clips (not fixed length, just seems about right) and put them onto the queue.
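As a rough sanity check on the chunk size, the ffmpeg flags used for transcoding (`-ar 16000`, `-ac 1`, 16-bit `s16le` samples) fix the byte rate of the output, so the size of a 3 second chunk can be estimated up front. The helper name `pcmBytes` below is hypothetical and only illustrates the arithmetic.

```go
package main

import (
	"fmt"
	"time"
)

// pcmBytes returns how many bytes of raw PCM audio cover duration d.
// It is a hypothetical helper, not part of any library.
func pcmBytes(sampleRate, channels, bytesPerSample int, d time.Duration) int {
	bytesPerSecond := sampleRate * channels * bytesPerSample
	return int(int64(bytesPerSecond) * int64(d) / int64(time.Second))
}

func main() {
	// 16 kHz, mono, 2 bytes per sample (s16le), 3 second chunks.
	fmt.Println(pcmBytes(16000, 1, 2, 3*time.Second)) // 96000
}
```

A chunk that is much smaller or larger than this estimate suggests that ffmpeg is producing audio faster or slower than real time, or that bytes are being dropped between the buffer and the queue.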

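The data is transcoded quite simply:

// Requires "bytes", "os/exec", "time", a logger and the Pub/Sub client.
var stdout bytes.Buffer

// Decode the stream to raw 16-bit little-endian PCM, 16 kHz mono, on stdout.
cmd := exec.Command("ffmpeg", "-i", url, "-f", "s16le", "-acodec", "pcm_s16le", "-ar", "16000", "-ac", "1", "-")
cmd.Stdout = &stdout

if err := cmd.Start(); err != nil {
    log.Fatal(err)
}

ticker := time.NewTicker(3 * time.Second)

for {
    select {
    case <-ticker.C:
        bytesConverted := stdout.Len()
        log.Infof("Converted %d bytes", bytesConverted)

        // Copy before publishing: stdout.Bytes() aliases the buffer,
        // which Reset and later writes would otherwise overwrite.
        data := append([]byte(nil), stdout.Bytes()...)

        // Send the data we converted, even if there are no bytes.
        topic.Publish(ctx, &pubsub.Message{
            Data: data,
        })

        stdout.Reset()
    }
}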
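One thing worth checking in the transcode loop above: when `cmd.Stdout` is not an `*os.File`, `os/exec` copies the process output into it from a separate goroutine, and `bytes.Buffer` is not safe for concurrent use. Reading and resetting the buffer from the ticker loop while ffmpeg is still writing is therefore a data race. A minimal sketch of one way around this, assuming a hypothetical `safeBuffer` type that guards the buffer with a mutex and hands out copies:

```go
package main

import (
	"bytes"
	"fmt"
	"sync"
)

// safeBuffer is a hypothetical io.Writer that can be drained
// while another goroutine keeps writing to it.
type safeBuffer struct {
	mu  sync.Mutex
	buf bytes.Buffer
}

// Write appends p to the buffer under the lock.
func (s *safeBuffer) Write(p []byte) (int, error) {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.buf.Write(p)
}

// Drain returns a copy of everything written so far and empties the buffer.
func (s *safeBuffer) Drain() []byte {
	s.mu.Lock()
	defer s.mu.Unlock()
	out := make([]byte, s.buf.Len())
	copy(out, s.buf.Bytes())
	s.buf.Reset()
	return out
}

func main() {
	var sb safeBuffer
	sb.Write([]byte("abc"))
	chunk := sb.Drain()
	sb.Write([]byte("xyz")) // does not alter the chunk already drained
	fmt.Println(string(chunk), len(sb.Drain()))
}
```

With something like this, `cmd.Stdout = &sb` would replace the plain buffer and each tick would publish `sb.Drain()`. Because `Drain` returns a copy, the published bytes stay intact even if the message is sent after the buffer has been reused.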

The transcribe service pulls messages from the queue at a rate of one every 3 seconds, which lets it process the audio at about the same rate as it is being created. The Speech API limits the length of a stream: it can’t be longer than 60 seconds. So I stop the old stream and start a new one every 30 seconds, which means we never hit the limit no matter how long the video call lasts.
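A wall-clock ticker measures elapsed time in the transcribe service, not the amount of audio actually sent on the stream, and the two can drift apart if messages arrive in bursts. One alternative, sketched here under the assumption that the limit applies to audio duration, is to count the bytes sent and rotate once a budget is reached. The `streamBudget` type is hypothetical.

```go
package main

import "fmt"

// streamBudget tracks how much audio has been sent on the current stream.
// It is a hypothetical helper for illustration only.
type streamBudget struct {
	limitBytes int // audio bytes allowed per stream
	sentBytes  int // audio bytes sent on the current stream
}

// ShouldRotate reports whether sending n more bytes would exceed the budget.
// An empty stream never rotates, so an oversized chunk is still sent.
func (b *streamBudget) ShouldRotate(n int) bool {
	return b.sentBytes > 0 && b.sentBytes+n > b.limitBytes
}

// Add records n bytes as sent on the current stream.
func (b *streamBudget) Add(n int) { b.sentBytes += n }

// Reset starts accounting for a new stream.
func (b *streamBudget) Reset() { b.sentBytes = 0 }

func main() {
	const bytesPerSecond = 16000 * 2 // 16 kHz mono, 16-bit samples
	b := streamBudget{limitBytes: 30 * bytesPerSecond}

	rotations := 0
	for i := 0; i < 25; i++ { // 25 chunks of 3 s = 75 s of audio
		chunk := 3 * bytesPerSecond
		if b.ShouldRotate(chunk) {
			rotations++ // here the old stream would be closed and a new one opened
			b.Reset()
		}
		b.Add(chunk)
	}
	fmt.Println("rotations:", rotations)
}
```

In the handler, `ShouldRotate(len(msg.Data))` would be checked before each send, so every stream carries at most 30 seconds of audio regardless of how quickly messages are delivered.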

This is how I’m transcribing it:

stream := prepareNewStream()
clipLengthTicker := time.NewTicker(30 * time.Second)
chunkLengthTicker := time.NewTicker(3 * time.Second)

cctx, cancel := context.WithCancel(context.TODO())
defer cancel()

err := subscription.Receive(cctx, func(ctx context.Context, msg *pubsub.Message) {
    // Blocks until one of the two tickers fires.
    select {
    case <-clipLengthTicker.C:
        // Rotate streams. Note: msg is neither sent nor acked on this branch.
        log.Infof("Clip length reached.")
        log.Infof("Closing stream and starting over")

        err := stream.CloseSend()
        if err != nil {
            log.Fatalf("Could not close stream: %v", err)
        }

        // Collect results from the old stream in the background.
        go getResult(stream)
        stream = prepareNewStream()

    case <-chunkLengthTicker.C:
        log.Infof("Chunk length reached.")

        bytesConverted := len(msg.Data)
        log.Infof("Received %d bytes\n", bytesConverted)

        if bytesConverted > 0 {
            if err := stream.Send(&speechpb.StreamingRecognizeRequest{
                StreamingRequest: &speechpb.StreamingRecognizeRequest_AudioContent{
                    AudioContent: msg.Data,
                },
            }); err != nil {
                resp, _ := stream.Recv()
                log.Errorf("Could not send audio: %v", resp.GetError())
            }
        }

        msg.Ack()
    }
})
if err != nil {
    log.Fatalf("Receive failed: %v", err)
}
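In the handler above, the `select` waits for whichever ticker fires first, so a message that arrives when the 30 second ticker wins is used only to rotate the stream and its audio is not forwarded. One possible alternative is to poll the rotation ticker without blocking and then forward every message. The helper `due` is hypothetical; it is generic only so that it works with any channel element type.

```go
package main

import (
	"fmt"
	"time"
)

// due reports, without blocking, whether a value is waiting on c.
func due[T any](c <-chan T) bool {
	select {
	case <-c:
		return true
	default:
		return false
	}
}

func main() {
	// A buffered channel stands in for time.Ticker.C in this sketch.
	ticks := make(chan time.Time, 1)
	var rotate <-chan time.Time = ticks

	fmt.Println(due(rotate)) // false: nothing pending, just forward the message
	ticks <- time.Now()
	fmt.Println(due(rotate)) // true: rotate the stream, then still forward the message
}
```

Inside the handler this would read roughly as: if `due(clipLengthTicker.C)`, close and replace the stream; then, in every case, send `msg.Data` and ack the message.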

I think the problem is that my 3 second chunks don’t necessarily line up with the starts and ends of phrases or sentences. I suspect that the Speech API is a recurrent neural network which has been trained on full sentences rather than individual words, so starting a clip in the middle of a sentence loses some data because it can’t figure out the first few words up to the natural end of a phrase. I also lose some data when changing from an old stream to a new one, because some context is lost. I guess overlapping clips might help with this.
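The overlapping idea could be sketched as follows: remember the last second or so of audio sent on the old stream and replay it at the start of the new one. The `overlapper` type and its field names are hypothetical, and the amount to keep is a guess that would need tuning. For 16-bit samples, `keep` should be an even number of bytes so the replayed audio stays sample-aligned.

```go
package main

import "fmt"

// overlapper remembers the tail of the audio sent on one stream
// so it can be replayed at the start of the next stream.
type overlapper struct {
	keep int    // number of trailing bytes to retain
	tail []byte // most recent bytes, at most keep long
}

// Observe records chunk as sent and updates the retained tail.
func (o *overlapper) Observe(chunk []byte) {
	o.tail = append(o.tail, chunk...)
	if len(o.tail) > o.keep {
		// Copy so the retained tail does not pin a growing backing array.
		o.tail = append([]byte(nil), o.tail[len(o.tail)-o.keep:]...)
	}
}

// Tail returns a copy of the retained bytes, to be sent first on a new stream.
func (o *overlapper) Tail() []byte {
	return append([]byte(nil), o.tail...)
}

func main() {
	o := overlapper{keep: 4}
	o.Observe([]byte("abc"))
	o.Observe([]byte("defg"))
	o.Observe([]byte("hi"))
	fmt.Println(string(o.Tail())) // fghi
}
```

Replayed audio will probably produce repeated words at the start of each new transcript, so the results from consecutive streams would need to be deduplicated when they are stitched together.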

I have a couple of questions:

1) Does this architecture seem appropriate for my constraints (unknown length of audio stream, etc.)?

2) What can I do to improve accuracy and minimise lost data?

(Note: I’ve simplified the examples for readability. Point out anything that doesn’t make sense because I’ve been heavy-handed in cutting the examples down.)

avcodec/tiff: move bpp check to after "end:"

8 March 2015, by Michael Niedermayer

This ensures that all current and future code paths get bpp checked

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>

libavcodec/tiff.c

Revert "avformat/mov: Bypass av_add_index_entry()"

2 March 2015, by Michael Niedermayer

Next commit will revert the PTS seeking so this is not needed anymore

This reverts commit 38e641a060e0c00930851a8053ca96250b3ecccc.

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>

libavformat/mov.c
