Other articles (4)

Supporting all media types

13 April 2011, by kent1

Unlike most software and media-sharing platforms, MediaSPIP aims to manage as many different media types as possible. The following are just a few examples from an ever-expanding list of supported formats:
images: png, gif, jpg, bmp and more
audio: MP3, Ogg, Wav and more
video: AVI, MP4, OGV, mpg, mov, wmv and more
text, code and other data: OpenOffice, Microsoft Office (Word, PowerPoint, Excel), web (html, CSS), LaTeX, Google Earth and (...)
Keeping control of your media in your hands

13 April 2011, by kent1

The vocabulary used on this site, and around MediaSPIP in general, aims to avoid reference to Web 2.0 and the companies that profit from media-sharing.
While using MediaSPIP, you are invited to avoid using words like "Brand", "Cloud" and "Market".
MediaSPIP is designed to facilitate the sharing of creative media online, while allowing authors to retain complete control of their work.
MediaSPIP aims to be accessible to as many people as possible and development is based on expanding the (...)
The plugin: Podcasts.

14 July 2010, by kent1

The problem of podcasting is, once again, one that reveals the state of standardization of data transport on the Internet.
Two interesting formats exist:
the one developed by Apple, strongly oriented toward use with iTunes, whose spec is here;
the "Media RSS Module" format, which is more "free" and is supported notably by Yahoo and the Miro software;
File types supported in feeds
Apple's format allows only the following formats in its feeds: .mp3 audio/mpeg .m4a audio/x-m4a .mp4 (...)


On other sites (3524)

Google Speech API doesn't give correct result when audio is sent in file

4 August 2012, by Cupidvogel

I chanced upon an article on the Google Speech API that suggested a mechanism for extracting text from an audio file through Perl. I have recorded an audio file, which you will find at http://vocaroo.com/i/s0lPN5d3YQJj. It is a simple piece of audio, saying "I love you". When I go to the Google Speech API in Chrome and speak those words, I get the right result. When I try the code from the above-mentioned article with that audio file, it returns strange results, like "logan". How can I make it more accurate? This is just a sample audio; what I am generally doing is extracting the audio from a video file through FFmpeg using something like ffmpeg -i input.avi -vn -ar 44100 -ac 2 -ab 192 -f mp3 output.mp3, followed by ffmpeg -i input.mp3 output.flac.

Google Speech API + Go - Transcribing Audio Stream of Unknown Length

14 February 2018, by Josh

I have an rtmp stream of a video call and I want to transcribe it. I have created 2 services in Go and I’m getting results but it’s not very accurate and a lot of data seems to get lost.

Let me explain.

I have a transcode service, I use ffmpeg to transcode the video to Linear16 audio and place the output bytes onto a PubSub queue for a transcribe service to handle. Obviously there is a limit to the size of the PubSub message, and I want to start transcribing before the end of the video call. So, I chunk the transcoded data into 3 second clips (not fixed length, just seems about right) and put them onto the queue.

The data is transcoded quite simply:

var stdout Buffer

cmd := exec.Command("ffmpeg", "-i", url, "-f", "s16le", "-acodec", "pcm_s16le", "-ar", "16000", "-ac", "1", "-")
cmd.Stdout = &stdout

if err := cmd.Start(); err != nil {
    log.Fatal(err)
}

ticker := time.NewTicker(3 * time.Second)

for {
    select {
    case <-ticker.C:
        bytesConverted := stdout.Len()
        log.Infof("Converted %d bytes", bytesConverted)

        // Send the data we converted, even if there are no bytes.
        topic.Publish(ctx, &pubsub.Message{
            Data: stdout.Bytes(),
        })

        stdout.Reset()
    }
}

The transcribe service pulls messages from the queue at a rate of 1 every 3 seconds, which helps it process the audio data at about the same rate as it's being created. There is a limit on the Speech API stream: it can't be longer than 60 seconds. So I stop the old stream and start a new one every 30 seconds, so that we never hit the limit no matter how long the video call lasts.

This is how I'm transcribing it:

stream := prepareNewStream()
clipLengthTicker := time.NewTicker(30 * time.Second)
chunkLengthTicker := time.NewTicker(3 * time.Second)

cctx, cancel := context.WithCancel(context.TODO())
err := subscription.Receive(cctx, func(ctx context.Context, msg *pubsub.Message) {

    select {
    case <-clipLengthTicker.C:
        log.Infof("Clip length reached.")
        log.Infof("Closing stream and starting over")

        err := stream.CloseSend()
        if err != nil {
            log.Fatalf("Could not close stream: %v", err)
        }

        go getResult(stream)
        stream = prepareNewStream()

    case <-chunkLengthTicker.C:
        log.Infof("Chunk length reached.")

        bytesConverted := len(msg.Data)

        log.Infof("Received %d bytes\n", bytesConverted)

        if bytesConverted > 0 {
            if err := stream.Send(&speechpb.StreamingRecognizeRequest{
                StreamingRequest: &speechpb.StreamingRecognizeRequest_AudioContent{
                    AudioContent: transcodedChunk.Data,
                },
            }); err != nil {
                resp, _ := stream.Recv()
                log.Errorf("Could not send audio: %v", resp.GetError())
            }
        }

        msg.Ack()
    }
})

I think the problem is that my 3-second chunks don't necessarily line up with the starts and ends of phrases or sentences. I suspect that the Speech API is a recurrent neural network which has been trained on full sentences rather than individual words, so starting a clip in the middle of a sentence loses some data because it can't figure out the first few words up to the natural end of a phrase. I also lose some data when changing from an old stream to a new one, because some context is lost. I guess overlapping clips might help with this.
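
The overlapping-clips idea can be sketched as a sliding window over the PCM bytes. This is only an illustration of the windowing itself: `overlapWindows` is a hypothetical helper, whether overlap actually improves recognition is not established here, and words recognised twice in the shared region would still have to be de-duplicated by the caller. For 16-bit audio, `size` and `step` should be even so that windows start on sample boundaries.

```go
package main

import "fmt"

// overlapWindows splits data into windows of up to size bytes. Each window
// starts step bytes after the previous one, so consecutive full windows
// share size-step bytes. It returns nil for invalid arguments.
// (Hypothetical helper for illustration only.)
func overlapWindows(data []byte, size, step int) [][]byte {
	if size <= 0 || step <= 0 || step > size {
		return nil
	}
	var out [][]byte
	for start := 0; start < len(data); start += step {
		end := start + size
		if end > len(data) {
			end = len(data)
		}
		out = append(out, data[start:end])
		if end == len(data) {
			break // the final window already reaches the end of the data
		}
	}
	return out
}

func main() {
	data := []byte{0, 1, 2, 3, 4, 5, 6, 7, 8, 9}
	for _, w := range overlapWindows(data, 4, 2) {
		fmt.Println(w)
	}
	// Prints:
	// [0 1 2 3]
	// [2 3 4 5]
	// [4 5 6 7]
	// [6 7 8 9]
}
```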

I have a couple of questions:

1) Does this architecture seem appropriate for my constraints (unknown length of audio stream, etc.)?

2) What can I do to improve accuracy and minimise lost data?

(Note I’ve simplified the examples for readability. Point out if anything doesn’t make sense because I’ve been heavy handed in cutting the examples down.)

Text backdrop by ass formatting

14 September 2024, by Armen Sanoyan

I want to add a box behind a word using ASS subtitle formatting. The box should have a border radius. The ASS file will later be used by ffmpeg.

I have tried BorderStyle=3 from Stack Overflow answers 1 and 2; neither of them provides a way to get rounded boxes. BorderStyle=4 didn't work for me either. In the comments on the last answer I found a possible reason, that my libraries may be old, but in any case it doesn't seem that BorderStyle=4 will solve my border-radius problem. There is another way to achieve a rounded box (link to answer), but I couldn't figure out how to install all the libraries explained there, and that answer also seems overcomplicated to me. Is there another way to round the corners of the box without suffering and pain? I also tried drawing the box with drawing commands like



```
{\p1}m 0 0 s 100 0 100 100 0 100 c{\p0}
```



But it still doesn't seem to be the best way to achieve rounded borders.
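
For comparison with the b-spline attempt above, a rounded rectangle can also be described in ASS drawing mode with straight lines (`l`) and one cubic Bezier (`b`) per corner. The following is a rough sketch that only generates such a drawing string. `roundedBox` is a hypothetical helper, the corner approximation uses the usual control-point offset of about 0.5523 times the radius, and it assumes the renderer in use accepts `b` commands with integer coordinates. Placing the drawing behind a specific word (layering and positioning) is not covered.

```go
package main

import "fmt"

// roundedBox returns an ASS drawing, wrapped in \p1 ... \p0, for a w x h
// rectangle with corner radius r (all in drawing units).
// (Hypothetical helper for illustration only.)
func roundedBox(w, h, r int) string {
	// Clamp the radius so that opposite corners never overlap.
	if r < 0 {
		r = 0
	}
	if 2*r > w {
		r = w / 2
	}
	if 2*r > h {
		r = h / 2
	}
	// Control-point offset for a quarter circle, about 0.5523 * r,
	// rounded to the nearest integer.
	k := (r*5523 + 5000) / 10000

	path := fmt.Sprintf(
		"m %d 0 l %d 0 b %d 0 %d %d %d %d "+ // top edge, top-right corner
			"l %d %d b %d %d %d %d %d %d "+ // right edge, bottom-right corner
			"l %d %d b %d %d 0 %d 0 %d "+ // bottom edge, bottom-left corner
			"l 0 %d b 0 %d %d 0 %d 0", // left edge, top-left corner
		r, w-r, w-r+k, w, r-k, w, r,
		w, h-r, w, h-r+k, w-r+k, h, w-r, h,
		r, h, r-k, h, h-r+k, h-r,
		r, r-k, r-k, r,
	)
	return "{\\p1}" + path + "{\\p0}"
}

func main() {
	// A 100 x 40 box with a corner radius of 10.
	fmt.Println(roundedBox(100, 40, 10))
}
```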
