Other articles (30)

Accepted formats

28 January 2010, by kent1

The following commands show which formats and codecs the local ffmpeg installation supports:
ffmpeg -codecs
ffmpeg -formats
Accepted input video formats
This list is not exhaustive; it highlights the main formats in use:
h264: H.264 / AVC / MPEG-4 AVC / MPEG-4 part 10
m4v: raw MPEG-4 video format
flv: Flash Video (FLV) / Sorenson Spark / Sorenson H.263
Theora
wmv:
Possible output video formats
Initially, we (...)
Adding notes and captions to images

7 February 2011, by kent1

To add notes and captions to images, the first step is to install the "Légendes" plugin.
Once the plugin is activated, you can configure it in the configuration area to change who may create, modify, and delete notes. By default, only site administrators can add notes to images.
Changes when adding a media item
When adding a media item of type "image", a new button appears above the preview (...)
HTML5 audio and video support

13 April 2011, by kent1

MediaSPIP uses HTML5 video and audio tags to play multimedia files, taking advantage of the latest W3C innovations supported by modern browsers.
The MediaSPIP player was created specifically for MediaSPIP and can easily be adapted to fit a specific theme.
For older browsers, the Flowplayer Flash fallback is used.
MediaSPIP allows for media playback on major mobile platforms with the above (...)


On other sites (4372)

Revision d115dbc24c: Adjust style to match Google Coding Style a little more closely. Most of these

30 October 2012, by Ronald S. Bultje

Changed Paths:
Modify /vp8/common/onyx.h
Modify /vp8/encoder/bitstream.c
Modify /vp8/encoder/dct.c
Modify /vp8/encoder/encodeframe.c
Modify /vp8/encoder/encodeintra.c
Modify /vp8/encoder/firstpass.c
Modify /vp8/encoder/generic/csystemdependent.c
(...)

Google Speech API "Sample rate in request does not match FLAC header"

13 February 2017, by kjdion84

I’m trying to convert an mp4 video clip into a FLAC audio file and then have Google Speech return the words spoken in the video, so that I can detect whether specific words were said.

I have everything working, except that I am getting an error from the Speech API:

{
  "error": {
    "code": 400,
    "message": "Sample rate in request does not match FLAC header.",
    "status": "INVALID_ARGUMENT"
  }
}

I am using FFmpeg to convert the mp4 into a FLAC file. The command specifies 16-bit samples, but when I right-click the FLAC file, Windows reports 302 kbps.

Here is my PHP code:

// convert mp4 video to 16 bit flac audio file
$cmd = 'C:/wamp/www/ffmpeg/bin/ffmpeg.exe -i C:/wamp/www/test.mp4 -c:a flac -sample_fmt s16 C:/wamp/www/test.flac';
exec($cmd, $output);

// convert flac to text so we can detect if certain words were said
$data = array(
    "config" => array(
        "encoding" => "FLAC",
        "sampleRate" => 16000,
        "languageCode" => "en-US"
    ),
    "audio" => array(
        "content" => base64_encode(file_get_contents("test.flac")),
    )
);

$json_data = json_encode($data);

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'https://speech.googleapis.com/v1beta1/speech:syncrecognize?key=MY_API_KEY');
curl_setopt($ch, CURLOPT_HTTPHEADER, array("Content-Type: application/json"));
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_POSTFIELDS, $json_data);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);

$result = curl_exec($ch);
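
A note on the error itself: `-sample_fmt s16` sets the bit depth, not the sample rate, and the 302 kbps shown by Windows is a bitrate, which is a third, unrelated quantity. FFmpeg normally keeps the source sample rate unless `-ar` is given, so the FLAC file probably carries the mp4's original rate (often 44100 or 48000 Hz) while the request declares 16000. One way to check, sketched here in Go, is to read the rate straight from the FLAC STREAMINFO block. The function name `flacSampleRate` and the header bytes in `main` are made up for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// flacSampleRate returns the sample rate stored in the STREAMINFO block
// of a FLAC file. The FLAC format places STREAMINFO directly after the
// 4-byte "fLaC" marker.
func flacSampleRate(data []byte) (int, error) {
	// 4 bytes marker + 4 bytes metadata block header + 34 bytes STREAMINFO.
	if len(data) < 42 {
		return 0, errors.New("too short to hold a FLAC STREAMINFO block")
	}
	if string(data[:4]) != "fLaC" {
		return 0, errors.New("missing fLaC marker")
	}
	// The low 7 bits of the block header give the block type (0 = STREAMINFO).
	if data[4]&0x7F != 0 {
		return 0, errors.New("first metadata block is not STREAMINFO")
	}
	// The 20-bit sample rate starts 10 bytes into STREAMINFO (file offset 18).
	rate := int(data[18])<<12 | int(data[19])<<4 | int(data[20])>>4
	return rate, nil
}

func main() {
	// Hypothetical header of a 44.1 kHz, stereo, 16-bit file.
	hdr := make([]byte, 42)
	copy(hdr, "fLaC")
	hdr[4] = 0x00 // block type STREAMINFO
	hdr[7] = 34   // block length
	hdr[18], hdr[19], hdr[20], hdr[21] = 0x0A, 0xC4, 0x42, 0xF0

	rate, err := flacSampleRate(hdr)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	const requested = 16000 // the "sampleRate" value sent in the request
	fmt.Printf("header=%d Hz, request=%d Hz, match=%v\n", rate, requested, rate == requested)
}
```

If the two values differ, either resample during conversion (for example by adding `-ar 16000` to the ffmpeg command) or send the rate found in the header as `sampleRate`.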

Google Speech API + Go - Transcribing Audio Stream of Unknown Length

14 February 2018, by Josh

I have an RTMP stream of a video call and I want to transcribe it. I have created 2 services in Go and I’m getting results, but they are not very accurate and a lot of data seems to get lost.

Let me explain.

I have a transcode service that uses ffmpeg to transcode the video to Linear16 audio and places the output bytes onto a PubSub queue for a transcribe service to handle. Obviously there is a limit to the size of a PubSub message, and I want to start transcribing before the end of the video call. So I chunk the transcoded data into 3-second clips (not a fixed length, it just seems about right) and put them onto the queue.

The data is transcoded quite simply:

var stdout bytes.Buffer

// Transcode to raw 16-bit little-endian PCM, 16 kHz, mono, written to stdout.
cmd := exec.Command("ffmpeg", "-i", url, "-f", "s16le", "-acodec", "pcm_s16le", "-ar", "16000", "-ac", "1", "-")
cmd.Stdout = &stdout

if err := cmd.Start(); err != nil {
    log.Fatal(err)
}

ticker := time.NewTicker(3 * time.Second)

for {
    select {
    case <-ticker.C:
        bytesConverted := stdout.Len()
        log.Infof("Converted %d bytes", bytesConverted)

        // Send the data we converted, even if there are no bytes.
        topic.Publish(ctx, &pubsub.Message{
            Data: stdout.Bytes(),
        })

        stdout.Reset()
    }
}
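
One thing worth checking in the loop above: `bytes.Buffer` is not safe for concurrent use, and ffmpeg's output is copied into it in the background while the ticker branch reads and resets it. The slice returned by `Bytes()` is also only valid until the next buffer modification, so a publish that happens after `Reset()` may see overwritten data. A possible alternative, sketched below under the assumption of 16-bit mono PCM at 16 kHz, is to cut chunks by byte count instead of wall-clock time, so every chunk holds exactly the intended amount of audio. The helper names `chunkSize` and `splitPCM` are hypothetical.

```go
package main

import "fmt"

// chunkSize returns how many bytes of raw 16-bit PCM cover the given
// number of milliseconds at the given sample rate and channel count.
func chunkSize(sampleRate, channels, millis int) int {
	const bytesPerSample = 2 // s16le
	samples := sampleRate * millis / 1000
	return samples * channels * bytesPerSample
}

// splitPCM cuts data into consecutive chunks of at most size bytes.
// The chunks share memory with data; copy them if data will be reused.
func splitPCM(data []byte, size int) [][]byte {
	if size <= 0 {
		return nil
	}
	var chunks [][]byte
	for len(data) > 0 {
		n := size
		if n > len(data) {
			n = len(data)
		}
		chunks = append(chunks, data[:n])
		data = data[n:]
	}
	return chunks
}

func main() {
	size := chunkSize(16000, 1, 3000) // 3 seconds of 16 kHz mono audio
	fmt.Println("bytes per 3 s chunk:", size)

	pcm := make([]byte, 250000) // hypothetical buffer of transcoded audio
	for i, c := range splitPCM(pcm, size) {
		fmt.Printf("chunk %d: %d bytes\n", i, len(c))
	}
}
```

In a real service the same sizes could drive a loop of `io.ReadFull` calls on the pipe returned by `cmd.StdoutPipe()`, which avoids sharing a buffer between goroutines.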

The transcribe service pulls messages from the queue at a rate of 1 every 3 seconds, which helps it process the audio data at about the same rate as it is being created. The Speech API limits the length of a stream: it can’t be longer than 60 seconds. So I stop the old stream and start a new one every 30 seconds, which means we never hit the limit, no matter how long the video call lasts.

This is how I’m transcribing it:

stream := prepareNewStream()
clipLengthTicker := time.NewTicker(30 * time.Second)
chunkLengthTicker := time.NewTicker(3 * time.Second)

cctx, cancel := context.WithCancel(context.TODO())
err := subscription.Receive(cctx, func(ctx context.Context, msg *pubsub.Message) {

    select {
    case <-clipLengthTicker.C:
        log.Infof("Clip length reached.")
        log.Infof("Closing stream and starting over")

        err := stream.CloseSend()
        if err != nil {
            log.Fatalf("Could not close stream: %v", err)
        }

        go getResult(stream)
        stream = prepareNewStream()

    case <-chunkLengthTicker.C:
        log.Infof("Chunk length reached.")

        bytesConverted := len(msg.Data)

        log.Infof("Received %d bytes\n", bytesConverted)

        if bytesConverted > 0 {
            // Forward the chunk received from the queue to the Speech API stream.
            if err := stream.Send(&speechpb.StreamingRecognizeRequest{
                StreamingRequest: &speechpb.StreamingRecognizeRequest_AudioContent{
                    AudioContent: msg.Data,
                },
            }); err != nil {
                resp, _ := stream.Recv()
                log.Errorf("Could not send audio: %v", resp.GetError())
            }
        }

        msg.Ack()
    }
})
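
One possible cause of lost audio in the handler above: when the clip-length ticker fires, the message passed to that callback is neither sent to the stream nor acknowledged. The Pub/Sub client may also invoke the callback concurrently, so chunks can arrive out of order. A hedged alternative is to decide when to rotate from the amount of audio already sent rather than from a second ticker, and to send every message after the check. The sketch below shows only the bookkeeping; the type `streamBudget` and its methods are invented for illustration, and the real stream calls are indicated in comments.

```go
package main

import "fmt"

// streamBudget tracks how much audio has been sent on the current stream.
type streamBudget struct {
	bytesPerSecond int // 16000 Hz * 2 bytes * 1 channel = 32000
	limitSeconds   int // rotate before this much audio has been sent
	sent           int // bytes sent on the current stream
}

// wouldExceed reports whether sending n more bytes would pass the limit.
func (b *streamBudget) wouldExceed(n int) bool {
	return b.sent+n > b.bytesPerSecond*b.limitSeconds
}

// record notes that n bytes were sent on the current stream.
func (b *streamBudget) record(n int) { b.sent += n }

// reset starts counting again for a fresh stream.
func (b *streamBudget) reset() { b.sent = 0 }

func main() {
	budget := streamBudget{bytesPerSecond: 32000, limitSeconds: 30}
	const chunk = 96000 // one 3-second chunk of 16 kHz mono s16le audio

	for i := 1; i <= 25; i++ {
		if budget.wouldExceed(chunk) {
			// Real code would close the old stream and open a new one here.
			fmt.Printf("rotate stream before chunk %d\n", i)
			budget.reset()
		}
		// Real code would call stream.Send with the chunk here.
		budget.record(chunk)
	}
}
```

With this shape, every chunk is sent exactly once, and the rotation point depends on audio duration rather than on how quickly messages happen to arrive.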

I think the problem is that my 3-second chunks don’t necessarily line up with the starts and ends of phrases or sentences. I suspect that the Speech API is a recurrent neural network trained on full sentences rather than individual words, so starting a clip in the middle of a sentence loses some data: it can’t figure out the first few words up to the natural end of a phrase. I also lose some data, and some context, when changing from an old stream to a new one. I guess overlapping clips might help with this.
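
The overlapping-clips idea can be sketched as a sliding window over the raw PCM bytes. This is only an illustration of the windowing arithmetic, assuming 16-bit mono audio at 16 kHz (32000 bytes per second); the function name `overlapWindows` is hypothetical, and merging the duplicated words from overlapping transcripts is not shown.

```go
package main

import "fmt"

// overlapWindows returns windows of up to `window` bytes in which each
// window starts `window - overlap` bytes after the previous one, so
// consecutive windows share `overlap` bytes. Both values should be even
// to stay aligned with 16-bit samples.
func overlapWindows(data []byte, window, overlap int) [][]byte {
	if window <= 0 || overlap < 0 || overlap >= window {
		return nil
	}
	step := window - overlap
	var out [][]byte
	for start := 0; start < len(data); start += step {
		end := start + window
		if end > len(data) {
			end = len(data)
		}
		out = append(out, data[start:end])
		if end == len(data) {
			break // the last window already reaches the end of the data
		}
	}
	return out
}

func main() {
	const bytesPerSecond = 32000         // 16 kHz, mono, 16-bit
	pcm := make([]byte, 2000000)         // hypothetical audio buffer
	windows := overlapWindows(pcm, 30*bytesPerSecond, 2*bytesPerSecond)
	for i, w := range windows {
		fmt.Printf("window %d: %d bytes\n", i, len(w))
	}
}
```

Each 30-second window here repeats the last 2 seconds of the previous one, which gives the recognizer some context at the start of every new stream at the cost of transcribing those seconds twice.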

I have a couple of questions:

1) Does this architecture seem appropriate for my constraints (unknown length of audio stream, etc.)?

2) What can I do to improve accuracy and minimise lost data?

(Note: I’ve simplified the examples for readability. Point out if anything doesn’t make sense because I’ve been heavy-handed in cutting the examples down.)
