Media (3)

Keyword: Tags / Valkaama

Valkaama DVD Cover Outside

4 October 2011, by kent1

Updated: October 2011

Language: English

Type: Image

Tags: photoshop, psd, creative commons, opensource, open film making, Valkaama

Valkaama DVD Label

4 October 2011, by kent1

Updated: February 2013

Language: English

Type: Image

Tags: image, psd, creative commons, doc2img, opensource, open film making, Valkaama

Valkaama DVD Cover Inside

4 October 2011, by kent1

Updated: October 2011

Language: English

Type: Image

Tags: photoshop, psd, creative commons, opensource, open film making, Valkaama


Other articles (72)

What is an editorial

21 June 2013, by etalarma

Write your point of view in an article. It will be filed in a section set aside for that purpose.
An editorial is a text-only article. Its purpose is to collect points of view in a dedicated section. Only one editorial is featured on the home page. To read earlier ones, browse the dedicated section.
You can customize the editorial creation form.
Editorial creation form: for a document of type editorial, the (...)
User profiles

12 April 2011, by kent1

Each user has a profile page where they can edit their personal information. In the default top-of-page menu, a menu item is created automatically when MediaSPIP is initialized; it is visible only if the visitor is logged in to the site.
The user can edit their profile from their author page; a link in the navigation, "Edit your profile", is (...)
Configuring language support

15 November 2010, by kent1

Accessing the configuration and adding supported languages
To configure support for new languages, go to the "Administer" section of the site.
From there, the navigation menu gives access to a "Language management" section where you can enable support for new languages.
Each newly added language can still be deactivated as long as no object has been created in that language. In that case, it is grayed out in the configuration and (...)


On other sites (8865)

Revision d115dbc24c: Adjust style to match Google Coding Style a little more closely. Most of these

30 October 2012, by Ronald S. Bultje

Changed Paths: Modify /vp8/common/onyx.h Modify /vp8/encoder/bitstream.c Modify /vp8/encoder/dct.c Modify /vp8/encoder/encodeframe.c Modify /vp8/encoder/encodeintra.c Modify /vp8/encoder/firstpass.c Modify /vp8/encoder/generic/csystemdependent.c (...)

Google Speech API "Sample rate in request does not match FLAC header"

13 February 2017, by kjdion84

I’m trying to convert an mp4 video clip into a FLAC audio file and then have google speech spit out the words from the video so that I can detect if specific words were said.

I have everything working except that I am getting an error from the Speech API:

{
  "error": {
    "code": 400,
    "message": "Sample rate in request does not match FLAC header.",
    "status": "INVALID_ARGUMENT"
  }
}

I am using FFMPEG in order to convert the mp4 into a FLAC file. I am specifying that the FLAC file be 16 bits in the command, but when I right click on the FLAC file Windows is telling me it is 302kbps.

Here is my PHP code:

// convert mp4 video to 16 bit flac audio file
$cmd = 'C:/wamp/www/ffmpeg/bin/ffmpeg.exe -i C:/wamp/www/test.mp4 -c:a flac -sample_fmt s16 C:/wamp/www/test.flac';
exec($cmd, $output);

// convert flac to text so we can detect if certain words were said
$data = array(
    "config" => array(
        "encoding" => "FLAC",
        "sampleRate" => 16000,
        "languageCode" => "en-US"
    ),
    "audio" => array(
        "content" => base64_encode(file_get_contents("test.flac")),
    )
);

$json_data = json_encode($data);

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'https://speech.googleapis.com/v1beta1/speech:syncrecognize?key=MY_API_KEY');
curl_setopt($ch, CURLOPT_HTTPHEADER, array("Content-Type: application/json"));
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_POSTFIELDS, $json_data);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);

$result = curl_exec($ch);

Google Speech API + Go - Transcribing Audio Stream of Unknown Length

14 February 2018, by Josh

I have an rtmp stream of a video call and I want to transcribe it. I have created 2 services in Go and I’m getting results but it’s not very accurate and a lot of data seems to get lost.

Let me explain.

I have a transcode service that uses ffmpeg to transcode the video to Linear16 audio and places the output bytes onto a PubSub queue for a transcribe service to handle. Obviously there is a limit to the size of a PubSub message, and I want to start transcribing before the end of the video call. So I chunk the transcoded data into 3-second clips (not a fixed length, it just seems about right) and put them onto the queue.

The data is transcoded quite simply:

var stdout Buffer

cmd := exec.Command("ffmpeg", "-i", url, "-f", "s16le", "-acodec", "pcm_s16le", "-ar", "16000", "-ac", "1", "-")
cmd.Stdout = &stdout

if err := cmd.Start(); err != nil {
    log.Fatal(err)
}

ticker := time.NewTicker(3 * time.Second)

for {
    select {
    case <-ticker.C:
        bytesConverted := stdout.Len()
        log.Infof("Converted %d bytes", bytesConverted)

        // Send the data we converted, even if there are no bytes.
        topic.Publish(ctx, &pubsub.Message{
            Data: stdout.Bytes(),
        })

        stdout.Reset()
    }
}

The transcribe service pulls messages from the queue at a rate of one every 3 seconds, which helps it process the audio data at about the same rate as it is being created. The Speech API limits a stream to 60 seconds, so I stop the old stream and start a new one every 30 seconds; that way we never hit the limit, however long the video call lasts.

This is how I’m transcribing it:

stream := prepareNewStream()
clipLengthTicker := time.NewTicker(30 * time.Second)
chunkLengthTicker := time.NewTicker(3 * time.Second)

cctx, cancel := context.WithCancel(context.TODO())
err := subscription.Receive(cctx, func(ctx context.Context, msg *pubsub.Message) {

    select {
    case <-clipLengthTicker.C:
        log.Infof("Clip length reached.")
        log.Infof("Closing stream and starting over")

        err := stream.CloseSend()
        if err != nil {
            log.Fatalf("Could not close stream: %v", err)
        }

        go getResult(stream)
        stream = prepareNewStream()

    case <-chunkLengthTicker.C:
        log.Infof("Chunk length reached.")

        bytesConverted := len(msg.Data)

        log.Infof("Received %d bytes\n", bytesConverted)

        if bytesConverted > 0 {
            if err := stream.Send(&speechpb.StreamingRecognizeRequest{
                StreamingRequest: &speechpb.StreamingRecognizeRequest_AudioContent{
                    AudioContent: transcodedChunk.Data,
                },
            }); err != nil {
                resp, _ := stream.Recv()
                log.Errorf("Could not send audio: %v", resp.GetError())
            }
        }

        msg.Ack()
    }
})

I think the problem is that my 3-second chunks don’t necessarily line up with the starts and ends of phrases or sentences, and I suspect that the Speech API is a recurrent neural network trained on full sentences rather than individual words. Starting a clip in the middle of a sentence would then lose some data, because the API can’t figure out the first few words up to the natural end of a phrase. I also lose some data, and some context, when changing from an old stream to a new one. I guess overlapping clips might help with this.

I have a couple of questions:

1) Does this architecture seem appropriate for my constraints (unknown length of audio stream, etc.)?

2) What can I do to improve accuracy and minimise lost data?

(Note I’ve simplified the examples for readability. Point out if anything doesn’t make sense because I’ve been heavy handed in cutting the examples down.)
