Media (91)

Valkaama DVD Cover Outside

4 October 2011, by kent1

Updated: October 2011

Language: English

Type: Image

Tags: photoshop, psd, creative commons, opensource, open film making, Valkaama

Valkaama DVD Label

4 October 2011, by kent1

Updated: February 2013

Language: English

Type: Image

Tags: image, psd, creative commons, doc2img, opensource, open film making, Valkaama

Valkaama DVD Cover Inside

4 October 2011, by kent1

Updated: October 2011

Language: English

Type: Image

Tags: photoshop, psd, creative commons, opensource, open film making, Valkaama

1,000,000

27 September 2011, by kent1

Updated: September 2011

Language: English

Type: Audio

Tags: audio, Nine Inch Nails, Musique, mp3

Demon Seed

26 September 2011, by kent1

Updated: September 2011

Language: English

Type: Audio

Tags: audio, Nine Inch Nails, Musique, wav

The Four of Us are Dying

26 September 2011, by kent1

Updated: September 2011

Language: English

Type: Audio

Tags: audio, Nine Inch Nails, Musique, mp3

On other sites (233)

How to synthesise speech using flite in ffmpeg [on hold]

22 August 2014, by Henry The Least
How do I add text-to-speech using flite in ffmpeg?

In the ffmpeg command I have an input video and an input audio as well as drawtext. I can’t figure out how to add speech generated from a predefined text so that it is saved in the output file. I tried to add the flite source as referenced in the ffmpeg documentation, -f lavfi -i flite=text='this is a speech', but there is some problem with my syntax.

This is the command I use:
```
ffmpeg -i blank.mp4 -i image.jpg -b:v 1M -filter_complex "[1:v]scale=-1:360 [ovrl],[0:v][ovrl]overlay=(main_w-overlay_w)/2:(main_h-overlay_h)/2,drawtext=fontfile=/home/user/bin/Impact.ttf: text='Hi There!': x=(main_w/2-text_w/2): y=100:  borderw=1: bordercolor=white: fontsize=10" output.mp4
```
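
For reference, here is a minimal sketch of how a flite source can be wired in as an extra input. It assumes an ffmpeg build configured with --enable-libflite. The function names flite_source and add_speech are hypothetical, and the overlay and drawtext filters from the command above are left out to keep the stream mapping visible.

```shell
# Build the lavfi source description for ffmpeg's flite source.
# The inner single quotes are for ffmpeg's filter parser, so in this
# simple sketch the text itself must not contain single quotes.
flite_source() {
    printf "flite=text='%s'" "$1"
}

# Hypothetical helper: use synthesized speech as the audio track of a video.
# $1 = input video, $2 = text to speak, $3 = output file
add_speech() {
    ffmpeg -i "$1" -f lavfi -i "$(flite_source "$2")" \
        -map 0:v -map 1:a -c:v copy "$3"
}

# Usage: add_speech blank.mp4 'this is a speech' output.mp4
```

In the original command the flite source would be the third input (index 2), so the audio would be selected with -map 2:a instead.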

How to use Google's Cloud Speech-to-Text REST API to transcribe a video

24 July 2018, by mrb

I’d like to have the transcript of two people speaking in a video, but I get an empty response from the Cloud Speech-to-Text API.

Approach:

I have a 56-minute video file containing a conversation between two people. I would like to have the transcript of that conversation, and I would like to use Google’s Cloud Speech-to-Text API to get it.

To save a little space on Google Cloud Storage, I first converted the video to audio using ffmpeg.

First I tried to figure out the audio codec with the command below; it looks like AAC.
ffmpeg -i video.mp4

Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'videoplayback.mp4':
  Metadata:
    major_brand     : mp42
    minor_version   : 0
    compatible_brands: isommp42
    creation_time   : 2015-12-30T08:17:14.000000Z
  Duration: 00:56:03.99, start: 0.000000, bitrate: 362 kb/s
    Stream #0:0(und): Video: h264 (Constrained Baseline) (avc1 / 0x31637661), yuv420p, 490x360 [SAR 1:1 DAR 49:36], 264 kb/s, 29.97 fps, 29.97 tbr, 30k tbn, 59.94 tbc (default)
    Metadata:
      handler_name    : VideoHandler
    Stream #0:1(eng): Audio: aac (LC) (mp4a / 0x6134706D), 44100 Hz, stereo, fltp, 96 kb/s (default)
    Metadata:
      creation_time   : 2015-12-30T08:17:31.000000Z
      handler_name    : IsoMedia File Produced by Google, 5-11-2011

So I extracted the audio stream from the video with:
ffmpeg -i video.mp4 -vn -acodec copy myaudio.aac

Details so far:
ffmpeg -i myaudio.aac
Outputs:

Input #0, aac, from 'myaudio.aac':
  Duration: 00:56:47.49, bitrate: 97 kb/s
    Stream #0:0: Audio: aac (LC), 44100 Hz, stereo, fltp, 97 kb/s

After that I converted it to Opus, because I was told that Opus is better:
ffmpeg -i myaudio.aac -acodec libopus -b:a 97k -vbr on -compression_level 10 myaudio.opus

Info so far:
opusinfo myaudio.opus

User comments section follows...
    encoder=Lavc58.18.100 libopus
Opus stream 1:
    Pre-skip: 312
    Playback gain: 0 dB
    Channels: 2
    Original sample rate: 48000Hz
    Packet duration:   20.0ms (max),   20.0ms (avg),   20.0ms (min)
    Page duration:   1000.0ms (max), 1000.0ms (avg), 1000.0ms (min)
    Total data length: 29956714 bytes (overhead: 0.872%)
    Playback length: 56m:03.990s
    Average bitrate: 71.24 kb/s, w/o overhead: 70.62 kb/s

At this point I uploaded myaudio.opus to Google Cloud Storage.

curl POST 1
I started the speech recognition by doing a POST with curl:

curl --request POST --header "Content-Type: application/json" --url 'https://speech.googleapis.com/v1/speech:longrunningrecognize?fields=done%2Cerror%2Cmetadata%2Cname%2Cresponse&key={MY_API_KEY}' --data '{"audio": {"uri": "gs://{MY_BUCKET}/myaudio.opus"},"config": {"encoding": "OGG_OPUS", "sampleRateHertz": 48000, "languageCode": "en-US"}}'

Response: {"name": "123456789"}
123456789 was not the actual value.

curl GET 1
Now I wanted to have the results:

curl --request GET --url 'https://speech.googleapis.com/v1/operations/123456789?fields=done%2Cerror%2Cmetadata%2Cname%2Cresponse&key={MY_API_KEY}'

This gave me the error: Error: Unable to recognize speech, possible error in encoding or channel config. Please correct the config and retry the request.
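
The error message mentions the channel configuration, and the opusinfo output above reports two channels. As a hedged aside, and assuming the stereo track contributes to the error (the post does not confirm this), a mono 16 kHz FLAC file could be produced instead. The -vn, -ac, -ar and -c:a options are standard ffmpeg options; the helper names flac_name and to_mono_flac are hypothetical.

```shell
# Derive the output name from the input name: video.mp4 -> video.flac
flac_name() {
    printf '%s.flac' "${1%.*}"
}

# Hypothetical helper: extract the audio of $1 as mono, 16 kHz FLAC.
#   -vn        drop the video stream
#   -ac 1      downmix to one channel
#   -ar 16000  resample to 16 kHz
to_mono_flac() {
    ffmpeg -i "$1" -vn -ac 1 -ar 16000 -c:a flac "$(flac_name "$1")"
}

# Usage: to_mono_flac video.mp4   # writes video.flac
```

A request for such a file would then need "encoding": "FLAC" and "sampleRateHertz": 16000 in its config.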

So I changed the encoding in the configuration from OGG_OPUS to LINEAR16.

curl POST 2
I did the POST again:

curl --request POST --header "Content-Type: application/json" --url 'https://speech.googleapis.com/v1/speech:longrunningrecognize?fields=done%2Cerror%2Cmetadata%2Cname%2Cresponse&key={MY_API_KEY}' --data '{"audio": {"uri": "gs://{MY_BUCKET}/myaudio.opus"},"config": {"encoding": "LINEAR16", "sampleRateHertz": 48000, "languageCode": "en-US"}}'

Response: {"name": "987654321"}
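
The two POST bodies differ only in the encoding field, while the uploaded file stays the same Ogg Opus file. LINEAR16 describes uncompressed 16-bit PCM samples, so it may not match a file encoded with libopus. The sketch below makes the request parameters explicit. recognize_body is a hypothetical helper; audioChannelCount is a field of the v1 RecognitionConfig, and whether setting it resolves the error in this case is an assumption.

```shell
# Print a longrunningrecognize request body as JSON.
# $1 = gs:// URI of the audio file
# $2 = encoding (should describe the actual file, e.g. OGG_OPUS or FLAC)
# $3 = sample rate in Hz
# $4 = number of channels in the file
recognize_body() {
    printf '{"audio": {"uri": "%s"}, "config": {"encoding": "%s", "sampleRateHertz": %s, "audioChannelCount": %s, "languageCode": "en-US"}}' \
        "$1" "$2" "$3" "$4"
}

# Usage (MY_BUCKET and MY_API_KEY are placeholders, as in the post):
#   curl --request POST --header "Content-Type: application/json" \
#        --url "https://speech.googleapis.com/v1/speech:longrunningrecognize?key=${MY_API_KEY}" \
#        --data "$(recognize_body "gs://${MY_BUCKET}/myaudio.opus" OGG_OPUS 48000 2)"
```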

curl GET 2

curl --request GET --url 'https://speech.googleapis.com/v1/operations/987654321?fields=done%2Cerror%2Cmetadata%2Cname%2Cresponse&key={MY_API_KEY}'

Response:

{
  "name": "987654321",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.speech.v1.LongRunningRecognizeMetadata",
    "progressPercent": 100,
    "startTime": "2018-06-08T11:01:24.596504Z",
    "lastUpdateTime": "2018-06-08T11:01:51.825882Z"
  },
  "done": true
}

The problem is that I don’t get the actual transcription. According to the documentation, the result should contain a response key holding the data.
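
A rough way to tell these outcomes apart from the shell is sketched below. operation_state is a hypothetical helper that only does substring matching on the JSON text, so a proper JSON parser would be more robust. The done, error and response keys are the ones requested in the fields parameter of the GET calls.

```shell
# Classify an operation document read from stdin.
# Prints one of: failed, has_results, done_without_results, running
operation_state() {
    op=$(cat)
    case $op in
        *'"error"'*)                      echo failed ;;
        *'"response"'*)                   echo has_results ;;
        *'"done": true'*|*'"done":true'*) echo done_without_results ;;
        *)                                echo running ;;
    esac
}

# Usage:
#   curl --request GET --url "https://speech.googleapis.com/v1/operations/987654321?key=${MY_API_KEY}" | operation_state
```

Applied to the response shown above, which has "done": true but neither an error nor a response key, the helper reports done_without_results.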

Since I’m kind of stuck here, I’d like to know whether I’m doing something completely wrong. I don’t have any technical or resource limitations, so all suggestions are very welcome! I’m also happy to change my approach.

Thanks in advance! Cheers

How to use Google's Cloud Speech-to-Text API to transcribe a video using the REST API

8 June 2018, by mrb

I’d like to have the transcript of two people speaking in a video, but I get an empty response from the Cloud Speech-to-Text API.

