
Recherche avancée
Médias (2)
-
GetID3 - Bloc informations de fichiers
9 avril 2013, par
Mis à jour : Mai 2013
Langue : français
Type : Image
-
GetID3 - Boutons supplémentaires
9 avril 2013, par
Mis à jour : Avril 2013
Langue : français
Type : Image
Autres articles (102)
-
Personnaliser en ajoutant son logo, sa bannière ou son image de fond
5 septembre 2013, parCertains thèmes prennent en compte trois éléments de personnalisation : l’ajout d’un logo ; l’ajout d’une bannière l’ajout d’une image de fond ;
-
Amélioration de la version de base
13 septembre 2013Jolie sélection multiple
Le plugin Chosen permet d’améliorer l’ergonomie des champs de sélection multiple. Voir les deux images suivantes pour comparer.
Il suffit pour cela d’activer le plugin Chosen (Configuration générale du site > Gestion des plugins), puis de configurer le plugin (Les squelettes > Chosen) en activant l’utilisation de Chosen dans le site public et en spécifiant les éléments de formulaires à améliorer, par exemple select[multiple] pour les listes à sélection multiple (...) -
Ecrire une actualité
21 juin 2013, parPrésentez les changements dans votre MédiaSPIP ou les actualités de vos projets sur votre MédiaSPIP grâce à la rubrique actualités.
Dans le thème par défaut spipeo de MédiaSPIP, les actualités sont affichées en bas de la page principale sous les éditoriaux.
Vous pouvez personnaliser le formulaire de création d’une actualité.
Formulaire de création d’une actualité Dans le cas d’un document de type actualité, les champs proposés par défaut sont : Date de publication ( personnaliser la date de publication ) (...)
Sur d’autres sites (7027)
-
How to use Google's Cloud Speech-to-Text REST API to transcribe a video
24 juillet 2018, par mrbI’d like to have the transcript of 2 people speaking in a video, but I get an empty response from the Cloud Speech-to-Text API
Approach :
I have a 56 minute video file containing a conversation between two people. I would like to have the transcript of that conversation, and I would like to use Google’s Cloud Speech-to-Text API to get that.
To save a little on my Google Cloud Storage I converted to video to audio first by using
mmpeg
.First I’d tried to figure out the audio codec by using the command below, and it looks like AAC.
ffmpeg -i video.mp4
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'videoplayback.mp4':
Metadata:
major_brand : mp42
minor_version : 0
compatible_brands: isommp42
creation_time : 2015-12-30T08:17:14.000000Z
Duration: 00:56:03.99, start: 0.000000, bitrate: 362 kb/s
Stream #0:0(und): Video: h264 (Constrained Baseline) (avc1 / 0x31637661), yuv420p, 490x360 [SAR 1:1 DAR 49:36], 264 kb/s, 29.97 fps, 29.97 tbr, 30k tbn, 59.94 tbc (default)
Metadata:
handler_name : VideoHandler
Stream #0:1(eng): Audio: aac (LC) (mp4a / 0x6134706D), 44100 Hz, stereo, fltp, 96 kb/s (default)
Metadata:
creation_time : 2015-12-30T08:17:31.000000Z
handler_name : IsoMedia File Produced by Google, 5-11-2011So I took that from the video by using :
ffmpeg -i video.mp4 -vn -acodec copy myaudio.aac
Details so far :
ffmpeg -i myaudio.aac
Outputs :Input #0, aac, from 'myaudio.aac':
Duration: 00:56:47.49, bitrate: 97 kb/s
Stream #0:0: Audio: aac (LC), 44100 Hz, stereo, fltp, 97 kb/sAfter that I converted it to opus because I’m told that opus is better
ffmpeg -i myaudio.aac -acodec libopus -b:a 97k -vbr on -compression_level 10 myaudio.opus
Info so far :
opusinfo myaudio.opus
User comments section follows...
encoder=Lavc58.18.100 libopus
Opus stream 1:
Pre-skip: 312
Playback gain: 0 dB
Channels: 2
Original sample rate: 48000Hz
Packet duration: 20.0ms (max), 20.0ms (avg), 20.0ms (min)
Page duration: 1000.0ms (max), 1000.0ms (avg), 1000.0ms (min)
Total data length: 29956714 bytes (overhead: 0.872%)
Playback length: 56m:03.990s
Average bitrate: 71.24 kb/s, w/o overhead: 70.62 kb/sI this point I uploaded the
myaudio.opus
to the Google Cloud Storage.curl POST 1
I started the speech recognition by doing a POST withcurl
:curl --request POST --header "Content-Type: application/json" --url 'https://speech.googleapis.com/v1/speech:longrunningrecognize?fields=done%2Cerror%2Cmetadata%2Cname%2Cresponse&key={MY_API_KEY}' --data '{"audio": {"uri": "gs://{MY_BUCKET}/myaudio.opus"},"config": {"encoding": "OGG_OPUS", "sampleRateHertz": 48000, "languageCode": "en-US"}}'
Response :
{"name": "123456789"}
123456789 was not the actual value.curl GET 1
Now I wanted to have the results :curl --request GET --url 'https://speech.googleapis.com/v1/operations/123456789?fields=done%2Cerror%2Cmetadata%2Cname%2Cresponse&key={MY_API_KEY}'
This gave me the error :
Error : Unable to recognize speech, possible error in encoding or channel config. Please correct the config and retry the request.
So I updated the encoding configuration from
OGG_OPUS
toLINEAR16
.curl POST 2
Did the post again :curl --request POST --header "Content-Type: application/json" --url 'https://speech.googleapis.com/v1/speech:longrunningrecognize?fields=done%2Cerror%2Cmetadata%2Cname%2Cresponse&key={MY_API_KEY}' --data '{"audio": {"uri": "gs://{MY_BUCKET}/myaudio.opus"},"config": {"encoding": "LINEAR16", "sampleRateHertz": 48000, "languageCode": "en-US"}}'
Response :
{"name": "987654321"}
curl GET 2
curl --request GET --url 'https://speech.googleapis.com/v1/operations/987654321?fields=done%2Cerror%2Cmetadata%2Cname%2Cresponse&key={MY_API_KEY}'
Response :
{
"name": "987654321",
"metadata": {
"@type": "type.googleapis.com/google.cloud.speech.v1.LongRunningRecognizeMetadata",
"progressPercent": 100,
"startTime": "2018-06-08T11:01:24.596504Z",
"lastUpdateTime": "2018-06-08T11:01:51.825882Z"
},
"done": true
}The problem is that I don’t get the actual transcription. According the the documentation there should be a
response
key in the response containing the data.Since I’m kinda stuck here I’d like to know if I’m doing something completely wrong. I don’t have any technical or resource limitation so all suggestions are very welcome ! Also happy to change my approach.
Thanks in advance ! Cheers
-
ffmpeg combining mp4 videos incorrectly
9 juin 2018, par wdsfdsi have a list of 20 mp4 files that I need to combine into one video. They are all 60 FPS, but range in size. In between each mp4 file I need to show its ’intro’, a 1.5 second clip about the file.
Reference :
pics/i.png
- intro for the nth mp4 fileclips/i_intro.mp4
- mp4 intro generated from i.pngclips/i.mp4
- original mp4clips/i_resize.mp4
- 720p mp4
Here is what I’ve done :
Convert image to mp4 using this command (taken from this post) :
ffmpeg -y -i pics/{i}.png -f lavfi -i anullsrc -r 60 -pix_fmt yuv420p -ac 2 -ar 44100 -video_track_timescale 90k -t 1.5 clips/{i}_intro.mp4
Convert
i.mp4
to 720p using this command (taken from this post) :ffmpeg -i clips/{i}.mp4 -c:v libx264 -crf 23 -preset medium -c:a aac -b:a 128k -vf scale=-2:720,format=yuv420p {i}_resize.mp4
Created a file like this :
file 1_intro.mp4
file 1_resize.mp4
file 2_intro.mp4
file 2_resize.mp4
...Combine all clips :
ffmpeg -f concat -i {video_list} -c copy out.mp4
This generates a file, but it is definitely messed up. When I combined the files I got a ton of warning messages like this :
[mp4 @ 0x7f88fe01de00] Non-monotonous DTS in output stream 0:0;
previous: 48966232, current: 8826932; changing to 48966233. This may
result in incorrect timestamps in the output file.FFprobe for
out.mp4
ffprobe version 4.0 Copyright (c) 2007-2018 the FFmpeg developers
built with Apple LLVM version 9.1.0 (clang-902.0.39.1)
configuration: --prefix=/usr/local/Cellar/ffmpeg/4.0 --enable-shared --enable-pthreads --enable-version3 --enable-hardcoded-tables --enable-avresample --cc=clang --host-cflags= --host-ldflags= --enable-gpl --enable-libmp3lame --enable-libx264 --enable-libxvid --enable-opencl --enable-videotoolbox --disable-lzma
libavutil 56. 14.100 / 56. 14.100
libavcodec 58. 18.100 / 58. 18.100
libavformat 58. 12.100 / 58. 12.100
libavdevice 58. 3.100 / 58. 3.100
libavfilter 7. 16.100 / 7. 16.100
libavresample 4. 0. 0 / 4. 0. 0
libswscale 5. 1.100 / 5. 1.100
libswresample 3. 1.100 / 3. 1.100
libpostproc 55. 1.100 / 55. 1.100
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'out.mp4':
Metadata:
major_brand : isom
minor_version : 512
compatible_brands: isomiso2avc1mp41
encoder : Lavf58.12.100
Duration: 00:09:35.56, start: 0.000000, bitrate: 4119 kb/s
Stream #0:0(und): Video: h264 (High) (avc1 / 0x31637661), yuv420p, 1280x720 [SAR 1:1 DAR 16:9], 4221 kb/s, 59.98 fps, 351.56 tbr, 90k tbn, 120 tbc (default)
Metadata:
handler_name : VideoHandler
Stream #0:1(und): Audio: aac (LC) (mp4a / 0x6134706D), 44100 Hz, stereo, fltp, 122 kb/s (default)
Metadata:
handler_name : SoundHandlerFFprobe for
1.mp4
ffprobe version 4.0 Copyright (c) 2007-2018 the FFmpeg developers
built with Apple LLVM version 9.1.0 (clang-902.0.39.1)
configuration: --prefix=/usr/local/Cellar/ffmpeg/4.0 --enable-shared --enable-pthreads --enable-version3 --enable-hardcoded-tables --enable-avresample --cc=clang --host-cflags= --host-ldflags= --enable-gpl --enable-libmp3lame --enable-libx264 --enable-libxvid --enable-opencl --enable-videotoolbox --disable-lzma
libavutil 56. 14.100 / 56. 14.100
libavcodec 58. 18.100 / 58. 18.100
libavformat 58. 12.100 / 58. 12.100
libavdevice 58. 3.100 / 58. 3.100
libavfilter 7. 16.100 / 7. 16.100
libavresample 4. 0. 0 / 4. 0. 0
libswscale 5. 1.100 / 5. 1.100
libswresample 3. 1.100 / 3. 1.100
libpostproc 55. 1.100 / 55. 1.100
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from '1.mp4':
Metadata:
major_brand : isom
minor_version : 512
compatible_brands: isomiso2avc1mp41
encoder : Lavf57.71.100
Duration: 00:00:40.94, start: 0.000000, bitrate: 7613 kb/s
Stream #0:0(und): Video: h264 (High) (avc1 / 0x31637661), yuvj420p(pc, bt709/unknown/unknown), 1920x1080 [SAR 1:1 DAR 16:9], 7451 kb/s, 60 fps, 60 tbr, 90k tbn, 120 tbc (default)
Metadata:
handler_name : VideoHandler
Stream #0:1(und): Audio: aac (LC) (mp4a / 0x6134706D), 44100 Hz, stereo, fltp, 160 kb/s (default)
Metadata:
handler_name : SoundHandlerFFprobe for
1_resized.mp4
ffprobe version 4.0 Copyright (c) 2007-2018 the FFmpeg developers
built with Apple LLVM version 9.1.0 (clang-902.0.39.1)
configuration: --prefix=/usr/local/Cellar/ffmpeg/4.0 --enable-shared --enable-pthreads --enable-version3 --enable-hardcoded-tables --enable-avresample --cc=clang --host-cflags= --host-ldflags= --enable-gpl --enable-libmp3lame --enable-libx264 --enable-libxvid --enable-opencl --enable-videotoolbox --disable-lzma
libavutil 56. 14.100 / 56. 14.100
libavcodec 58. 18.100 / 58. 18.100
libavformat 58. 12.100 / 58. 12.100
libavdevice 58. 3.100 / 58. 3.100
libavfilter 7. 16.100 / 7. 16.100
libavresample 4. 0. 0 / 4. 0. 0
libswscale 5. 1.100 / 5. 1.100
libswresample 3. 1.100 / 3. 1.100
libpostproc 55. 1.100 / 55. 1.100
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from '1_resize.mp4':
Metadata:
major_brand : isom
minor_version : 512
compatible_brands: isomiso2avc1mp41
encoder : Lavf58.12.100
Duration: 00:00:32.37, start: 0.000000, bitrate: 3266 kb/s
Stream #0:0(und): Video: h264 (High) (avc1 / 0x31637661), yuv420p, 1280x720 [SAR 1:1 DAR 16:9], 3127 kb/s, 60 fps, 60 tbr, 15360 tbn, 120 tbc (default)
Metadata:
handler_name : VideoHandler
Stream #0:1(und): Audio: aac (LC) (mp4a / 0x6134706D), 44100 Hz, stereo, fltp, 129 kb/s (default)
Metadata:
handler_name : SoundHandlerFFprobe for
1_intro.mp4
ffprobe version 4.0 Copyright (c) 2007-2018 the FFmpeg developers
built with Apple LLVM version 9.1.0 (clang-902.0.39.1)
configuration: --prefix=/usr/local/Cellar/ffmpeg/4.0 --enable-shared --enable-pthreads --enable-version3 --enable-hardcoded-tables --enable-avresample --cc=clang --host-cflags= --host-ldflags= --enable-gpl --enable-libmp3lame --enable-libx264 --enable-libxvid --enable-opencl --enable-videotoolbox --disable-lzma
libavutil 56. 14.100 / 56. 14.100
libavcodec 58. 18.100 / 58. 18.100
libavformat 58. 12.100 / 58. 12.100
libavdevice 58. 3.100 / 58. 3.100
libavfilter 7. 16.100 / 7. 16.100
libavresample 4. 0. 0 / 4. 0. 0
libswscale 5. 1.100 / 5. 1.100
libswresample 3. 1.100 / 3. 1.100
libpostproc 55. 1.100 / 55. 1.100
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from '1_intro.mp4':
Metadata:
major_brand : isom
minor_version : 512
compatible_brands: isomiso2avc1mp41
encoder : Lavf58.12.100
Duration: 00:00:01.52, start: 0.000000, bitrate: 113 kb/s
Stream #0:0(und): Video: h264 (High) (avc1 / 0x31637661), yuv420p, 1280x720 [SAR 1:1 DAR 16:9], 4667 kb/s, 60 fps, 60 tbr, 90k tbn, 120 tbc (default)
Metadata:
handler_name : VideoHandler
Stream #0:1(und): Audio: aac (LC) (mp4a / 0x6134706D), 44100 Hz, stereo, fltp, 2 kb/s (default)
Metadata:
handler_name : SoundHandlerI noticed that the length of the files are different, and the
tbn
in different, but idk what that means. Thanks for helping ! -
Converting a call center recording to something useful
9 août 2018, par AbhayI have a call center recording (when played it sounds gibberish) for which the mediainfo shows info as
ion@aurora:~/Inbound$ mediainfo 48401-3405-48403--18042018170000.wav
General
Complete name : 48401-3405-48403--18042018170000.wav
Format : Wave
File size : 327 KiB
Duration : 4mn 11s
Overall bit rate : 10.7 Kbps
Audio
Format : G.723.1
Codec ID : A100
Duration : 4mn 11s
Bit rate : 10.7 Kbps
Channel(s) : 2 channels
Sampling rate : 8 000 Hz
Stream size : 327 KiB (100%)The ffmpeg info shows this as
ion@aurora:~/Inbound$ ffmpeg -i 48401-3405-48403--18042018170000.wav
ffmpeg version N-91330-ga990184 Copyright (c) 2000-2018 the FFmpeg developers
built with gcc 5.4.0 (Ubuntu 5.4.0-6ubuntu1~16.04.9) 20160609
configuration: --prefix=/home/ion/ffmpeg_build --pkg-config-flags=--static --extra-cflags=-I/home/ion/ffmpeg_build/include --extra-ldflags=-L/home/ion/ffmpeg_build/lib --extra-libs='-lpthread -lm' --bindir=/home/ion/bin --enable-gpl --enable-libaom --enable-libass --enable-libfdk-aac --enable-libfreetype --enable-libmp3lame --enable-libopus --enable-libvorbis --enable-libvpx --enable-libx264 --enable-libx265 --enable-nonfree
libavutil 56. 18.102 / 56. 18.102
libavcodec 58. 20.103 / 58. 20.103
libavformat 58. 17.100 / 58. 17.100
libavdevice 58. 4.101 / 58. 4.101
libavfilter 7. 25.100 / 7. 25.100
libswscale 5. 2.100 / 5. 2.100
libswresample 3. 2.100 / 3. 2.100
libpostproc 55. 2.100 / 55. 2.100
Input #0, wav, from '48401-3405-48403--18042018170000.wav':
Duration: 00:04:11.37, bitrate: 10 kb/s
Stream #0:0: Audio: g723_1 ([0][161][0][0] / 0xA100), 8000 Hz, mono, s16, 10 kb/s
At least one output file must be specifiedSo I converted this file to PCM using
ffmpeg -acodec g723_1 -i 48401-3405-48403--18042018170000.wav -acodec pcm_s16le -f wav outnew1.wav
But the audio still sound gibberish , I tried many variation and only Goldwave worked but that works on windows and with GUI not cli.
So how can I convert this file to something useful so that atleast I can listen to it , It feels like a challenge now.
Audio file : https://drive.google.com/open?id=1T54lKaI6IJmOqTPNOA_OkYRz89EQ5F2L
PS : Use VLC to play audio file