
Media (91)
-
#3 The Safest Place
16 October 2011
Updated: February 2013
Language: English
Type: Audio
-
#4 Emo Creates
15 October 2011
Updated: February 2013
Language: English
Type: Audio
-
#2 Typewriter Dance
15 October 2011
Updated: February 2013
Language: English
Type: Audio
-
#1 The Wires
11 October 2011
Updated: February 2013
Language: English
Type: Audio
-
ED-ME-5 1-DVD
11 October 2011
Updated: October 2011
Language: English
Type: Audio
-
Revolution of Open-source and film making towards open film making
6 October 2011
Updated: July 2013
Language: English
Type: Text
Other articles (112)
-
Customizing by adding a logo, banner, or background image
5 September 2013
Some themes support three customization elements: adding a logo; adding a banner; adding a background image.
-
Authorizations overridden by plugins
27 April 2010
Mediaspip core
autoriser_auteur_modifier() so that visitors are able to edit their information on the authors page
-
Customizing categories
21 June 2013
Category creation form
For those familiar with SPIP, a category can be thought of as a section (rubrique).
For a document of type category, the fields offered by default are: Text
This form can be modified under:
Administration > Form mask configuration.
For a document of type media, the fields not displayed by default are: Short description
In addition, it is in this configuration section that one can specify the (...)
On other sites (11168)
-
How to use Google's Cloud Speech-to-Text REST API to transcribe a video
24 July 2018, by mrb
I’d like to have the transcript of two people speaking in a video, but I get an empty response from the Cloud Speech-to-Text API.
Approach:
I have a 56-minute video file containing a conversation between two people. I would like a transcript of that conversation, and I would like to use Google’s Cloud Speech-to-Text API to get it.
To save a little on my Google Cloud Storage, I first converted the video to audio using ffmpeg.
First I tried to figure out the audio codec using the command below, and it looks like AAC:
ffmpeg -i video.mp4
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'videoplayback.mp4':
Metadata:
major_brand : mp42
minor_version : 0
compatible_brands: isommp42
creation_time : 2015-12-30T08:17:14.000000Z
Duration: 00:56:03.99, start: 0.000000, bitrate: 362 kb/s
Stream #0:0(und): Video: h264 (Constrained Baseline) (avc1 / 0x31637661), yuv420p, 490x360 [SAR 1:1 DAR 49:36], 264 kb/s, 29.97 fps, 29.97 tbr, 30k tbn, 59.94 tbc (default)
Metadata:
handler_name : VideoHandler
Stream #0:1(eng): Audio: aac (LC) (mp4a / 0x6134706D), 44100 Hz, stereo, fltp, 96 kb/s (default)
Metadata:
creation_time : 2015-12-30T08:17:31.000000Z
handler_name : IsoMedia File Produced by Google, 5-11-2011
So I extracted the audio from the video using:
ffmpeg -i video.mp4 -vn -acodec copy myaudio.aac
Details so far :
ffmpeg -i myaudio.aac
Outputs:
Input #0, aac, from 'myaudio.aac':
Duration: 00:56:47.49, bitrate: 97 kb/s
Stream #0:0: Audio: aac (LC), 44100 Hz, stereo, fltp, 97 kb/s
After that I converted it to opus because I’m told that opus is better:
ffmpeg -i myaudio.aac -acodec libopus -b:a 97k -vbr on -compression_level 10 myaudio.opus
Info so far :
opusinfo myaudio.opus
User comments section follows...
encoder=Lavc58.18.100 libopus
Opus stream 1:
Pre-skip: 312
Playback gain: 0 dB
Channels: 2
Original sample rate: 48000Hz
Packet duration: 20.0ms (max), 20.0ms (avg), 20.0ms (min)
Page duration: 1000.0ms (max), 1000.0ms (avg), 1000.0ms (min)
Total data length: 29956714 bytes (overhead: 0.872%)
Playback length: 56m:03.990s
Average bitrate: 71.24 kb/s, w/o overhead: 70.62 kb/s
At this point I uploaded myaudio.opus to Google Cloud Storage.
curl POST 1
I started the speech recognition by doing a POST with curl:
curl --request POST --header "Content-Type: application/json" --url 'https://speech.googleapis.com/v1/speech:longrunningrecognize?fields=done%2Cerror%2Cmetadata%2Cname%2Cresponse&key={MY_API_KEY}' --data '{"audio": {"uri": "gs://{MY_BUCKET}/myaudio.opus"},"config": {"encoding": "OGG_OPUS", "sampleRateHertz": 48000, "languageCode": "en-US"}}'
Response :
{"name": "123456789"}
123456789 was not the actual value.
curl GET 1
Now I wanted to get the results:
curl --request GET --url 'https://speech.googleapis.com/v1/operations/123456789?fields=done%2Cerror%2Cmetadata%2Cname%2Cresponse&key={MY_API_KEY}'
This gave me the error :
Error : Unable to recognize speech, possible error in encoding or channel config. Please correct the config and retry the request.
So I updated the encoding configuration from OGG_OPUS to LINEAR16.
curl POST 2
I did the POST again:
curl --request POST --header "Content-Type: application/json" --url 'https://speech.googleapis.com/v1/speech:longrunningrecognize?fields=done%2Cerror%2Cmetadata%2Cname%2Cresponse&key={MY_API_KEY}' --data '{"audio": {"uri": "gs://{MY_BUCKET}/myaudio.opus"},"config": {"encoding": "LINEAR16", "sampleRateHertz": 48000, "languageCode": "en-US"}}'
Response :
{"name": "987654321"}
curl GET 2
curl --request GET --url 'https://speech.googleapis.com/v1/operations/987654321?fields=done%2Cerror%2Cmetadata%2Cname%2Cresponse&key={MY_API_KEY}'
Response :
{
"name": "987654321",
"metadata": {
"@type": "type.googleapis.com/google.cloud.speech.v1.LongRunningRecognizeMetadata",
"progressPercent": 100,
"startTime": "2018-06-08T11:01:24.596504Z",
"lastUpdateTime": "2018-06-08T11:01:51.825882Z"
},
"done": true
}
The problem is that I don’t get the actual transcription. According to the documentation, there should be a response key in the response containing the data.
Since I’m kind of stuck here, I’d like to know if I’m doing something completely wrong. I don’t have any technical or resource limitations, so all suggestions are very welcome! I’m also happy to change my approach.
Thanks in advance! Cheers
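A minimal Python sketch of the two JSON shapes involved in the question above: the request body sent to speech:longrunningrecognize and the operation object returned by the GET. The helper names build_recognize_body and summarize_operation are hypothetical. Passing audioChannelCount for the stereo file is only an assumption about what the "channel config" error might refer to, not a confirmed fix, and the results/alternatives/transcript layout is the v1 response shape as commonly documented. The sketch only builds and inspects dictionaries; it makes no network calls.

```python
def build_recognize_body(gcs_uri, encoding, sample_rate_hz, language_code, channels=None):
    """Build the JSON body for a speech:longrunningrecognize request."""
    config = {
        "encoding": encoding,
        "sampleRateHertz": sample_rate_hz,
        "languageCode": language_code,
    }
    if channels is not None:
        # Assumption: declare the channel count explicitly for stereo input.
        config["audioChannelCount"] = channels
    return {"audio": {"uri": gcs_uri}, "config": config}


def summarize_operation(operation):
    """Return (status, transcripts) for a parsed operations/{name} response."""
    if not operation.get("done"):
        return "running", []
    if "error" in operation:
        return "error", []
    # A finished operation may still lack "response" when nothing was recognized.
    results = operation.get("response", {}).get("results", [])
    transcripts = [
        r["alternatives"][0].get("transcript", "")
        for r in results
        if r.get("alternatives")
    ]
    return ("ok" if transcripts else "empty"), transcripts


# The GET 2 response quoted in the question, reduced to the relevant keys.
get_2 = {
    "name": "987654321",
    "metadata": {"progressPercent": 100},
    "done": True,
}
print(summarize_operation(get_2))  # prints ('empty', [])
```

Applied to the GET 2 response, the helper reports "empty": the operation finished, but no response key came back, which matches the symptom described in the question.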
-
Bug #3592: Robots.txt.html and Google’s mobile compatibility tests
25 November 2019, by b b
Not sure it’s necessary to duplicate things by reopening this ticket, since the topic is already being discussed here: https://core.spip.net/issues/4103 :)
-
FFmpeg can't process videos with filenames containing emojis on Google Colab
25 November 2022, by athena
I mounted my Google Drive on Google Colab. Inside the Drive, there's a video file with an emoji in its filename (example:
20221124 [우리의식탁 W TABLE] 직접 기른 허브로 만들면 더 맛있는 허브포카치아 🌿 (8m2hNIEoXEw).mkv
).

!ffmpeg -i "/content/drive/MyDrive/DOWNLOAD/20221124 [우리의식탁 W TABLE] 직접 기른 허브로 만들면 더 맛있는 허브포카치아 🌿 (8m2hNIEoXEw).mkv"



Trying to run FFmpeg gives me this error :


---------------------------------------------------------------------------
UnicodeEncodeError Traceback (most recent call last)
 in <module>
 3 video = "/content/drive/MyDrive/DOWNLOAD/20221124 [\u110B\u116E\u1105\u1175\u110B\u1174\u1109\u1175\u11A8\u1110\u1161\u11A8 W TABLE] \u110C\u1175\u11A8\u110C\u1165\u11B8 \u1100\u1175\u1105\u1173\u11AB \u1112\u1165\u1107\u1173\u1105\u1169 \u1106\u1161\u11AB\u1103\u1173\u11AF\u1106\u1167\u11AB \u1103\u1165 \u1106\u1161\u11BA\u110B\u1175\u11BB\u1102\u1173\u11AB \u1112\u1165\u1107\u1173\u1111\u1169\u110F\u1161\u110E\u1175\u110B\u1161 \uD83C\uDF3F (8m2hNIEoXEw).mkv" #@param {type: "string"}
 4 
----> 5 get_ipython().system('ffmpeg -i "$video" #-hide_banner')

4 frames
/usr/local/lib/python3.7/dist-packages/google/colab/_shell.py in system(self, *args, **kwargs)
 93 kwargs.update({'also_return_output': True})
 94 
---> 95 output = _system_commands._system_compat(self, *args, **kwargs) # pylint:disable=protected-access
 96 
 97 if pip_warn:

/usr/local/lib/python3.7/dist-packages/google/colab/_system_commands.py in _system_compat(shell, cmd, also_return_output)
 435 # stack.
 436 result = _run_command(
--> 437 shell.var_expand(cmd, depth=2), clear_streamed_output=False)
 438 shell.user_ns['_exit_code'] = result.returncode
 439 if -result.returncode in _INTERRUPTED_SIGNALS:

/usr/local/lib/python3.7/dist-packages/google/colab/_system_commands.py in _run_command(cmd, clear_streamed_output)
 189 stdin=stdin,
 190 stderr=child_pty,
--> 191 close_fds=True)
 192 # The child PTY is only needed by the spawned process.
 193 os.close(child_pty)

/usr/lib/python3.7/subprocess.py in __init__(self, args, bufsize, executable, stdin, stdout, stderr, preexec_fn, close_fds, shell, cwd, env, universal_newlines, startupinfo, creationflags, restore_signals, start_new_session, pass_fds, encoding, errors, text)
 798 c2pread, c2pwrite,
 799 errread, errwrite,
--> 800 restore_signals, start_new_session)
 801 except:
 802 # Cleanup if the child failed starting.

/usr/lib/python3.7/subprocess.py in _execute_child(self, args, executable, preexec_fn, close_fds, pass_fds, cwd, env, startupinfo, creationflags, shell, p2cread, p2cwrite, c2pread, c2pwrite, errread, errwrite, restore_signals, start_new_session)
 1480 errread, errwrite,
 1481 errpipe_read, errpipe_write,
-> 1482 restore_signals, start_new_session, preexec_fn)
 1483 self._child_created = True
 1484 finally:

UnicodeEncodeError: 'utf-8' codec can't encode characters in position 131-132: surrogates not allowed


My Colab uses Python 3.7.15 and ffmpeg/ffprobe version N-109226-g2ad199ae31-20221125 (from https://github.com/BtbN/FFmpeg-Builds).


I searched here for issues similar to mine, but most of the solutions are well beyond my knowledge, and I'm not sure how to apply them to my use case.


I'd appreciate your help, thank you!
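The traceback above shows the emoji written as the two escapes \uD83C\uDF3F, a UTF-16 surrogate pair, which Python cannot encode as UTF-8 ("surrogates not allowed"). A possible workaround, sketched below under the assumption that the surrogates always come in valid pairs, is to merge them into the real code point before handing the path to ffmpeg. merge_surrogates and probe are hypothetical helper names; probe is not exercised here because it needs ffmpeg and the actual file.

```python
import subprocess


def merge_surrogates(text):
    """Combine UTF-16 surrogate pairs into the code points they encode."""
    # "surrogatepass" lets the lone surrogates through the encoder; decoding
    # the resulting UTF-16 bytes then reads each pair as one character.
    # An unpaired surrogate would still raise UnicodeDecodeError here.
    return text.encode("utf-16", "surrogatepass").decode("utf-16")


def probe(path):
    """Run `ffmpeg -i` on the path as an argument list, not a shell string."""
    return subprocess.run(
        ["ffmpeg", "-hide_banner", "-i", merge_surrogates(path)],
        capture_output=True,
        text=True,
    )


# The emoji exactly as it appears in the notebook cell from the traceback.
broken = "\ud83c\udf3f"
fixed = merge_surrogates(broken)
print(len(broken), len(fixed))  # prints 2 1
```

In a Colab cell this would be used as `video = merge_surrogates(video)` before the `!ffmpeg -i "$video"` line, or by calling `probe(video)` and reading its `stderr`.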