
Media (91)
-
Head down (wav version)
26 September 2011
Updated: April 2013
Language: English
Type: Audio
-
Echoplex (wav version)
26 September 2011
Updated: April 2013
Language: English
Type: Audio
-
Discipline (wav version)
26 September 2011
Updated: April 2013
Language: English
Type: Audio
-
Letting you (wav version)
26 September 2011
Updated: April 2013
Language: English
Type: Audio
-
1 000 000 (wav version)
26 September 2011
Updated: April 2013
Language: English
Type: Audio
-
999 999 (wav version)
26 September 2011
Updated: April 2013
Language: English
Type: Audio
Other articles (95)
-
Websites made with MediaSPIP
2 May 2011
This page lists some websites based on MediaSPIP.
-
Improvements to the base version
13 September 2013
Nicer multiple selection
The Chosen plugin improves the usability of multiple-select fields. See the two following images to compare.
To do this, simply enable the Chosen plugin (Site general configuration > Plugin management), then configure it (Templates > Chosen) by enabling Chosen on the public site and specifying which form elements to enhance, for example select[multiple] for multiple-select lists (...)
-
Emballe médias: what is it for?
4 February 2011
This plugin is designed to manage sites that publish documents of all types.
It creates "media", meaning: a "media" is an article in the SPIP sense, created automatically when a document is uploaded, whether audio, video, image or text; only one document can be linked to a so-called "media" article;
On other sites (15016)
-
How to Adjust Google TTS SSML to Match Original SRT Timing?
2 April, by Alexandre Silkin
I have an .srt file where each speech segment is supposed to last a specific duration (e.g., 4 seconds). However, when I generate the speech using Google Text-to-Speech (TTS) with SSML, the resulting audio plays the same segment in a shorter time (e.g., 3 seconds).


I want to adjust the speech rate dynamically in SSML so that each segment matches its original timing. My idea is to use ffmpeg to extract the actual duration of each generated speech segment, then calculate the speech rate percentage as:

speech rate = generated duration / original duration


This percentage would then be applied in SSML using the <prosody> tag, like:

<prosody rate="75%">Text to be spoken</prosody>


How can I accurately measure the duration of each segment using ffmpeg, and what is the best way to apply the correct speech rate in SSML to match the original .srt timing?
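
A minimal sketch of the measurement half, assuming each synthesized segment is saved to its own file and that ffprobe (bundled with ffmpeg) is on PATH; the file name and the 4-second target below are placeholders:

import subprocess

def media_duration_seconds(path):
    # Query only the container duration of the given file via ffprobe.
    out = subprocess.run(
        ["ffprobe", "-v", "error", "-show_entries", "format=duration",
         "-of", "default=noprint_wrappers=1:nokey=1", path],
        capture_output=True, text=True, check=True)
    return float(out.stdout.strip())

generated = media_duration_seconds("segment_001.mp3")  # hypothetical per-segment TTS output
original = 4.0                                          # target duration from the .srt, in seconds
rate_percent = round(generated / original * 100)        # e.g. 3.0 / 4.0 -> 75 (%)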


I tried break durations, and my SSML should look like this:


f.write(f'\t<p>{break_until_start}{text}<break time="{value["duration_ms"]}ms"></break></p>\n')  # the value[...] key was cut off; "duration_ms" is only a placeholder



Code writing the SSML:


text = value['text']
start_time_ms = int(value['start_ms'])  # Start time in milliseconds
previous_end_ms = int(subsDict.get(str(int(key) - 1), {}).get('end_ms', 0))  # Get the previous end time
gap_to_fill = max(0, start_time_ms - previous_end_ms)

# Escape XML special characters in the subtitle text
text = (text.replace("&", "&amp;").replace('"', "&quot;").replace("'", "&apos;")
        .replace("<", "&lt;").replace(">", "&gt;"))

break_until_start = f'<break time="{gap_to_fill}ms"></break>' if gap_to_fill > 0 else ''

# The key inside value[...] was cut off in the post; "duration_ms" below is only a placeholder.
f.write(f'\t<p>{break_until_start}{text}<break time="{value["duration_ms"]}ms"></break></p>\n')

f.write('\n')
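
To actually stretch each segment to its .srt duration, one option (an assumption, not the poster's code) is to wrap the escaped text in a prosody element whose rate is the generated/original percentage computed earlier:

# Hypothetical variant of the write line above; 75 is just the example value (3 s / 4 s).
rate_percent = 75
f.write(f'\t<p>{break_until_start}<prosody rate="{rate_percent}%">{text}</prosody></p>\n')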



-
lavu/tx: require output argument to match input for inplace transforms
26 February 2021, by Lynne
-
How to re-encode an audio file to match another one, to avoid re-encoding the whole audio
21 March 2024, by Bernard Wiesner
I have an audio editor in the browser using ffmpeg (WebAssembly), and I want to insert new audio into the existing audio without having to re-encode everything. Re-encoding everything takes a long time, especially in the browser, so I would like to re-encode only the inserted file, match it to the original one, and concatenate them using the copy command.

The ffmpeg concatenate docs say:




All files must have the same streams (same codecs, same time base, etc.)




But it is not clear what is meant by time base. So far I have observed that I need to match:


- codec
- bit rate
- sample rate
- channels (mono, stereo)

Is there anything else I need to match so that the resulting audio is not corrupt/broken when concatenating?
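
For reference, the stream parameters listed above can be read with ffprobe; a small sketch (the field names are standard ffprobe keys, the file name is the question's):

import json
import subprocess

def audio_stream_info(path):
    # Probe the first audio stream's codec, sample rate, channel count and bit rate.
    out = subprocess.run(
        ["ffprobe", "-v", "error", "-select_streams", "a:0",
         "-show_entries", "stream=codec_name,sample_rate,channels,bit_rate",
         "-of", "json", path],
        capture_output=True, text=True, check=True)
    return json.loads(out.stdout)["streams"][0]

print(audio_stream_info("original.mp3"))
# e.g. {'codec_name': 'mp3', 'sample_rate': '44100', 'channels': 2, 'bit_rate': '128000'}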


I have observed that mp3, for example, has VBR, CBR, and ABR. If the original audio has a bit rate of 128 kb/s, I assume it is CBR, so I match it with:


ffmpeg -i original.mp3
# > Stream #0:0: Audio: mp3, 44100 Hz, stereo, fltp, 128 kb/s

ffmpeg -i input.mp3 -b:a 128k -ar 44100 -ac 2 re_encoded.mp3

# then merge
# concat_list.txt contains the original audio and the re_encoded.mp3

# note: -safe 0 is an input option of the concat demuxer, so it belongs before -i
ffmpeg -f concat -safe 0 -i concat_list.txt -c copy merged.mp3
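
For reference, concat_list.txt for the concat demuxer is just a plain text file with one file directive per input, in playback order, e.g.:

file 'original.mp3'
file 're_encoded.mp3'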



And that works fine for CBR bitrates such as 8, 16, 24, 32, 40, 48, 64, 80, 96, 112, 128, 160, 192, 224, 256, or 320 kb/s (docs), as far as I have tested.


The issue is when the original.mp3 has a VBR (variable bit rate) or ABR, such as 150 kb/s.


If I try to match it as below:


ffmpeg -i input.mp3 -b:a 150k -ar 44100 -ac 2 re_encoded.mp3
ffmpeg -i re_encoded.mp3
# Stream #0:0: Audio: mp3, 44100 Hz, stereo, fltp, 160 kb/s



The resulting bitrate is rounded to the nearest CBR value, which is 160.


I can solve this with mp3 by using -abr 1:

ffmpeg -i input.mp3 -abr 1 -b:a 150k -ar 44100 -ac 2 re_encoded.mp3
ffmpeg -i re_encoded.mp3
# Stream #0:0: Audio: mp3, 44100 Hz, stereo, fltp, 150 kb/s



Now the bitrate matches the original audio. However, I am not sure this is correct, since I am encoding the new audio as ABR and concatenating it with a VBR file. I am not even sure how to check with ffmpeg whether the audio is VBR, CBR or ABR, or if that even matters when concatenating.
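
One rough heuristic for checking (an assumption, not an authoritative test) is to dump the audio packet sizes with ffprobe: near-constant sizes suggest CBR, widely varying sizes suggest VBR or ABR.

import json
import subprocess

def looks_like_cbr(path):
    # Collect the distinct packet sizes of the first audio stream.
    out = subprocess.run(
        ["ffprobe", "-v", "error", "-select_streams", "a:0",
         "-show_packets", "-of", "json", path],
        capture_output=True, text=True, check=True)
    sizes = {int(p["size"]) for p in json.loads(out.stdout)["packets"]}
    # CBR mp3 frames can still alternate slightly in size because of padding, hence <= 2.
    return len(sizes) <= 2

print(looks_like_cbr("original.mp3"))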


Another issue happens with aac files: when I try to match the original audio's bitrate, I can't.


ffmpeg -i input.mp3 -b:a 128k -ar 44100 -ac 2 re_encoded.aac
ffmpeg -i re_encoded.aac
# Stream #0:0: Audio: aac (LC), 44100 Hz, stereo, fltp, 135 kb/s



The resulting bitrate always seems to be variable (135 in this case), and hence I can't match it to the original one.


So my question is: what conditions need to be met when concatenating audio files with different streams, and how can I re-encode only one audio file to match the other? If there is some package that can do this, that would also be a great help.
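
As a rough end-to-end sketch of the CBR case that already works above (not a definitive answer for VBR/ABR sources), the probe, match and concatenate flow could be scripted like this; the file names are the question's, and libmp3lame plus the probed fields are assumptions:

import json
import subprocess

# Probe the original's audio stream so the inserted clip can be encoded with matching parameters.
probe = subprocess.run(
    ["ffprobe", "-v", "error", "-select_streams", "a:0",
     "-show_entries", "stream=codec_name,sample_rate,channels,bit_rate",
     "-of", "json", "original.mp3"],
    capture_output=True, text=True, check=True)
stream = json.loads(probe.stdout)["streams"][0]

# Re-encode only the inserted clip to the original's bitrate, sample rate and channel count.
subprocess.run(
    ["ffmpeg", "-y", "-i", "input.mp3",
     "-c:a", "libmp3lame", "-b:a", f'{int(stream["bit_rate"]) // 1000}k',
     "-ar", stream["sample_rate"], "-ac", str(stream["channels"]),
     "re_encoded.mp3"],
    check=True)

# Concatenate with stream copy so the original audio is never re-encoded.
with open("concat_list.txt", "w") as f:
    f.write("file 'original.mp3'\nfile 're_encoded.mp3'\n")
subprocess.run(
    ["ffmpeg", "-y", "-f", "concat", "-safe", "0", "-i", "concat_list.txt",
     "-c", "copy", "merged.mp3"],
    check=True)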