
Recherche avancée
Médias (1)
-
1 000 000 (wav version)
26 septembre 2011, par
Mis à jour : Avril 2013
Langue : English
Type : Audio
Autres articles (41)
-
Les autorisations surchargées par les plugins
27 avril 2010, parMediaspip core
autoriser_auteur_modifier() afin que les visiteurs soient capables de modifier leurs informations sur la page d’auteurs -
Supporting all media types
13 avril 2011, parUnlike most software and media-sharing platforms, MediaSPIP aims to manage as many different media types as possible. The following are just a few examples from an ever-expanding list of supported formats : images : png, gif, jpg, bmp and more audio : MP3, Ogg, Wav and more video : AVI, MP4, OGV, mpg, mov, wmv and more text, code and other data : OpenOffice, Microsoft Office (Word, PowerPoint, Excel), web (html, CSS), LaTeX, Google Earth and (...)
-
Encoding and processing into web-friendly formats
13 avril 2011, parMediaSPIP automatically converts uploaded files to internet-compatible formats.
Video files are encoded in MP4, Ogv and WebM (supported by HTML5) and MP4 (supported by Flash).
Audio files are encoded in MP3 and Ogg (supported by HTML5) and MP3 (supported by Flash).
Where possible, text is analyzed in order to retrieve the data needed for search engine detection, and then exported as a series of image files.
All uploaded files are stored online in their original format, so you can (...)
Sur d’autres sites (6942)
-
Start and end time of MoviePy's VideoClip not working
21 mars 2024, par ernesto casco velazquezI'm trying to add captions to a video. The desired outcome is to show each word in the exact moment is being said.


I have a method that gives me the accurate time start and end per each word :


def get_words_per_time(audio_speech_file):
 model = whisper.load_model("base")
 transcribe = model.transcribe(
 audio=audio_speech_file, fp16=False, word_timestamps=True
 )
 segments = transcribe["segments"]
 words = []

 for seg in segments:
 for word in seg["words"]:
 words.append(
 {
 "word": word["word"],
 "start": word["start"],
 "end": word["end"],
 "prob": round(word["probability"], 4),
 }
 )
 return words



Then I have a code that uses MoviePy to create TextClip and assing a given start and end time per pair of words (I know there are redundant statements, srry) :


def generate_captions(
 words,
 font="Komika",
 fontsize=32,
 color="White",
 align="center",
 stroke_width=3,
 stroke_color="black",
):
 text_comp = []
 for i in track(range(0, len(words), 2), description="Creating captions..."):
 word1 = words[i]
 if i + 1 < len(words):
 word2 = words[i + 1]
 text_clip = TextClip(
 f"{word1['word']} {word2['word'] if i + 1 < len(words) else ''}",
 font=font, # Change Font if not found
 fontsize=fontsize,
 color=color,
 align=align,
 method="caption",
 size=(660, None),
 stroke_width=stroke_width,
 stroke_color=stroke_color,
 )
 text_clip = text_clip.set_start(word1["start"])
 text_clip = text_clip.set_end(
 word2["end"] if i + 1 < len(words) else word1["end"]
 )
 text_comp.append(text_clip)
 return text_comp



Finally, I concatenate the words into a single video :


vid_clip = CompositeVideoClip(
 [vid_clip, concatenate_videoclips(text_comp).set_position(("center", 860))]
)



The output is this, but you can clearly see the words are not flowing with the speech. They somehow move faster as if the start/end time did not matter. Here's the video


The words with their respective start/end time, look like this :


[
 {
 'word': 'This',
 'start': 0.0,
 'end': 0.22,
 'prob': 0.805
 },
 {
 'word': 'is',
 'start': 0.22,
 'end': 0.42,
 'prob': 0.9991
 },
 {
 'word': 'a',
 'start': 0.42,
 'end': 0.6,
 'prob': 0.999
 },
 {
 'word': 'test,
 ',
 'start': 0.6,
 'end': 1.04,
 'prob': 0.9939
 },
 {
 'word': 'to',
 'start': 1.18,
 'end': 1.3,
 'prob': 0.9847
 },
 {
 'word': 'show',
 'start': 1.3,
 'end': 1.54,
 'prob': 0.9971
 },
 {
 'word': 'words',
 'start': 1.54,
 'end': 1.9,
 'prob': 0.995
 },
 {
 'word': 'does',
 'start': 1.9,
 'end': 2.16,
 'prob': 0.997
 },
 {
 'word': 'not',
 'start': 2.16,
 'end': 2.4,
 'prob': 0.9978
 },
 {
 'word': 'appear.',
 'start': 2.4,
 'end': 2.82,
 'prob': 0.9984
 },
 {
 'word': 'At',
 'start': 3.46,
 'end': 3.6,
 'prob': 0.9793
 },
 {
 'word': 'their',
 'start': 3.6,
 'end': 3.8,
 'prob': 0.9984
 },
 {
 'word': 'proper',
 'start': 3.8,
 'end': 4.22,
 'prob': 0.9976
 },
 {
 'word': 'time.',
 'start': 4.22,
 'end': 4.72,
 'prob': 0.999
 },
 {
 'word': 'Thanks',
 'start': 5.04,
 'end': 5.4,
 'prob': 0.9662
 },
 {
 'word': 'for,
 ',
 'start': 5.4,
 'end': 5.66,
 'prob': 0.9941
 },
 {
 'word': 'watching.',
 'start': 5.94,
 'end': 6.36,
 'prob': 0.7701
 }
]



What could be causing this ?


-
ffmpeg silenceremove - hear what bits are removed
7 avril 2020, par jimoffmpeg silenceremove is pretty cool. im loving it. i can trim 3 second silences to 2 seconds and reduce a 1.5 hour file of spoken audio down 3 or 4 minutes (depending on the speaker).



once in a while I do hear my choice for stop_threshold (ie-40dB on audio only analog file) does cause the end of a word to be clipped, just here and there when the speaker trails off softly at the end of the word.



is there any way to output what is trimmed to a file ? so I can listen to it and get an idea of just how often this word clipping happens ?



thanks !


-
Anomalie #2244 : association fichiers zip - nom fichier long - css privé
23 mars 2012, par cedric -#grml... c’est le word-wrap:break-word qui est pas appliqué chez toi, ou qui est pas pris en compte, ou qui fait rien sur ce cas là ? où alors un max-width manquant ? Parceque hein ça marchait au moment du patch... :(