Other articles (74)

  • HTML5 audio and video support

    13 April 2011

    MediaSPIP uses HTML5 video and audio tags to play multimedia files, taking advantage of the latest W3C innovations supported by modern browsers.
    The MediaSPIP player was created specifically for MediaSPIP and can easily be adapted to match a chosen theme.
    For older browsers, the Flowplayer Flash fallback is used.
    MediaSPIP allows for media playback on major mobile platforms with the above (...)

  • HTML5 audio and video support

    10 April 2011

    MediaSPIP uses the HTML5 video and audio tags to play multimedia documents, taking advantage of the latest W3C innovations supported by modern browsers.
    For older browsers, the Flowplayer Flash fallback is used.
    The HTML5 player used was created specifically for MediaSPIP: its appearance is fully customizable to match a chosen theme.
    These technologies make it possible to deliver video and sound both to conventional computers (...)

  • From upload to the final video [standalone version]

    31 January 2010

    The path of an audio or video document through SPIPMotion is divided into three distinct stages.
    Uploading and retrieving information about the source video
    First of all, you need to create a SPIP article and attach the "source" video document to it.
    The moment this document is attached to the article, two actions are executed in addition to the normal behavior: retrieval of the technical information about the file's audio and video streams; generation of a thumbnail: extraction of a (...)
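
    For illustration only, here is a rough Python sketch of the two extra actions described above (retrieving stream information and generating a thumbnail), assuming ffprobe and ffmpeg are on the PATH; this is not SPIPMotion's actual code:

import json
import subprocess

def probe_and_thumbnail(source, thumbnail="thumb.jpg"):
    # Retrieve the technical information of the file's audio and video streams
    info = json.loads(subprocess.check_output(
        ["ffprobe", "-v", "quiet", "-print_format", "json",
         "-show_streams", source]))
    # Generate a thumbnail by extracting a single frame
    subprocess.run(["ffmpeg", "-y", "-i", source, "-frames:v", "1", thumbnail],
                   check=True)
    return info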

On other sites (9257)

  • Computer crashing when using python tools in same script

    5 February 2023, by SL1997

    I am attempting to use the speech recognition toolkit VOSK and the speaker diarization package Resemblyzer to transcribe audio and then identify the speakers in the audio.

    Tools:

    https://github.com/alphacep/vosk-api
    https://github.com/resemble-ai/Resemblyzer

    I can do both things individually, but I run into issues when trying to do them in the same Python script.

    I used the following guide when setting up the diarization system:

    https://medium.com/saarthi-ai/who-spoke-when-build-your-own-speaker-diarization-module-from-scratch-e7d725ee279

    Computer specs are as follows:

    Intel(R) Core(TM) i3-7100 CPU @ 3.90GHz, 3912 Mhz, 2 Core(s), 4 Logical Processor(s)
    32GB RAM

    The following is my code. I am not too sure whether using threading is appropriate here, or whether I have even implemented it correctly. How can I best optimize this code so that it achieves the results I am looking for without crashing?

from vosk import Model, KaldiRecognizer
from pydub import AudioSegment
import json
import sys
import os
import subprocess
import datetime
from resemblyzer import preprocess_wav, VoiceEncoder
from pathlib import Path
from resemblyzer.hparams import sampling_rate
from spectralcluster import SpectralClusterer
import threading
import queue
import gc



def recognition(queue, audio, FRAME_RATE):

    model = Model("Vosk_Models/vosk-model-small-en-us-0.15")

    rec = KaldiRecognizer(model, FRAME_RATE)
    rec.SetWords(True)

    rec.AcceptWaveform(audio.raw_data)
    result = rec.Result()

    transcript = json.loads(result)#["text"]

    #return transcript
    queue.put(transcript)



def diarization(queue, audio):

    wav = preprocess_wav(audio)
    encoder = VoiceEncoder("cpu")
    _, cont_embeds, wav_splits = encoder.embed_utterance(wav, return_partials=True, rate=16)
    print(cont_embeds.shape)

    clusterer = SpectralClusterer(
        min_clusters=2,
        max_clusters=100,
        p_percentile=0.90,
        gaussian_blur_sigma=1)

    labels = clusterer.predict(cont_embeds)

    def create_labelling(labels, wav_splits):

        times = [((s.start + s.stop) / 2) / sampling_rate for s in wav_splits]
        labelling = []
        start_time = 0

        for i, time in enumerate(times):
            if i > 0 and labels[i] != labels[i - 1]:
                temp = [str(labels[i - 1]), start_time, time]
                labelling.append(tuple(temp))
                start_time = time
            if i == len(times) - 1:
                temp = [str(labels[i]), start_time, time]
                labelling.append(tuple(temp))

        return labelling

    #return
    labelling = create_labelling(labels, wav_splits)
    queue.put(labelling)



def identify_speaker(queue1, queue2):

    transcript = queue1.get()
    labelling = queue2.get()

    for speaker in labelling:

        speakerID = speaker[0]
        speakerStart = speaker[1]
        speakerEnd = speaker[2]

        result = transcript['result']
        words = [r['word'] for r in result if speakerStart < r['start'] < speakerEnd]
        #return
        print("Speaker",speakerID,":",' '.join(words), "\n")





def main():

    queue1 = queue.Queue()
    queue2 = queue.Queue()

    FRAME_RATE = 16000
    CHANNELS = 1

    podcast = AudioSegment.from_mp3("Podcast_Audio/Film-Release-Clip.mp3")
    podcast = podcast.set_channels(CHANNELS)
    podcast = podcast.set_frame_rate(FRAME_RATE)

    first_thread = threading.Thread(target=recognition, args=(queue1, podcast, FRAME_RATE))
    second_thread = threading.Thread(target=diarization, args=(queue2, podcast))
    third_thread = threading.Thread(target=identify_speaker, args=(queue1, queue2))

    first_thread.start()
    first_thread.join()
    gc.collect()

    second_thread.start()
    second_thread.join()
    gc.collect()

    third_thread.start()
    third_thread.join()
    gc.collect()

    # transcript = recognition(podcast,FRAME_RATE)
    #
    # labelling = diarization(podcast)
    #
    # print(identify_speaker(transcript, labelling))


if __name__ == '__main__':
    main()

    When I say crash, I mean everything freezes: I have to hold down the power button on the desktop and turn it back on again. There is no blue or blank screen; it just freezes in my IDE while I am looking at my code. Any help in resolving this issue would be greatly appreciated.
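
    One possible direction, sketched below and untested: feed VOSK the audio in chunks instead of one AcceptWaveform call on the whole file, to lower peak memory use (a plausible cause of the freeze). The model path and frame rate come from the code above; chunk_ms is an arbitrary choice.

import json
from vosk import Model, KaldiRecognizer

def recognition_chunked(audio, frame_rate, chunk_ms=30_000):
    # Same model path as in the question's code
    model = Model("Vosk_Models/vosk-model-small-en-us-0.15")
    rec = KaldiRecognizer(model, frame_rate)
    rec.SetWords(True)
    words = []
    # pydub AudioSegments are sliced in milliseconds
    for start in range(0, len(audio), chunk_ms):
        chunk = audio[start:start + chunk_ms]
        if rec.AcceptWaveform(chunk.raw_data):
            words.extend(json.loads(rec.Result()).get("result", []))
    words.extend(json.loads(rec.FinalResult()).get("result", []))
    return {"result": words}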

  • Google’s YouTube Uses FFmpeg

    9 February 2011, by Multimedia Mike — General

    Controversy arose last week when Google accused Microsoft of stealing search engine results for their Bing search engine. It was a pretty novel sting operation and Google did a good job of visually illustrating their side of the story on their official blog.

    This reminds me of the fact that Google’s YouTube video hosting site uses FFmpeg for converting videos. Not that this is in the same league as the search engine shenanigans (it’s perfectly legit to use FFmpeg in this capacity, but to my knowledge, Google/YouTube has never confirmed FFmpeg usage), but I thought I would revisit this item and illustrate it with screenshots. This is not new information — I first empirically tested this fact 4 years ago. However, a lot of people wonder how exactly I can identify FFmpeg on the backend when I claim that I’ve written code that helps power YouTube.

    Short Answer
    How do I know YouTube uses FFmpeg to convert multimedia? Because:

    1. FFmpeg can decode a number of impossibly obscure multimedia formats using code I wrote
    2. YouTube can transcode many of the same formats
    3. I screwed up when I wrote the code to support some of these weird formats
    4. My mistakes are still present when YouTube transcodes certain fringe formats

    Longer Answer (With Pictures!)
    Let’s take a video format named RoQ, developed by noted game designer Graeme Devine. Created for the FMV-heavy game The 11th Hour, the format eventually found its way into the Quake 3 engine as well as many games derived from the same technology.

    Dr. Tim Ferguson reverse engineered the format (though it would later be open sourced along with the rest of the Q3 engine). I wrote a RoQ playback system for FFmpeg, and I messed up in doing so. I believe my coding error helps demonstrate the case I’m trying to make here.

    Observe what happened when I pushed the jk02.roq sample through YouTube in my original experiment 4 years ago:

    [screenshot: YouTube’s transcode of jk02.roq, with the canyon walls bleeding into the sky]

    Do you see how the canyon walls bleed into the sky? That’s not supposed to happen. FFmpeg doesn’t do that anymore, but I was able to go back into the source code history to find when it did do that:

    [screenshot: the old FFmpeg revision reproducing the same color-bleed artifact]

    Academic Answer
    FFmpeg fixed this bug in June of 2007 (thanks to Eric Lasota). The problem had to do with premature colorspace conversion in my original decoder.

    Leftovers
    I tried uploading the video again to see if the problem persists in YouTube’s transcoder. First bit of trivia: YouTube detects when you have uploaded the same video twice and rejects the subsequent attempts. So I created a double concatenation of the video and uploaded it. The problem is gone, illustrating that the backend is actually using a newer version of FFmpeg. This surprises me for somewhat esoteric reasons.

    Here’s another interesting bit of trivia for those who don’t do a lot of YouTube uploading — YouTube reports format details when you upload a video:

    [screenshot: YouTube’s upload details reporting the RoQ format]

    So, yep, RoQ format. And you can wager that this will prompt me to go back through the litany of unusual formats that FFmpeg supports to see how YouTube responds.

  • ffmpeg: stream copy from .mxf into NLE-compatible format

    9 June 2013, by David

    Because my NLE software does not support the .mxf files from the Canon XF100, I need to convert them into a supported format.

    As far as I know, MXF files are just another container for MPEG-2 streams, so it would be really nice to extract the streams and place them in another container (without re-encoding).

    I think ffmpeg can do this – correct me if I'm wrong – by running the following command:

    ffmpeg -i in.mxf -vcodec copy out.m2ts (or .ts, .mts, ...)

    ffmpeg finishes without errors after about 2 seconds (in.mxf is about 170 MB):

    c:\video>c:\ffmpeg\bin\ffmpeg -i in.MXF -vcodec copy out.m2ts
    ffmpeg version N-53680-g0ab9362 Copyright (c) 2000-2013 the FFmpeg developers
     built on May 30 2013 12:14:03 with gcc 4.7.3 (GCC)
     configuration: --enable-gpl --enable-version3 --disable-w32threads --enable-av
    isynth --enable-bzlib --enable-fontconfig --enable-frei0r --enable-gnutls --enab
    le-iconv --enable-libass --enable-libbluray --enable-libcaca --enable-libfreetyp
    e --enable-libgsm --enable-libilbc --enable-libmodplug --enable-libmp3lame --ena
    ble-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --enable-l
    ibopus --enable-librtmp --enable-libschroedinger --enable-libsoxr --enable-libsp
    eex --enable-libtheora --enable-libtwolame --enable-libvo-aacenc --enable-libvo-
    amrwbenc --enable-libvorbis --enable-libvpx --enable-libx264 --enable-libxavs --
    enable-libxvid --enable-zlib
     libavutil      52. 34.100 / 52. 34.100
     libavcodec     55. 12.102 / 55. 12.102
     libavformat    55.  8.100 / 55.  8.100
     libavdevice    55.  2.100 / 55.  2.100
     libavfilter     3. 73.100 /  3. 73.100
     libswscale      2.  3.100 /  2.  3.100
     libswresample   0. 17.102 /  0. 17.102
     libpostproc    52.  3.100 / 52.  3.100
    Guessed Channel Layout for  Input Stream #0.1 : mono
    Guessed Channel Layout for  Input Stream #0.2 : mono
    Input #0, mxf, from 'in.MXF':
     Metadata:
       uid             : 1bb23c97-6205-4800-80a2-e00002244ba7
       generation_uid  : 1bb23c97-6205-4800-8122-e00002244ba7
       company_name    : CANON
       product_name    : XF100
       product_version : 1.00
       product_uid     : 060e2b34-0401-010d-0e15-005658460100
       modification_date: 2013-01-06 11:05:02
       timecode        : 01:42:14:22
     Duration: 00:00:28.32, start: 0.000000, bitrate: 51811 kb/s
       Stream #0:0: Video: mpeg2video (4:2:2), yuv422p, 1920x1080 [SAR 1:1 DAR 16:9
    ], 25 fps, 25 tbr, 25 tbn, 50 tbc
       Stream #0:1: Audio: pcm_s16le, 48000 Hz, mono, s16, 768 kb/s
       Stream #0:2: Audio: pcm_s16le, 48000 Hz, mono, s16, 768 kb/s
    Output #0, mpegts, to 'out.m2ts':
     Metadata:
       uid             : 1bb23c97-6205-4800-80a2-e00002244ba7
       generation_uid  : 1bb23c97-6205-4800-8122-e00002244ba7
       company_name    : CANON
       product_name    : XF100
       product_version : 1.00
       product_uid     : 060e2b34-0401-010d-0e15-005658460100
       modification_date: 2013-01-06 11:05:02
       timecode        : 01:42:14:22
       encoder         : Lavf55.8.100
       Stream #0:0: Video: mpeg2video, yuv422p, 1920x1080 [SAR 1:1 DAR 16:9], q=2-3
    1, 25 fps, 90k tbn, 25 tbc
       Stream #0:1: Audio: mp2, 48000 Hz, mono, s16, 128 kb/s
    Stream mapping:
     Stream #0:0 -> #0:0 (copy)
     Stream #0:1 -> #0:1 (pcm_s16le -> mp2)
    Press [q] to stop, [?] for help
    frame=  532 fps=0.0 q=-1.0 size=  143511kB time=00:00:21.25 bitrate=55314.1kbits
    frame=  561 fps=435 q=-1.0 size=  151254kB time=00:00:22.42 bitrate=55242.0kbits
    frame=  586 fps=314 q=-1.0 size=  158021kB time=00:00:23.41 bitrate=55288.0kbits
    frame=  609 fps=255 q=-1.0 size=  164182kB time=00:00:24.34 bitrate=55235.4kbits
    frame=  636 fps=217 q=-1.0 size=  171463kB time=00:00:25.42 bitrate=55235.1kbits
    frame=  669 fps=194 q=-1.0 size=  180133kB time=00:00:26.72 bitrate=55226.3kbits
    frame=  699 fps=173 q=-1.0 size=  188326kB time=00:00:27.92 bitrate=55256.6kbits
    frame=  708 fps=169 q=-1.0 Lsize=  190877kB time=00:00:28.30 bitrate=55233.6kbit
    s/s
    video:172852kB audio:442kB subtitle:0 global headers:0kB muxing overhead 10.1461
    18%

    Unfortunately, the output file is displayed correctly only by VLC player.
    My NLE software (CyberLink PowerDirector) is able to open the file, but most of the picture is green. Only a few pixels on the left edge show the original video:

    [screenshot: output file]

    Any ideas how to solve that problem? Is there a better way to use .mxf files in NLE software without native support?

    Thanks in advance.
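
    A side note on the log above: with only -vcodec copy, ffmpeg re-encoded the audio (see the "pcm_s16le -> mp2" mapping) and dropped the second mono track. One option worth trying, as a sketch untested with this footage, is copying every stream unchanged into a MOV container, which can hold both MPEG-2 video and raw PCM audio:

    ffmpeg -i in.MXF -map 0 -c copy out.mov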