
Other articles (94)
-
Multilang: improving the interface for multilingual blocks
18 February 2011. Multilang is an additional plugin that is not enabled by default when MediaSPIP is initialized.
Once it has been activated, MediaSPIP init automatically puts a preconfiguration in place so that the new feature is immediately operational. There is therefore no need to go through a configuration step for this. -
Contributing to its translation
10 April 2011. You can help us improve the wording used in the software, or translate it into any new language so that it can reach new linguistic communities.
To do this, we use SPIP's translation interface, where all of MediaSPIP's language modules are available. You simply need to subscribe to the translators' mailing list to ask for more information.
Currently MediaSPIP is only available in French and (...) -
Customizing categories
21 June 2013. Category creation form
For those who know SPIP well, a category can be thought of as a SPIP section (rubrique).
For a category-type document, the fields offered by default are: Text
This form can be modified under:
Administration > Configuration des masques de formulaire.
For a media-type document, the fields not displayed by default are: Descriptif rapide (short description)
It is also in this configuration section that you can specify the (...)
On other sites (9773)
-
Google’s YouTube Uses FFmpeg
9 February 2011, by Multimedia Mike (General). Controversy arose last week when Google accused Microsoft of stealing search engine results for their Bing search engine. It was a pretty novel sting operation, and Google did a good job of visually illustrating their side of the story on their official blog.
This reminds me of the fact that Google’s YouTube video hosting site uses FFmpeg for converting videos. Not that this is in the same league as the search engine shenanigans (it’s perfectly legit to use FFmpeg in this capacity, but to my knowledge, Google/YouTube has never confirmed FFmpeg usage), but I thought I would revisit this item and illustrate it with screenshots. This is not new information— I first empirically tested this fact 4 years ago. However, a lot of people wonder how exactly I can identify FFmpeg on the backend when I claim that I’ve written code that helps power YouTube.
Short Answer
How do I know YouTube uses FFmpeg to convert multimedia? Because:
- FFmpeg can decode a number of impossibly obscure multimedia formats using code I wrote
- YouTube can transcode many of the same formats
- I screwed up when I wrote the code to support some of these weird formats
- My mistakes are still present when YouTube transcodes certain fringe formats
Longer Answer (With Pictures!)
Let's take a video format named RoQ, developed by noted game designer Graeme Devine. Originated for use in the FMV-heavy game The 11th Hour, the format eventually found its way into the Quake 3 engine as well as many games derived from the same technology. Dr. Tim Ferguson reverse engineered the format (though it would later be open sourced along with the rest of the Q3 engine). I wrote a RoQ playback system for FFmpeg, and I messed up in doing so. I believe my coding error helps demonstrate the case I'm trying to make here.
Observe what happened when I pushed the jk02.roq sample through YouTube in my original experiment 4 years ago:
Do you see how the canyon walls bleed into the sky? That's not supposed to happen. FFmpeg doesn't do that anymore, but I was able to go back into the source code history to find when it did do that:
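For anyone who wants to make this kind of old-build-versus-new-build comparison without eyeballing screenshots, here is a quick sketch using FFmpeg's framemd5 muxer (the output file names are made up for illustration; this is not what I actually ran at the time):

ffmpeg -i jk02.roq -f framemd5 decode_old.md5   # run with the old, buggy build
ffmpeg -i jk02.roq -f framemd5 decode_new.md5   # run with a current build
diff decode_old.md5 decode_new.md5              # differing lines are frames that decode differently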
Academic Answer
FFmpeg fixed this bug in June of 2007 (thanks to Eric Lasota). The problem had to do with premature colorspace conversion in my original decoder.
Leftovers
I tried uploading the video again to see if the problem persists in YouTube's transcoder. First bit of trivia: YouTube detects when you have uploaded the same video twice and rejects the subsequent attempts. So I created a double concatenation of the video and uploaded it. The problem is gone, illustrating that the backend is actually using a newer version of FFmpeg. This surprises me for somewhat esoteric reasons.
Here's another interesting bit of trivia for those who don't do a lot of YouTube uploading: YouTube reports format details when you upload a video:
So, yep, RoQ format. And you can wager that this will prompt me to go back through the litany of unusual formats that FFmpeg supports to see how YouTube responds.
-
Computer crashing when using python tools in same script
5 February 2023, by SL1997. I am attempting to use the speech recognition toolkit VOSK and the speaker diarization package Resemblyzer to transcribe audio and then identify the speakers in the audio.


Tools:


https://github.com/alphacep/vosk-api

https://github.com/resemble-ai/Resemblyzer

I can do both things individually, but I run into issues when trying to do them both in the one Python script.


I used the following guide when setting up the diarization system:




Computer specs are as follows:


Intel(R) Core(TM) i3-7100 CPU @ 3.90GHz, 3912 Mhz, 2 Core(s), 4 Logical Processor(s)

32GB RAM

The following is my code. I am not too sure if using threading is appropriate, or if I even implemented it correctly. How can I best optimize this code to achieve the results I am looking for without crashing?


from vosk import Model, KaldiRecognizer
from pydub import AudioSegment
import json
import sys
import os
import subprocess
import datetime
from resemblyzer import preprocess_wav, VoiceEncoder
from pathlib import Path
from resemblyzer.hparams import sampling_rate
from spectralcluster import SpectralClusterer
import threading
import queue
import gc


def recognition(queue, audio, FRAME_RATE):
    # Run Vosk over the whole audio buffer in one call and put the JSON transcript on the queue.
    model = Model("Vosk_Models/vosk-model-small-en-us-0.15")

    rec = KaldiRecognizer(model, FRAME_RATE)
    rec.SetWords(True)

    rec.AcceptWaveform(audio.raw_data)
    result = rec.Result()

    transcript = json.loads(result)  # ["text"]

    # return transcript
    queue.put(transcript)


def diarization(queue, audio):
    # Embed the audio with Resemblyzer, cluster the partial embeddings into speakers,
    # and put the resulting (speaker, start, end) labelling on the queue.
    wav = preprocess_wav(audio)
    encoder = VoiceEncoder("cpu")
    _, cont_embeds, wav_splits = encoder.embed_utterance(wav, return_partials=True, rate=16)
    print(cont_embeds.shape)

    clusterer = SpectralClusterer(
        min_clusters=2,
        max_clusters=100,
        p_percentile=0.90,
        gaussian_blur_sigma=1)

    labels = clusterer.predict(cont_embeds)

    def create_labelling(labels, wav_splits):
        # Convert per-window cluster labels into (speaker, start_time, end_time) tuples.
        times = [((s.start + s.stop) / 2) / sampling_rate for s in wav_splits]
        labelling = []
        start_time = 0

        for i, time in enumerate(times):
            if i > 0 and labels[i] != labels[i - 1]:
                temp = [str(labels[i - 1]), start_time, time]
                labelling.append(tuple(temp))
                start_time = time
            if i == len(times) - 1:
                temp = [str(labels[i]), start_time, time]
                labelling.append(tuple(temp))

        return labelling

    # return
    labelling = create_labelling(labels, wav_splits)
    queue.put(labelling)


def identify_speaker(queue1, queue2):
    # Match each diarized segment with the transcript words whose start times fall inside it.
    transcript = queue1.get()
    labelling = queue2.get()

    for speaker in labelling:

        speakerID = speaker[0]
        speakerStart = speaker[1]
        speakerEnd = speaker[2]

        result = transcript['result']
        words = [r['word'] for r in result if speakerStart < r['start'] < speakerEnd]
        # return
        print("Speaker", speakerID, ":", ' '.join(words), "\n")


def main():

    queue1 = queue.Queue()
    queue2 = queue.Queue()

    FRAME_RATE = 16000
    CHANNELS = 1

    podcast = AudioSegment.from_mp3("Podcast_Audio/Film-Release-Clip.mp3")
    podcast = podcast.set_channels(CHANNELS)
    podcast = podcast.set_frame_rate(FRAME_RATE)

    first_thread = threading.Thread(target=recognition, args=(queue1, podcast, FRAME_RATE))
    second_thread = threading.Thread(target=diarization, args=(queue2, podcast))
    third_thread = threading.Thread(target=identify_speaker, args=(queue1, queue2))

    # Each thread is started and joined before the next begins, so they run one after another.
    first_thread.start()
    first_thread.join()
    gc.collect()

    second_thread.start()
    second_thread.join()
    gc.collect()

    third_thread.start()
    third_thread.join()
    gc.collect()

    # transcript = recognition(podcast, FRAME_RATE)
    #
    # labelling = diarization(podcast)
    #
    # print(identify_speaker(transcript, labelling))


if __name__ == '__main__':
    main()



When I say crash, I mean everything freezes; I have to hold down the power button on the desktop and turn it back on again. There is no blue or blank screen, it is just frozen in my IDE looking at my code. Any help in resolving this issue would be greatly appreciated.
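One thing I have been considering trying (not tested yet, and I am only guessing that the freeze is memory related) is to feed Vosk the audio in smaller slices rather than handing AcceptWaveform the whole file's raw_data in one go, roughly like this, reusing the same model and file paths as above:

from vosk import Model, KaldiRecognizer
from pydub import AudioSegment
import json

FRAME_RATE = 16000

model = Model("Vosk_Models/vosk-model-small-en-us-0.15")
rec = KaldiRecognizer(model, FRAME_RATE)
rec.SetWords(True)

podcast = AudioSegment.from_mp3("Podcast_Audio/Film-Release-Clip.mp3")
podcast = podcast.set_channels(1).set_frame_rate(FRAME_RATE)

step_ms = 30_000  # 30-second slices to keep peak memory down
results = []
for start in range(0, len(podcast), step_ms):  # pydub slices are in milliseconds
    chunk = podcast[start:start + step_ms]
    if rec.AcceptWaveform(chunk.raw_data):
        results.append(json.loads(rec.Result()))
results.append(json.loads(rec.FinalResult()))

# 'results' is now a list of partial transcripts; their 'result' word lists would
# need to be merged before being passed to identify_speaker().

Would something along those lines be the right direction, or is the problem likely elsewhere?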


-
Need help configuring FFMPEG to work with a webcam's h264 stream
9 August 2020, by The Welsh Dragon. I have been trying to get an H264 stream from an H264 USB webcam working, but I am not making much progress, so I'm hoping someone knows FFMPEG better than me!


There are dozens of questions/answers on SO but none solve my problem.


In short, I get a very pixelated (or sometimes mostly green) screen. I am using VLC to test the stream, which comes via an RTSP server, and FFMPEG to copy the webcam stream to that local RTSP server.


The webcam also supports YUYV, which I can get working; it is just the h264 stream that is causing me problems.


So this is how the device is presented:


H264 USB Camera: USB Camera (usb-20980000.usb-1):
 /dev/video0
 /dev/video1
 /dev/video2
 /dev/video3



/dev/video0 is the YUYV and MPEG stream.
/dev/video2 is the h264 stream that has the following capabilities:


ioctl: VIDIOC_ENUM_FMT
 Type: Video Capture

 [0]: 'H264' (H.264, compressed)
 Size: Discrete 1920x1080
 Interval: Discrete 0.033s (30.000 fps)
 Interval: Discrete 0.040s (25.000 fps)
 Interval: Discrete 0.067s (15.000 fps)
 Interval: Discrete 0.033s (30.000 fps)
 Interval: Discrete 0.040s (25.000 fps)
 Interval: Discrete 0.067s (15.000 fps)
 Size: Discrete 1280x720
 Interval: Discrete 0.033s (30.000 fps)
 Interval: Discrete 0.040s (25.000 fps)
 Interval: Discrete 0.067s (15.000 fps)
 Size: Discrete 800x600
 Interval: Discrete 0.033s (30.000 fps)
 Interval: Discrete 0.040s (25.000 fps)
 Interval: Discrete 0.067s (15.000 fps)
 Size: Discrete 640x480
 Interval: Discrete 0.033s (30.000 fps)
 Interval: Discrete 0.040s (25.000 fps)
 Interval: Discrete 0.067s (15.000 fps)
 Size: Discrete 640x360
 Interval: Discrete 0.033s (30.000 fps)
 Interval: Discrete 0.040s (25.000 fps)
 Interval: Discrete 0.067s (15.000 fps)
 Size: Discrete 352x288
 Interval: Discrete 0.033s (30.000 fps)
 Interval: Discrete 0.040s (25.000 fps)
 Interval: Discrete 0.067s (15.000 fps)
 Size: Discrete 320x240
 Interval: Discrete 0.033s (30.000 fps)
 Interval: Discrete 0.040s (25.000 fps)
 Interval: Discrete 0.067s (15.000 fps)
 Size: Discrete 1920x1080
 Interval: Discrete 0.033s (30.000 fps)
 Interval: Discrete 0.040s (25.000 fps)
 Interval: Discrete 0.067s (15.000 fps)
 Interval: Discrete 0.033s (30.000 fps)
 Interval: Discrete 0.040s (25.000 fps)
 Interval: Discrete 0.067s (15.000 fps)



I have tried various resolutions; the smaller ones give slightly less pixelated images, but none are usable and they definitely don't compare to the YUYV high-resolution results.


This (YUYV) command works:


ffmpeg -input_format yuyv422 -f video4linux2 -s 1280x720 -r 10 -i /dev/video0 -c:v h264_omx -r 10 -b:v 2M -an -f rtsp rtsp://localhost:80/live/stream



These two h264 options don't work:


ffmpeg -input_format h264 -f video4linux2 -video_size 1920x1080 -framerate 30 -i /dev/video0 -c:v copy -an -f rtsp rtsp://localhost:80/live/stream



ffmpeg -re -i /dev/video2 -video_size 800x600 -framerate 15 -pix_fmt yuv420p -tune zerolatency -c:v copy -an -f rtsp rtsp://localhost:80/live/stream



For that last command, the FFMPEG output looks like this:


ffmpeg version git-2020-08-07-6fdf3cc Copyright (c) 2000-2020 the FFmpeg developers
 built with gcc 8 (Raspbian 8.3.0-6+rpi1)
 configuration: --extra-ldflags=-latomic --arch=armel --target-os=linux --enable-gpl --enable-omx --enable-omx-rpi --enable-nonfree --enable-libfreetype --enable-libx264 --enable-libmp3lame --enable-mmal --enable-indev=alsa --enable-outdev=alsa
 libavutil 56. 58.100 / 56. 58.100
 libavcodec 58.100.100 / 58.100.100
 libavformat 58. 50.100 / 58. 50.100
 libavdevice 58. 11.101 / 58. 11.101
 libavfilter 7. 87.100 / 7. 87.100
 libswscale 5. 8.100 / 5. 8.100
 libswresample 3. 8.100 / 3. 8.100
 libpostproc 55. 8.100 / 55. 8.100
Input #0, video4linux2,v4l2, from '/dev/video2':
 Duration: N/A, start: 1353.265049, bitrate: N/A
 Stream #0:0: Video: h264 (Main), yuv420p(progressive), 1920x1080, 30 fps, 30 tbr, 1000k tbn, 2000k tbc
[udp @ 0x38c29f0] attempted to set receive buffer to size 393216 but it only ended up set as 360448
[udp @ 0x38d7b50] attempted to set receive buffer to size 393216 but it only ended up set as 360448
Output #0, rtsp, to 'rtsp://localhost:80/live/stream':
 Metadata:
 encoder : Lavf58.50.100
 Stream #0:0: Video: h264 (Main), yuv420p(progressive), 1920x1080, q=2-31, 30 fps, 30 tbr, 90k tbn, 1000k tbc
Stream mapping:
 Stream #0:0 -> #0:0 (copy)
Press [q] to stop, [?] for help
[rtsp @ 0x38fd890] Timestamps are unset in a packet for stream 0. This is deprecated and will stop working in the future. Fix your code to set the timestamps properly
[rtsp @ 0x38fd890] Non-monotonous DTS in output stream 0:0; previous: 0, current: 0; changing to 1. This may result in incorrect timestamps in the output file.
frame= 348 fps= 18 q=-1.0 size=N/A time=00:00:21.03 bitrate=N/A speed=1.09x



The issue looks like it is bandwidth related, or down to a lack of processing power in the device being used, BUT the YUYV stream works at a high resolution, and (taking a completely different approach, i.e. not using FFMPEG) I can get a very decent MPEG stream working on the same device.


So, are there any FFMPEG experts out there who can help me with getting the correct parameters for an h264 stream?
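For reference, the next variant I plan to try (untested, so treat it as a sketch) keeps all the input options before -i, points at /dev/video2 (the h264 node) rather than /dev/video0, and forces TCP for the RTSP output in case the artefacts come from UDP packet loss rather than the camera itself:

ffmpeg -f v4l2 -input_format h264 -video_size 1920x1080 -framerate 30 -i /dev/video2 -c:v copy -an -rtsp_transport tcp -f rtsp rtsp://localhost:80/live/stream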