
Media (1)
-
Ogg detection bug
22 March 2013, by
Updated: April 2013
Language: French
Type: Video
Other articles (30)
-
Accepted formats
28 January 2010, by
The following commands give information about the formats and codecs supported by the local ffmpeg installation:
ffmpeg -codecs
ffmpeg -formats
Accepted input video formats
This list is not exhaustive; it highlights the main formats in use:
h264: H.264 / AVC / MPEG-4 AVC / MPEG-4 part 10
m4v: raw MPEG-4 video format
flv: Flash Video (FLV) / Sorenson Spark / Sorenson H.263
Theora
wmv:
Possible output video formats
Initially, we (...)
-
Libraries and binaries specific to video and audio processing
31 January 2010, by
The following software and libraries are used by SPIPmotion in one way or another.
Required binaries
FFmpeg: the main encoder; it transcodes almost all types of video and audio files into formats playable on the Internet. See this tutorial for its installation;
Oggz-tools: tools for inspecting ogg files;
Mediainfo: retrieves information from most video and audio formats;
Optional complementary binaries
flvtool2: (...)
-
HTML5 audio and video support
10 April 2011
MediaSPIP uses the HTML5 video and audio tags to play multimedia documents, taking advantage of the latest W3C innovations supported by modern browsers.
For older browsers, the Flowplayer Flash player is used.
The HTML5 player was created specifically for MediaSPIP: its appearance can be fully customized to match a chosen theme.
These technologies make it possible to deliver video and sound both to conventional computers (...)
On other sites (3262)
-
How to make ffmpeg transcode from MJPEG to H.264 in real time?
3 August 2020, by SolskGaer
I have an MJPEG stream that outputs pictures at 10 FPS (cellphone screenshots), and I use the following command to transcode it to an H.264 stream and play it on my laptop:


ffmpeg -f mjpeg -r 20 -y -i http://127.0.0.1:53293/ -vcodec libx264 -preset veryfast -profile:v baseline -b:v 1024k -r 10 -f h264 pipe:1 | ffplay -i pipe:0



However, the output stream is a few seconds behind the cellphone screen. Here is the output of ffmpeg and ffplay:


ffplay version 4.3.1 Copyright (c) 2003-2020 the FFmpeg developers
 built with Apple clang version 11.0.3 (clang-1103.0.32.62)
 configuration: --prefix=/usr/local/Cellar/ffmpeg/4.3.1 --enable-shared --enable-pthreads --enable-version3 --enable-avresample --cc=clang --host-cflags= --host-ldflags= --enable-ffplay --enable-gnutls --enable-gpl --enable-libaom --enable-libbluray --enable-libdav1d --enable-libmp3lame --enable-libopus --enable-librav1e --enable-librubberband --enable-libsnappy --enable-libsrt --enable-libtesseract --enable-libtheora --enable-libvidstab --enable-libvorbis --enable-libvpx --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxml2 --enable-libxvid --enable-lzma --enable-libfontconfig --enable-libfreetype --enable-frei0r --enable-libass --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --enable-librtmp --enable-libspeex --enable-libsoxr --enable-videotoolbox --disable-libjack --disable-indev=jack
ffmpeg version 4.3.1 Copyright (c) 2000-2020 the FFmpeg developers
 built with Apple clang version 11.0.3 (clang-1103.0.32.62)
 configuration: --prefix=/usr/local/Cellar/ffmpeg/4.3.1 --enable-shared --enable-pthreads --enable-version3 --enable-avresample --cc=clang --host-cflags= --host-ldflags= --enable-ffplay --enable-gnutls --enable-gpl --enable-libaom --enable-libbluray --enable-libdav1d --enable-libmp3lame --enable-libopus --enable-librav1e --enable-librubberband --enable-libsnappy --enable-libsrt --enable-libtesseract --enable-libtheora --enable-libvidstab --enable-libvorbis --enable-libvpx --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxml2 --enable-libxvid --enable-lzma --enable-libfontconfig --enable-libfreetype --enable-frei0r --enable-libass --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --enable-librtmp --enable-libspeex --enable-libsoxr --enable-videotoolbox --disable-libjack --disable-indev=jack
 libavutil 56. 51.100 / 56. 51.100
 libavcodec 58. 91.100 / 58. 91.100
 libavformat 58. 45.100 / 58. 45.100
 libavdevice 58. 10.100 / 58. 10.100
 libavfilter 7. 85.100 / 7. 85.100
 libavresample 4. 0. 0 / 4. 0. 0
 libswscale 5. 7.100 / 5. 7.100
 libswresample 3. 7.100 / 3. 7.100
 libpostproc 55. 7.100 / 55. 7.100
 libavutil 56. 51.100 / 56. 51.100
 libavcodec 58. 91.100 / 58. 91.100
 libavformat 58. 45.100 / 58. 45.100
 libavdevice 58. 10.100 / 58. 10.100
 libavfilter 7. 85.100 / 7. 85.100
 libavresample 4. 0. 0 / 4. 0. 0
 libswscale 5. 7.100 / 5. 7.100
 libswresample 3. 7.100 / 3. 7.100
 libpostproc 55. 7.100 / 55. 7.100
Input #0, mjpeg, from 'http://127.0.0.1:53293/':
 Duration: N/A, bitrate: N/A
 Stream #0:0: Video: mjpeg (Baseline), yuvj420p(pc, bt470bg/unknown/unknown), 750x1334 [SAR 144:144 DAR 375:667], 20 tbr, 1200k tbn, 20 tbc
Stream mapping:
 Stream #0:0 -> #0:0 (mjpeg (native) -> h264 (libx264))
Press [q] to stop, [?] for help
[libx264 @ 0x7f8f2900da00] using SAR=1/1
[libx264 @ 0x7f8f2900da00] using cpu capabilities: MMX2 SSE2Fast SSSE3 SSE4.2 AVX FMA3 BMI2 AVX2
[libx264 @ 0x7f8f2900da00] profile Constrained Baseline, level 3.2, 4:2:0, 8-bit
Output #0, h264, to 'pipe:1':
 Metadata:
 encoder : Lavf58.45.100
 Stream #0:0: Video: h264 (libx264), yuvj420p(pc), 750x1334 [SAR 144:144 DAR 375:667], q=-1--1, 1024 kb/s, 10 fps, 10 tbn, 10 tbc
 Metadata:
 encoder : Lavc58.91.100 libx264
 Side data:
 cpb: bitrate max/min/avg: 0/0/1024000 buffer size: 0 vbv_delay: N/A
frame= 30 fps=5.2 q=29.0 size= 97kB time=00:00:00.10 bitrate=7911.0kbits/s dup=0 drop=26 speed=0.0172
frame= 33 fps=5.2 q=41.0 size= 190kB time=00:00:00.40 bitrate=3888.2kbits/s dup=0 drop=29 speed=0.0633
frame= 35 fps=5.1 q=40.0 size= 253kB time=00:00:00.60 bitrate=3457.8kbits/s dup=0 drop=31 speed=0.0874
frame= 38 fps=5.1 q=40.0 size= 330kB time=00:00:00.90 bitrate=3006.1kbits/s dup=0 drop=34 speed=0.122x
frame= 40 fps=5.1 q=40.0 size= 381kB time=00:00:01.10 bitrate=2838.3kbits/s dup=0 drop=36 speed=0.139x
frame= 43 fps=5.1 q=39.0 size= 460kB time=00:00:01.40 bitrate=2691.3kbits/s dup=0 drop=39 speed=0.166x
frame= 45 fps=5.0 q=39.0 size= 505kB time=00:00:01.60 bitrate=2585.9kbits/s dup=0 drop=41 speed=0.178x
frame= 48 fps=5.0 q=38.0 size= 552kB time=00:00:01.90 bitrate=2381.0kbits/s dup=0 drop=44 speed= 0.2x 
frame= 50 fps=5.0 q=33.0 size= 562kB time=00:00:02.10 bitrate=2190.7kbits/s dup=0 drop=46 speed=0.209x
frame= 53 fps=5.0 q=37.0 size= 572kB time=00:00:02.40 bitrate=1951.2kbits/s dup=0 drop=49 speed=0.227x
frame= 55 fps=5.0 q=37.0 size= 581kB time=00:00:02.60 bitrate=1830.0kbits/s dup=0 drop=51 speed=0.234x
frame= 58 fps=5.0 q=36.0 size= 591kB time=00:00:02.90 bitrate=1670.8kbits/s dup=0 drop=54 speed=0.25x 
frame= 60 fps=4.9 q=35.0 size= 598kB time=00:00:03.10 bitrate=1580.2kbits/s dup=0 drop=56 speed=0.255x
frame= 63 fps=5.0 q=35.0 size= 608kB time=00:00:03.40 bitrate=1465.2kbits/s dup=0 drop=59 speed=0.268x
frame= 65 fps=4.9 q=34.0 size= 613kB time=00:00:03.60 bitrate=1395.7kbits/s dup=0 drop=61 speed=0.273x
frame= 68 fps=5.0 q=34.0 size= 659kB time=00:00:03.90 bitrate=1384.5kbits/s dup=0 drop=64 speed=0.284x
frame= 70 fps=4.9 q=34.0 size= 682kB time=00:00:04.10 bitrate=1363.2kbits/s dup=0 drop=66 speed=0.288x
frame= 73 fps=4.9 q=35.0 size= 720kB time=00:00:04.40 bitrate=1340.1kbits/s dup=0 drop=69 speed=0.298x
frame= 75 fps=4.9 q=35.0 size= 746kB time=00:00:04.60 bitrate=1329.1kbits/s dup=0 drop=71 speed=0.301x
frame= 78 fps=4.9 q=36.0 size= 786kB time=00:00:04.90 bitrate=1314.6kbits/s dup=0 drop=74 speed=0.31x 
frame= 80 fps=4.9 q=36.0 size= 811kB time=00:00:05.10 bitrate=1303.3kbits/s dup=0 drop=76 speed=0.312x



I don't think my laptop's CPU is the bottleneck in this process; here are its specs:



 Model Name: MacBook Pro
 Model Identifier: MacBookPro16,1
 Processor Name: 6-Core Intel Core i7
 Processor Speed: 2.6 GHz
 Number of Processors: 1
 Total Number of Cores: 6
 L2 Cache (per Core): 256 KB
 L3 Cache: 12 MB
 Hyper-Threading Technology: Enabled
 Memory: 16 GB



I can't track down why this happens. How can I make the output stream real time? Any suggestion would be appreciated.
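One thing that may be worth trying is to reduce buffering on both sides of the pipe and to declare the input rate the phone actually delivers (the command above declares `-r 20` on the input while the source is described as 10 FPS, and the log shows roughly half of the frames being dropped). The sketch below only assembles the two argument lists in Python; `build_low_latency_commands` and `run_pipeline` are hypothetical helper names. The options themselves (`-fflags nobuffer`, `-flags low_delay`, `-tune zerolatency`, `-framedrop`) are standard ffmpeg, libx264 and ffplay options, but whether they remove the delay on this particular setup is an assumption that has not been tested here.

```python
import subprocess


def build_low_latency_commands(url, fps=10, bitrate="1024k"):
    """Return (ffmpeg_args, ffplay_args) for an MJPEG -> H.264 pipe with reduced buffering."""
    ffmpeg_args = [
        "ffmpeg",
        "-fflags", "nobuffer",       # do not buffer input packets
        "-flags", "low_delay",
        "-f", "mjpeg",
        "-r", str(fps),              # declare the rate the source really delivers
        "-i", url,
        "-vcodec", "libx264",
        "-preset", "veryfast",
        "-tune", "zerolatency",      # disable x264 lookahead and frame-level buffering
        "-profile:v", "baseline",
        "-b:v", bitrate,
        "-f", "h264",
        "pipe:1",
    ]
    ffplay_args = [
        "ffplay",
        "-fflags", "nobuffer",
        "-flags", "low_delay",
        "-framedrop",                # drop late frames instead of queueing them
        "-i", "pipe:0",
    ]
    return ffmpeg_args, ffplay_args


def run_pipeline(url, fps=10):
    """Start ffmpeg and ffplay connected by a pipe (equivalent to `ffmpeg ... | ffplay ...`)."""
    ffmpeg_args, ffplay_args = build_low_latency_commands(url, fps)
    encoder = subprocess.Popen(ffmpeg_args, stdout=subprocess.PIPE)
    player = subprocess.Popen(ffplay_args, stdin=encoder.stdout)
    encoder.stdout.close()           # let the encoder see a broken pipe if the player exits
    return encoder, player
```

Calling `run_pipeline("http://127.0.0.1:53293/")` would start both processes; if the delay persists, the remaining latency is more likely to come from the source or from ffplay's initial stream probing than from the encoder settings.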


-
Real-time playback of two blended videos with alpha channel and synched audio in pygame?
22 December 2024, by Francesco Calderone
I need to play two videos with synched sound in real time with Pygame.
Pygame does not currently support video streams, so I am using an ffmpeg subprocess.
The first video is a prores422_hq. This is a background video with no alpha channel.
The second video is a prores4444 overlay video with an alpha channel, and it needs to be played in real time on top of the first video (with transparency).
All of this needs synched sound from the first (base) video only.


I have tried many libraries, including pymovie, pyav, and opencv. The best result so far is to use a subprocess with ffmpeg.


ffmpeg -i testing/stefano_prores422_hq.mov -stream_loop -1 -i testing/key_prores4444.mov -filter_complex "[1:v]format=rgba,colorchannelmixer=aa=1.0[overlay];[0:v][overlay]overlay" -f nut pipe:1 | ffplay -


When running this in the terminal and playing with ffplay, everything is perfect, the overlay looks good, no dropped frames, and the sound is in synch.


However, feeding that to pygame via a subprocess causes either video delays and dropped frames or out-of-sync audio.


EXAMPLE ONE:


# SOUND IS NOT SYNCHED - sound is played via ffplay
import pygame
import subprocess
import numpy as np
import sys

def main():
    pygame.init()
    screen_width, screen_height = 1920, 1080
    screen = pygame.display.set_mode((screen_width, screen_height))
    pygame.display.set_caption("PyGame + FFmpeg Overlay with Audio")
    clock = pygame.time.Clock()

    # LAUNCH AUDIO-ONLY SUBPROCESS
    audio_cmd = [
        "ffplay",
        "-nodisp",    # no video window
        "-autoexit",  # exit when video ends
        "-loglevel", "quiet",
        "testing/stefano_prores422_hq.mov"
    ]
    audio_process = subprocess.Popen(audio_cmd)

    # LAUNCH VIDEO-OVERLAY SUBPROCESS
    ffmpeg_command = [
        "ffmpeg",
        "-i", "testing/stefano_prores422_hq.mov",
        "-stream_loop", "-1",  # loop alpha video
        "-i", "testing/key_prores4444.mov",
        "-filter_complex",
        "[1:v]format=rgba,colorchannelmixer=aa=1.0[overlay];"  # ensure alpha channel
        "[0:v][overlay]overlay",  # overlay second input onto first
        "-f", "rawvideo",    # output raw video
        "-pix_fmt", "rgba",  # RGBA format
        "pipe:1"             # write to STDOUT
    ]
    video_process = subprocess.Popen(
        ffmpeg_command,
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL
    )
    frame_size = screen_width * screen_height * 4  # RGBA = 4 bytes/pixel
    running = True
    while running:
        for event in pygame.event.get():
            if event.type == pygame.QUIT:
                running = False
                break

        raw_frame = video_process.stdout.read(frame_size)

        if len(raw_frame) < frame_size:
            running = False
            break
        # Convert raw bytes -> NumPy array -> PyGame surface
        frame_array = np.frombuffer(raw_frame, dtype=np.uint8)
        frame_array = frame_array.reshape((screen_height, screen_width, 4))
        frame_surface = pygame.image.frombuffer(frame_array.tobytes(),
                                                (screen_width, screen_height),
                                                "RGBA")
        screen.blit(frame_surface, (0, 0))
        pygame.display.flip()
        clock.tick(25)
    video_process.terminate()
    video_process.wait()
    audio_process.terminate()
    audio_process.wait()
    pygame.quit()
    sys.exit()

if __name__ == "__main__":
    main()
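
In Example one the blocking `read` on the ffmpeg pipe and the `clock.tick(25)` pacing share one loop, so any stall in the pipe directly delays the display. One possible way to decouple them is to read frames on a background thread into a bounded queue. The following is a minimal sketch under that assumption; `start_frame_reader` and `max_frames` are hypothetical names, and it is not claimed to fix the sync problem by itself.

```python
import queue
import threading


def start_frame_reader(stream, frame_size, max_frames=8):
    """Read fixed-size frames from `stream` on a background thread.

    Returns a queue that yields one bytes object per frame and then None
    once the stream ends (or the last frame is truncated).
    """
    frames = queue.Queue(maxsize=max_frames)

    def worker():
        while True:
            data = stream.read(frame_size)
            if len(data) < frame_size:   # EOF or incomplete final frame
                frames.put(None)
                return
            frames.put(data)             # blocks while the queue is full

    threading.Thread(target=worker, daemon=True).start()
    return frames
```

In the main loop, `frames = start_frame_reader(video_process.stdout, frame_size)` would replace the direct `read` call, and each iteration would take one frame with `frames.get()` and stop when it receives `None`.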




EXAMPLE TWO


# NO VIDEO OVERLAY - SOUND SYNCHED
import ffmpeg
import pygame
import sys
import numpy as np
import tempfile
import os

def extract_audio(input_file, output_file):
    """Extract audio from video file to temporary WAV file"""
    (
        ffmpeg
        .input(input_file)
        .output(output_file, acodec='pcm_s16le', ac=2, ar='44100')
        .overwrite_output()
        .run(capture_stdout=True, capture_stderr=True)
    )

def get_video_fps(input_file):
    probe = ffmpeg.probe(input_file)
    video_info = next(s for s in probe['streams'] if s['codec_type'] == 'video')
    fps_str = video_info.get('r_frame_rate', '25/1')
    num, den = map(int, fps_str.split('/'))
    return num / den

input_file = "testing/stefano_prores422_hq.mov"

# Create temporary WAV file
temp_audio = tempfile.NamedTemporaryFile(suffix='.wav', delete=False)
temp_audio.close()
extract_audio(input_file, temp_audio.name)

probe = ffmpeg.probe(input_file)
video_info = next(s for s in probe['streams'] if s['codec_type'] == 'video')
width = int(video_info['width'])
height = int(video_info['height'])
fps = get_video_fps(input_file)

process = (
    ffmpeg
    .input(input_file)
    .output('pipe:', format='rawvideo', pix_fmt='rgb24')
    .run_async(pipe_stdout=True)
)

pygame.init()
pygame.mixer.init(frequency=44100, size=-16, channels=2, buffer=4096)
clock = pygame.time.Clock()
screen = pygame.display.set_mode((width, height))

pygame.mixer.music.load(temp_audio.name)
pygame.mixer.music.play()

frame_count = 0
start_time = pygame.time.get_ticks()

while True:
    for event in pygame.event.get():
        if event.type == pygame.QUIT:
            pygame.mixer.music.stop()
            os.unlink(temp_audio.name)
            sys.exit()

    in_bytes = process.stdout.read(width * height * 3)
    if not in_bytes:
        break

    # Calculate timing for synchronization
    expected_frame_time = frame_count * (1000 / fps)
    actual_time = pygame.time.get_ticks() - start_time

    if actual_time < expected_frame_time:
        pygame.time.wait(int(expected_frame_time - actual_time))

    in_frame = (
        np.frombuffer(in_bytes, dtype="uint8")
        .reshape([height, width, 3])
    )
    out_frame = pygame.surfarray.make_surface(np.transpose(in_frame, (1, 0, 2)))
    screen.blit(out_frame, (0, 0))
    pygame.display.flip()

    frame_count += 1

pygame.mixer.music.stop()
process.wait()
pygame.quit()
os.unlink(temp_audio.name)



I also tried using pygame mixer and a separate mp3 audio file, but that didn't work either. Any help on how to sync the sound while keeping the playback of both videos at 25 FPS would be greatly appreciated!
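
A common approach to this kind of problem is to treat the audio position as the master clock and to decide, for every decoded frame, whether to wait, show, or drop it. The sketch below illustrates that decision as a pure function; `frame_action` and `tolerance_ms` are hypothetical names, and the assumption is that the audio position comes from `pygame.mixer.music.get_pos()` (milliseconds since playback started), which may not account for mixer buffering.

```python
def frame_action(frame_index, fps, audio_ms, tolerance_ms=20.0):
    """Decide what to do with a decoded frame, given the current audio position.

    Returns ("wait", ms_to_wait), ("show", 0.0) or ("drop", 0.0).
    """
    frame_ms = frame_index * 1000.0 / fps   # time at which this frame is due
    frame_len = 1000.0 / fps                # duration of one frame
    lead = frame_ms - audio_ms              # > 0 means video is ahead of audio
    if lead > tolerance_ms:
        return ("wait", lead)
    if lead < -frame_len:                   # more than one full frame late
        return ("drop", 0.0)
    return ("show", 0.0)
```

In the overlay loop this would be called as `frame_action(frame_count, 25, pygame.mixer.music.get_pos())`: on "wait" the loop sleeps with `pygame.time.wait`, on "drop" it reads the next frame without blitting, and `frame_count` advances in every case so the video cannot drift away from the audio.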


-
Twilio Real-Time Media Streaming to WebSocket Receives Only Noise Instead of Speech
21 February, by dannym25
I'm setting up a Twilio Voice call with real-time media streaming to a WebSocket server for speech-to-text processing using Google Cloud Speech-to-Text. The connection is established successfully, and I receive a continuous stream of audio data from Twilio. However, when I play back the received audio, all I hear is a rapid clicking/jackhammering noise instead of the actual speech spoken during the call.


Setup:


- Twilio sends inbound audio to my WebSocket server.
- WebSocket receives and saves the raw mulaw-encoded audio data from Twilio.
- The audio is processed via Google Speech-to-Text for transcription.
- When I attempt to play back the audio, it sounds like machine-gun-like noise instead of spoken words.










1. Confirmed WebSocket Receives Data


• The WebSocket successfully logs incoming audio chunks from Twilio:


🔊 Received 379 bytes of audio from Twilio
🔊 Received 379 bytes of audio from Twilio



• This suggests Twilio is sending audio data, but it's not being interpreted correctly.


2. Saving and Playing Raw Audio


• I save the incoming raw mulaw (8000Hz) audio from Twilio to a file:


fs.appendFileSync('twilio-audio.raw', message);



• Then, I convert it to a .wav file using FFmpeg:

ffmpeg -f mulaw -ar 8000 -ac 1 -i twilio-audio.raw twilio-audio.wav



• Problem: When I play the audio using ffplay, it contains no speech, only rapid clicking sounds.

3. Ensured Correct Audio Encoding


• Twilio sends mulaw 8000Hz mono format.
• Verified that my ffmpeg conversion is using the same settings.
• Tried different conversion methods:

ffmpeg -f mulaw -ar 8000 -ac 1 -i twilio-audio.raw -c:a pcm_s16le twilio-audio-fixed.wav



→ Same issue.
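
As a cross-check of the conversion step that does not depend on ffmpeg, the standard G.711 mu-law expansion can be written out by hand. The sketch below is a minimal illustration; `mulaw_byte_to_pcm16` and `mulaw_to_wav_bytes` are hypothetical helper names, and it assumes the input really is headerless 8-bit mu-law at 8000 Hz mono. If the saved file is not pure mu-law, this conversion will produce noise in the same way the ffmpeg command does.

```python
import io
import struct
import wave


def mulaw_byte_to_pcm16(value):
    """Decode one G.711 mu-law byte to a signed 16-bit linear sample."""
    value = ~value & 0xFF                 # mu-law bytes are stored bit-inverted
    sign = value & 0x80
    exponent = (value >> 4) & 0x07
    mantissa = value & 0x0F
    magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84
    return -magnitude if sign else magnitude


def mulaw_to_wav_bytes(mulaw_bytes, sample_rate=8000):
    """Return a complete mono 16-bit WAV file (as bytes) for raw mu-law input."""
    samples = [mulaw_byte_to_pcm16(b) for b in mulaw_bytes]
    pcm = struct.pack("<%dh" % len(samples), *samples)
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)               # 16-bit samples
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    return buf.getvalue()
```

A quick sanity check on the input is also possible: genuine mu-law silence consists mostly of bytes near 0xFF or 0x7F, whereas a file that starts with `{` is text rather than audio.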


4. Checked Google Speech-to-Text Input Format


• Google STT requires proper encoding configuration:


const request = {
  config: {
    encoding: 'MULAW',
    sampleRateHertz: 8000,
    languageCode: 'en-US',
  },
  interimResults: false,
};



• No errors from Google STT, but it never detects speech, likely because the input audio is just noise.


5. Confirmed That Raw Audio is Not a WAV File


• Since Twilio sends raw audio, I checked whether I needed to strip the header before processing.
• Tried manually extracting raw bytes, but the issue persists.


Current Theory:


- The WebSocket server might be handling Twilio’s raw audio incorrectly before saving it.
- There might be an additional header in the Twilio stream that needs to be removed before playback.
- Twilio’s <Stream> tag expects a WebSocket connection starting with wss:// instead of https://, and switching to wss:// partially fixed some previous connection issues.








Code Snippets:


Twilio Setup in TwiML Response

app.post('/voice-response', (req, res) => {
  console.log("📞 Incoming call from Twilio");

  const twiml = new twilio.twiml.VoiceResponse();
  twiml.say("Hello! Welcome to the service. How can I help you?");

  // Prevent Twilio from hanging up too early
  twiml.pause({ length: 5 });

  twiml.connect().stream({
    url: `wss://your-ngrok-url/ws`,
    track: "inbound_track"
  });

  console.log("🛠️ Twilio Stream URL:", `wss://your-ngrok-url/ws`);

  res.type('text/xml').send(twiml.toString());
});



WebSocket Server Handling Twilio Audio Stream


wss.on('connection', (ws) => {
  console.log("🔗 WebSocket Connected! Waiting for audio input...");

  ws.on('message', (message) => {
    console.log(`🔊 Received ${message.length} bytes of audio from Twilio`);

    // Save raw audio data for debugging
    fs.appendFileSync('twilio-audio.raw', message);

    // Check if audio is non-empty but contains only noise
    if (message.length < 100) {
      console.warn("⚠️ Warning: Audio data from Twilio is very small. Might be silent.");
    }
  });

  ws.on('close', () => {
    console.log("❌ WebSocket Disconnected!");

    // Convert Twilio audio for debugging
    exec(`ffmpeg -f mulaw -ar 8000 -ac 1 -i twilio-audio.raw twilio-audio.wav`, (err) => {
      if (err) console.error("❌ FFmpeg Conversion Error:", err);
      else console.log("✅ Twilio Audio Saved as `twilio-audio.wav`");
    });
  });

  ws.on('error', (error) => console.error("⚠️ WebSocket Error:", error));
});
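
The handler above appends every WebSocket message to the file unchanged. Twilio's Media Streams documentation describes each message as a JSON text frame (with events such as `connected`, `start`, `media` and `stop`), where the audio sits base64-encoded in `media.payload`; if that is what arrives here, the saved file contains JSON text rather than mu-law samples, which would explain the clicking. The sketch below shows the unwrapping step in Python as an illustration; `extract_mulaw` is a hypothetical helper, and the message layout should be checked against the current Twilio documentation.

```python
import base64
import json


def extract_mulaw(message):
    """Return the raw mu-law bytes carried by one Media Streams message.

    Returns b"" for non-JSON input and for events other than "media".
    """
    try:
        event = json.loads(message)
    except (ValueError, TypeError):
        return b""
    if not isinstance(event, dict) or event.get("event") != "media":
        return b""
    payload = event.get("media", {}).get("payload", "")
    return base64.b64decode(payload)     # base64 text -> raw audio bytes
```

In the Node handler the same steps would be `JSON.parse(message)` followed by `Buffer.from(msg.media.payload, 'base64')`, appending only that buffer to `twilio-audio.raw` for `media` events.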



Questions:


- Why is the audio from Twilio being received as a clicking noise instead of actual speech?
- Do I need to strip any additional metadata from the raw bytes before saving?
- Is there a known issue with Twilio’s mulaw format when streaming audio over WebSockets?
- How can I confirm that Google STT is receiving properly formatted audio?










Additional Context:


- Twilio <Stream> is connected and receiving data (confirmed by logs).
- WebSocket successfully receives and saves audio, but it only plays noise.
- Tried multiple ffmpeg conversions, Google STT configurations, and raw data inspection.
- Still no recognizable speech in the audio output.










Any help is greatly appreciated! 🙏

