Media (1)

Keyword: - Tags -/ogg

Other articles (30)

Accepted formats

28 January 2010, by kent1

The following commands show which formats and codecs the local ffmpeg installation supports:
ffmpeg -codecs
ffmpeg -formats
Accepted input video formats
This list is not exhaustive; it highlights the main formats in use:
h264: H.264 / AVC / MPEG-4 AVC / MPEG-4 part 10
m4v: raw MPEG-4 video format
flv: Flash Video (FLV) / Sorenson Spark / Sorenson H.263
Theora
wmv:
Possible output video formats
Initially, we (...)
Libraries and binaries specific to video and audio processing

31 January 2010, by kent1

The following software and libraries are used by SPIPmotion in one way or another.
Required binaries
FFMpeg: the main encoder; it transcodes almost all video and audio file types into formats playable on the Internet. See this tutorial for its installation;
Oggz-tools: tools for inspecting ogg files;
Mediainfo: retrieves information from most video and audio formats;
Optional complementary binaries
flvtool2: (...)
HTML5 audio and video support

10 April 2011

MediaSPIP uses the HTML5 video and audio tags to play multimedia documents, taking advantage of the latest W3C innovations supported by modern browsers.
For older browsers, the Flash player Flowplayer is used.
The HTML5 player was created specifically for MediaSPIP: its appearance can be fully customized to match a chosen theme.
These technologies make it possible to deliver video and sound both on conventional computers (...)


On other sites (3262)

How to make ffmpeg transcode from mjpeg to h264 in real time?

3 August 2020, by SolskGaer

I have an mjpeg stream that outputs pictures at 10 FPS (cellphone screenshots), and I use the following command to transcode it to an h264 stream and play it on my laptop:

ffmpeg -f mjpeg -r 20 -y -i http://127.0.0.1:53293/  -vcodec libx264 -preset veryfast -profile:v baseline -b:v 1024k -r 10 -f h264 pipe:1 | ffplay -i pipe:0

But the output stream is a few seconds behind the cellphone screen. Here is the output of ffmpeg:

ffplay version 4.3.1 Copyright (c) 2003-2020 the FFmpeg developers
  built with Apple clang version 11.0.3 (clang-1103.0.32.62)
  configuration: --prefix=/usr/local/Cellar/ffmpeg/4.3.1 --enable-shared --enable-pthreads --enable-version3 --enable-avresample --cc=clang --host-cflags= --host-ldflags= --enable-ffplay --enable-gnutls --enable-gpl --enable-libaom --enable-libbluray --enable-libdav1d --enable-libmp3lame --enable-libopus --enable-librav1e --enable-librubberband --enable-libsnappy --enable-libsrt --enable-libtesseract --enable-libtheora --enable-libvidstab --enable-libvorbis --enable-libvpx --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxml2 --enable-libxvid --enable-lzma --enable-libfontconfig --enable-libfreetype --enable-frei0r --enable-libass --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --enable-librtmp --enable-libspeex --enable-libsoxr --enable-videotoolbox --disable-libjack --disable-indev=jack
ffmpeg version 4.3.1 Copyright (c) 2000-2020 the FFmpeg developers
  built with Apple clang version 11.0.3 (clang-1103.0.32.62)
  configuration: --prefix=/usr/local/Cellar/ffmpeg/4.3.1 --enable-shared --enable-pthreads --enable-version3 --enable-avresample --cc=clang --host-cflags= --host-ldflags= --enable-ffplay --enable-gnutls --enable-gpl --enable-libaom --enable-libbluray --enable-libdav1d --enable-libmp3lame --enable-libopus --enable-librav1e --enable-librubberband --enable-libsnappy --enable-libsrt --enable-libtesseract --enable-libtheora --enable-libvidstab --enable-libvorbis --enable-libvpx --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxml2 --enable-libxvid --enable-lzma --enable-libfontconfig --enable-libfreetype --enable-frei0r --enable-libass --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --enable-librtmp --enable-libspeex --enable-libsoxr --enable-videotoolbox --disable-libjack --disable-indev=jack
  libavutil      56. 51.100 / 56. 51.100
  libavcodec     58. 91.100 / 58. 91.100
  libavformat    58. 45.100 / 58. 45.100
  libavdevice    58. 10.100 / 58. 10.100
  libavfilter     7. 85.100 /  7. 85.100
  libavresample   4.  0.  0 /  4.  0.  0
  libswscale      5.  7.100 /  5.  7.100
  libswresample   3.  7.100 /  3.  7.100
  libpostproc    55.  7.100 / 55.  7.100
  libavutil      56. 51.100 / 56. 51.100
  libavcodec     58. 91.100 / 58. 91.100
  libavformat    58. 45.100 / 58. 45.100
  libavdevice    58. 10.100 / 58. 10.100
  libavfilter     7. 85.100 /  7. 85.100
  libavresample   4.  0.  0 /  4.  0.  0
  libswscale      5.  7.100 /  5.  7.100
  libswresample   3.  7.100 /  3.  7.100
  libpostproc    55.  7.100 / 55.  7.100
Input #0, mjpeg, from 'http://127.0.0.1:53293/':
  Duration: N/A, bitrate: N/A
    Stream #0:0: Video: mjpeg (Baseline), yuvj420p(pc, bt470bg/unknown/unknown), 750x1334 [SAR 144:144 DAR 375:667], 20 tbr, 1200k tbn, 20 tbc
Stream mapping:
  Stream #0:0 -> #0:0 (mjpeg (native) -> h264 (libx264))
Press [q] to stop, [?] for help
[libx264 @ 0x7f8f2900da00] using SAR=1/1
[libx264 @ 0x7f8f2900da00] using cpu capabilities: MMX2 SSE2Fast SSSE3 SSE4.2 AVX FMA3 BMI2 AVX2
[libx264 @ 0x7f8f2900da00] profile Constrained Baseline, level 3.2, 4:2:0, 8-bit
Output #0, h264, to 'pipe:1':
  Metadata:
    encoder         : Lavf58.45.100
    Stream #0:0: Video: h264 (libx264), yuvj420p(pc), 750x1334 [SAR 144:144 DAR 375:667], q=-1--1, 1024 kb/s, 10 fps, 10 tbn, 10 tbc
    Metadata:
      encoder         : Lavc58.91.100 libx264
    Side data:
      cpb: bitrate max/min/avg: 0/0/1024000 buffer size: 0 vbv_delay: N/A
frame=   30 fps=5.2 q=29.0 size=      97kB time=00:00:00.10 bitrate=7911.0kbits/s dup=0 drop=26 speed=0.0172
frame=   33 fps=5.2 q=41.0 size=     190kB time=00:00:00.40 bitrate=3888.2kbits/s dup=0 drop=29 speed=0.0633
frame=   35 fps=5.1 q=40.0 size=     253kB time=00:00:00.60 bitrate=3457.8kbits/s dup=0 drop=31 speed=0.0874
frame=   38 fps=5.1 q=40.0 size=     330kB time=00:00:00.90 bitrate=3006.1kbits/s dup=0 drop=34 speed=0.122x
frame=   40 fps=5.1 q=40.0 size=     381kB time=00:00:01.10 bitrate=2838.3kbits/s dup=0 drop=36 speed=0.139x
frame=   43 fps=5.1 q=39.0 size=     460kB time=00:00:01.40 bitrate=2691.3kbits/s dup=0 drop=39 speed=0.166x
frame=   45 fps=5.0 q=39.0 size=     505kB time=00:00:01.60 bitrate=2585.9kbits/s dup=0 drop=41 speed=0.178x
frame=   48 fps=5.0 q=38.0 size=     552kB time=00:00:01.90 bitrate=2381.0kbits/s dup=0 drop=44 speed= 0.2x
frame=   50 fps=5.0 q=33.0 size=     562kB time=00:00:02.10 bitrate=2190.7kbits/s dup=0 drop=46 speed=0.209x
frame=   53 fps=5.0 q=37.0 size=     572kB time=00:00:02.40 bitrate=1951.2kbits/s dup=0 drop=49 speed=0.227x
frame=   55 fps=5.0 q=37.0 size=     581kB time=00:00:02.60 bitrate=1830.0kbits/s dup=0 drop=51 speed=0.234x
frame=   58 fps=5.0 q=36.0 size=     591kB time=00:00:02.90 bitrate=1670.8kbits/s dup=0 drop=54 speed=0.25x
frame=   60 fps=4.9 q=35.0 size=     598kB time=00:00:03.10 bitrate=1580.2kbits/s dup=0 drop=56 speed=0.255x
frame=   63 fps=5.0 q=35.0 size=     608kB time=00:00:03.40 bitrate=1465.2kbits/s dup=0 drop=59 speed=0.268x
frame=   65 fps=4.9 q=34.0 size=     613kB time=00:00:03.60 bitrate=1395.7kbits/s dup=0 drop=61 speed=0.273x
frame=   68 fps=5.0 q=34.0 size=     659kB time=00:00:03.90 bitrate=1384.5kbits/s dup=0 drop=64 speed=0.284x
frame=   70 fps=4.9 q=34.0 size=     682kB time=00:00:04.10 bitrate=1363.2kbits/s dup=0 drop=66 speed=0.288x
frame=   73 fps=4.9 q=35.0 size=     720kB time=00:00:04.40 bitrate=1340.1kbits/s dup=0 drop=69 speed=0.298x
frame=   75 fps=4.9 q=35.0 size=     746kB time=00:00:04.60 bitrate=1329.1kbits/s dup=0 drop=71 speed=0.301x
frame=   78 fps=4.9 q=36.0 size=     786kB time=00:00:04.90 bitrate=1314.6kbits/s dup=0 drop=74 speed=0.31x
frame=   80 fps=4.9 q=36.0 size=     811kB time=00:00:05.10 bitrate=1303.3kbits/s dup=0 drop=76 speed=0.312x

I don't think my laptop's CPU is the bottleneck here. Here are the specs of my laptop:

      Model Name: MacBook Pro
      Model Identifier: MacBookPro16,1
      Processor Name: 6-Core Intel Core i7
      Processor Speed: 2.6 GHz
      Number of Processors: 1
      Total Number of Cores: 6
      L2 Cache (per Core): 256 KB
      L3 Cache: 12 MB
      Hyper-Threading Technology: Enabled
      Memory: 16 GB

But I can't work out why this happens. How can I make the output stream real time? Any suggestions would be appreciated.
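
When reading a log like the one in this question, it can help to pull the numbers out of the progress lines rather than eyeball them. In ffmpeg's progress output, `speed` is the ratio of processed media time to wall-clock time and `drop` is the running count of dropped frames. The sketch below parses the last progress line quoted in the question; the helper names `parse_progress` and `summarize` are hypothetical, and treating `frame + drop` as the total number of frames seen is an assumption made for illustration, not something the log states.

```python
import re

# Matches key=value pairs such as "frame=   80" or "speed=0.312x".
PROGRESS_FIELD = re.compile(r"(\w+)=\s*([^\s=]+)")

def parse_progress(line: str) -> dict[str, str]:
    """Split one ffmpeg progress line into a {field: raw_value} dict."""
    return dict(PROGRESS_FIELD.findall(line))

def summarize(line: str) -> tuple[float, float]:
    """Return (speed, dropped_fraction) for one progress line."""
    stats = parse_progress(line)
    speed = float(stats["speed"].rstrip("x"))
    frames, dropped = int(stats["frame"]), int(stats["drop"])
    # Assumption: frames written plus frames dropped approximates frames read.
    return speed, dropped / (frames + dropped)

# Last progress line from the log in the question.
last_line = ("frame=   80 fps=4.9 q=36.0 size=     811kB time=00:00:05.10 "
             "bitrate=1303.3kbits/s dup=0 drop=76 speed=0.312x")
speed, dropped_fraction = summarize(last_line)
print(f"speed={speed}x, dropped fraction={dropped_fraction:.2f}")
```

A speed well below 1x together with a large dropped fraction points at how timestamps and frame rates are declared (here `-r 20` on the input and `-r 10` on the output) at least as much as at CPU load, though that reading is an inference from the log and not a confirmed diagnosis.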

Real-time playback of two blended videos with alpha channel and synched audio in pygame?

22 December 2024, by Francesco Calderone

I need to play two videos with synched sound in real time with Pygame.
Pygame does not currently support video streams, so I am using an ffmpeg subprocess.
The first video is a prores422_hq. This is a background video with no alpha channel.
The second video is a prores4444 overlay video with an alpha channel, and it needs to be played in real time on top of the first video (with transparency).
All of this needs synched sound from the first (base) video only.

I have tried many libraries, including pymovie, pyav, and opencv. The best result so far comes from using a subprocess with ffmpeg.

ffmpeg -i testing/stefano_prores422_hq.mov -stream_loop -1 -i testing/key_prores4444.mov -filter_complex "[1:v]format=rgba,colorchannelmixer=aa=1.0[overlay];[0:v][overlay]overlay" -f nut pipe:1 | ffplay -

When I run this in the terminal and play it with ffplay, everything is perfect: the overlay looks good, there are no dropped frames, and the sound is in sync.

However, feeding that to pygame via a subprocess produces either video delays and dropped frames or audio that is out of sync.

EXAMPLE ONE:

# SOUND IS NOT SYNCHED - sound is played via ffplay
import pygame
import subprocess
import numpy as np
import sys

def main():
    pygame.init()
    screen_width, screen_height = 1920, 1080
    screen = pygame.display.set_mode((screen_width, screen_height))
    pygame.display.set_caption("PyGame + FFmpeg Overlay with Audio")
    clock = pygame.time.Clock()

    # LAUNCH AUDIO-ONLY SUBPROCESS
    audio_cmd = [
        "ffplay",
        "-nodisp",          # no video window
        "-autoexit",        # exit when video ends
        "-loglevel", "quiet",
        "testing/stefano_prores422_hq.mov"
    ]
    audio_process = subprocess.Popen(audio_cmd)

    # LAUNCH VIDEO-OVERLAY SUBPROCESS
    ffmpeg_command = [
        "ffmpeg",
        "-i", "testing/stefano_prores422_hq.mov",
        "-stream_loop", "-1",         # loop alpha video
        "-i", "testing/key_prores4444.mov",
        "-filter_complex",
        "[1:v]format=rgba,colorchannelmixer=aa=1.0[overlay];"  # ensure alpha channel
        "[0:v][overlay]overlay",      # overlay second input onto first
        "-f", "rawvideo",             # output raw video
        "-pix_fmt", "rgba",           # RGBA format
        "pipe:1"                      # write to STDOUT
    ]
    video_process = subprocess.Popen(
        ffmpeg_command,
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL
    )
    frame_size = screen_width * screen_height * 4  # RGBA = 4 bytes/pixel
    running = True
    while running:
        for event in pygame.event.get():
            if event.type == pygame.QUIT:
                running = False
                break

        raw_frame = video_process.stdout.read(frame_size)

        if len(raw_frame) < frame_size:
            running = False
            break
        # Convert raw bytes -> NumPy array -> PyGame surface
        frame_array = np.frombuffer(raw_frame, dtype=np.uint8)
        frame_array = frame_array.reshape((screen_height, screen_width, 4))
        frame_surface = pygame.image.frombuffer(frame_array.tobytes(), 
                                                (screen_width, screen_height), 
                                                "RGBA")
        screen.blit(frame_surface, (0, 0))
        pygame.display.flip()
        clock.tick(25)
    video_process.terminate()
    video_process.wait()
    audio_process.terminate()
    audio_process.wait()
    pygame.quit()
    sys.exit()

if __name__ == "__main__":
    main()
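
To put the `frame_size` line of this example in perspective, the short calculation below works out how much raw data the pipe has to carry and how much time each loop iteration has. It only uses numbers that appear in the example (1920x1080, RGBA, 25 FPS); the constant names are made up for this sketch.

```python
WIDTH, HEIGHT = 1920, 1080   # window size used in the example
BYTES_PER_PIXEL = 4          # rgba, as requested with -pix_fmt
FPS = 25                     # rate passed to clock.tick()

frame_bytes = WIDTH * HEIGHT * BYTES_PER_PIXEL   # size of one raw frame
bytes_per_second = frame_bytes * FPS             # sustained pipe throughput
frame_budget_ms = 1000 / FPS                     # time available per frame

print(f"{frame_bytes:,} bytes per frame")
print(f"{bytes_per_second / 1e6:.1f} MB/s through the pipe")
print(f"{frame_budget_ms:.0f} ms to read, convert, and blit each frame")
```

If reading, copying (`tobytes()`), and blitting a frame takes longer than that budget, `clock.tick(25)` cannot make up the difference, and a separately started audio player will drift ahead of the picture. Whether that is what happens on the author's machine is not established by the post.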

EXAMPLE TWO

# NO VIDEO OVERLAY - SOUND SYNCHED
import ffmpeg
import pygame
import sys
import numpy as np
import tempfile
import os

def extract_audio(input_file, output_file):
    """Extract audio from video file to temporary WAV file"""
    (
        ffmpeg
        .input(input_file)
        .output(output_file, acodec='pcm_s16le', ac=2, ar='44100')
        .overwrite_output()
        .run(capture_stdout=True, capture_stderr=True)
    )

def get_video_fps(input_file):
    probe = ffmpeg.probe(input_file)
    video_info = next(s for s in probe['streams'] if s['codec_type'] == 'video')
    fps_str = video_info.get('r_frame_rate', '25/1')
    num, den = map(int, fps_str.split('/'))
    return num / den

input_file = "testing/stefano_prores422_hq.mov"

# Create temporary WAV file
temp_audio = tempfile.NamedTemporaryFile(suffix='.wav', delete=False)
temp_audio.close()
extract_audio(input_file, temp_audio.name)

probe = ffmpeg.probe(input_file)
video_info = next(s for s in probe['streams'] if s['codec_type'] == 'video')
width = int(video_info['width'])
height = int(video_info['height'])
fps = get_video_fps(input_file)

process = (
    ffmpeg
    .input(input_file)
    .output('pipe:', format='rawvideo', pix_fmt='rgb24')
    .run_async(pipe_stdout=True)
)

pygame.init()
pygame.mixer.init(frequency=44100, size=-16, channels=2, buffer=4096)
clock = pygame.time.Clock()
screen = pygame.display.set_mode((width, height))

pygame.mixer.music.load(temp_audio.name)
pygame.mixer.music.play()

frame_count = 0
start_time = pygame.time.get_ticks()

while True:
    for event in pygame.event.get():
        if event.type == pygame.QUIT:
            pygame.mixer.music.stop()
            os.unlink(temp_audio.name)
            sys.exit()

    in_bytes = process.stdout.read(width * height * 3)
    if not in_bytes:
        break

    # Calculate timing for synchronization
    expected_frame_time = frame_count * (1000 / fps)
    actual_time = pygame.time.get_ticks() - start_time
    
    if actual_time < expected_frame_time:
        pygame.time.wait(int(expected_frame_time - actual_time))
    
    in_frame = (
        np.frombuffer(in_bytes, dtype="uint8")
        .reshape([height, width, 3])
    )
    out_frame = pygame.surfarray.make_surface(np.transpose(in_frame, (1, 0, 2)))
    screen.blit(out_frame, (0, 0))
    pygame.display.flip()
    
    frame_count += 1

pygame.mixer.music.stop()
process.wait()
pygame.quit()
os.unlink(temp_audio.name)

I also tried using the pygame mixer with a separate mp3 audio file, but that didn't work either. Any help on how to sync the sound while keeping the playback of both videos at 25 FPS would be greatly appreciated!
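
One common way to approach this kind of problem is to treat the audio position as the master clock and let the video loop wait or skip frames to follow it, instead of pacing video with a fixed tick. The sketch below shows only that decision logic in plain Python, so it runs without pygame. The function names are hypothetical and this is not the author's method; in a pygame program the audio position could plausibly come from `pygame.mixer.music.get_pos()`, which reports milliseconds of playback, but that wiring is an assumption left out here.

```python
def frame_due(audio_ms: float, fps: float) -> int:
    """Index of the frame that should be on screen at a given audio position."""
    return int(audio_ms * fps / 1000)

def sync_action(shown: int, audio_ms: float, fps: float) -> tuple[str, int]:
    """Decide what the video loop should do next.

    `shown` is the number of frames already displayed. Returns
    ("wait", ms) when video is ahead of the audio clock,
    ("show", 0) when the next frame is due now, and
    ("skip", n) when n late frames should be read and discarded first.
    """
    due = frame_due(audio_ms, fps)
    if shown > due:
        # The next frame's presentation time lies in the future.
        return ("wait", int(shown * 1000 / fps - audio_ms))
    if shown == due:
        return ("show", 0)
    return ("skip", due - shown)

# One second into the audio at 25 FPS, frame 25 is due.
print(sync_action(shown=25, audio_ms=1000, fps=25))
print(sync_action(shown=30, audio_ms=1000, fps=25))
print(sync_action(shown=20, audio_ms=1000, fps=25))
```

Skipping still requires reading the late frames from the pipe, so this keeps audio and picture aligned but does not by itself reduce the per-frame cost.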

Twilio Real-Time Media Streaming to WebSocket Receives Only Noise Instead of Speech

21 February, by dannym25

I'm setting up a Twilio Voice call with real-time media streaming to a WebSocket server for speech-to-text processing using Google Cloud Speech-to-Text. The connection is established successfully, and I receive a continuous stream of audio data from Twilio. However, when I play back the received audio, all I hear is a rapid clicking/jackhammering noise instead of the actual speech spoken during the call.

Setup:

- Twilio sends inbound audio to my WebSocket server.
- WebSocket receives and saves the raw mulaw-encoded audio data from Twilio.
- The audio is processed via Google Speech-to-Text for transcription.
- When I attempt to play back the audio, it sounds like machine-gun-like noise instead of spoken words.



1. Confirmed WebSocket Receives Data

• The WebSocket successfully logs incoming audio chunks from Twilio:

```
🔊 Received 379 bytes of audio from Twilio
🔊 Received 379 bytes of audio from Twilio
```

• This suggests Twilio is sending audio data, but it's not being interpreted correctly.




2. Saving and Playing Raw Audio

• I save the incoming raw mulaw (8000Hz) audio from Twilio to a file:

```
fs.appendFileSync('twilio-audio.raw', message);
```

• Then, I convert it to a .wav file using FFmpeg:

```
ffmpeg -f mulaw -ar 8000 -ac 1 -i twilio-audio.raw twilio-audio.wav
```

• Problem: When I play the audio using ffplay, it contains no speech, only rapid clicking sounds.




3. Ensured Correct Audio Encoding

• Twilio sends mulaw 8000Hz mono format.
• Verified that my ffmpeg conversion is using the same settings.
• Tried different conversion methods:

```
ffmpeg -f mulaw -ar 8000 -ac 1 -i twilio-audio.raw -c:a pcm_s16le twilio-audio-fixed.wav
```

→ Same issue.
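
For context on what `-f mulaw` implies: a raw mu-law decoder treats every byte of the file as one audio sample, so any bytes in the file that are not audio are turned into samples as well. The sketch below is a plain-Python rendering of the standard G.711 mu-law expansion, included only to make that byte-to-sample mapping concrete; the function names are made up here and this is not how ffmpeg is implemented internally.

```python
def mulaw_to_pcm16(byte: int) -> int:
    """Decode one G.711 mu-law byte to a signed 16-bit linear sample."""
    byte = ~byte & 0xFF                  # mu-law bytes are stored bit-inverted
    sign = byte & 0x80                   # top bit carries the sign
    exponent = (byte >> 4) & 0x07        # 3-bit segment number
    mantissa = byte & 0x0F               # 4-bit step within the segment
    magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84
    return -magnitude if sign else magnitude

def decode(raw: bytes) -> list[int]:
    """Decode a buffer of mu-law bytes, one sample per byte."""
    return [mulaw_to_pcm16(b) for b in raw]

# 0xFF is digital silence; 0x00 and 0x80 are the two extremes.
print(decode(bytes([0xFF, 0x00, 0x80])))
```

Inspecting the first bytes of `twilio-audio.raw` in a hex viewer is a quick way to see whether the file holds sample bytes like these or something else, such as text.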




4. Checked Google Speech-to-Text Input Format

• Google STT requires proper encoding configuration:

```
const request = {
    config: {
        encoding: 'MULAW',
        sampleRateHertz: 8000,
        languageCode: 'en-US',
    },
    interimResults: false,
};
```

• No errors from Google STT, but it never detects speech, likely because the input audio is just noise.




5. Confirmed That Raw Audio is Not a WAV File

• Since Twilio sends raw audio, I checked whether I needed to strip the header before processing.
• Tried manually extracting raw bytes, but the issue persists.




Current Theory:

- The WebSocket server might be handling Twilio’s raw audio incorrectly before saving it.
- There might be an additional header in the Twilio stream that needs to be removed before playback.
- Twilio’s <Stream> tag expects a WebSocket connection starting with wss:// instead of https://, and switching to wss:// partially fixed some previous connection issues.



Code Snippets:

Twilio Setup in TwiML Response

```
app.post('/voice-response', (req, res) => {
    console.log("📞 Incoming call from Twilio");

    const twiml = new twilio.twiml.VoiceResponse();
    twiml.say("Hello! Welcome to the service. How can I help you?");
    
    // Prevent Twilio from hanging up too early
    twiml.pause({ length: 5 });

    twiml.connect().stream({
        url: `wss://your-ngrok-url/ws`,
        track: "inbound_track"
    });

    console.log("🛠️ Twilio Stream URL:", `wss://your-ngrok-url/ws`);
    
    res.type('text/xml').send(twiml.toString());
});
```



WebSocket Server Handling Twilio Audio Stream

```
wss.on('connection', (ws) => {
    console.log("🔗 WebSocket Connected! Waiting for audio input...");

    ws.on('message', (message) => {
        console.log(`🔊 Received ${message.length} bytes of audio from Twilio`);

        // Save raw audio data for debugging
        fs.appendFileSync('twilio-audio.raw', message);

        // Check if audio is non-empty but contains only noise
        if (message.length < 100) {
            console.warn("⚠️ Warning: Audio data from Twilio is very small. Might be silent.");
        }
    });

    ws.on('close', () => {
        console.log("❌ WebSocket Disconnected!");
        
        // Convert Twilio audio for debugging
        exec(`ffmpeg -f mulaw -ar 8000 -ac 1 -i twilio-audio.raw twilio-audio.wav`, (err) => {
            if (err) console.error("❌ FFmpeg Conversion Error:", err);
            else console.log("✅ Twilio Audio Saved as `twilio-audio.wav`");
        });
    });

    ws.on('error', (error) => console.error("⚠️ WebSocket Error:", error));
});
```



Questions:

- Why is the audio from Twilio being received as a clicking noise instead of actual speech?
- Do I need to strip any additional metadata from the raw bytes before saving?
- Is there a known issue with Twilio’s mulaw format when streaming audio over WebSockets?
- How can I confirm that Google STT is receiving properly formatted audio?

Additional Context:

- Twilio <Stream> is connected and receiving data (confirmed by logs).
- WebSocket successfully receives and saves audio, but it only plays noise.
- Tried multiple ffmpeg conversions, Google STT configurations, and raw data inspection.
- Still no recognizable speech in the audio output.

Any help is greatly appreciated! 🙏
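
A possible lead for the "additional metadata" question: as far as Twilio's Media Streams documentation describes it, each WebSocket message is a JSON text frame with an `event` field, and only `media` events carry audio, base64-encoded in `media.payload`. If that applies here, appending whole messages to a file would store JSON text, not mu-law samples. The sketch below shows the unwrapping step in Python (the post's server is Node.js; Python is used here only for illustration). The `extract_audio` helper and the sample message are hypothetical, and the message shape should be checked against the current Twilio documentation.

```python
import base64
import json

def extract_audio(message: str) -> bytes:
    """Return the raw mu-law bytes carried by one message, or b"" if none."""
    data = json.loads(message)
    if data.get("event") != "media":
        return b""                       # e.g. "connected", "start", "stop"
    return base64.b64decode(data["media"]["payload"])

# Hypothetical message built for illustration: 160 bytes = 20 ms at 8000 Hz.
sample = json.dumps({
    "event": "media",
    "media": {
        "track": "inbound",
        "payload": base64.b64encode(b"\xff" * 160).decode("ascii"),
    },
})
audio = extract_audio(sample)
print(len(sample), "bytes of JSON carry", len(audio), "bytes of audio")
```

Only the decoded bytes would then be appended to the raw file or forwarded to the speech recognizer; the same two steps (JSON parse, base64 decode) exist in Node.js as `JSON.parse` and `Buffer.from(payload, 'base64')`.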

