Advanced search

Media (1)

Keyword: - Tags -/ogg

Other articles (30)

  • Accepted formats

    28 January 2010, by

    The following commands give information about the formats and codecs supported by the local ffmpeg installation:
    ffmpeg -codecs
    ffmpeg -formats
    Accepted input video formats
    This list is not exhaustive; it highlights the main formats in use: h264: H.264 / AVC / MPEG-4 AVC / MPEG-4 part 10; m4v: raw MPEG-4 video format; flv: Flash Video (FLV) / Sorenson Spark / Sorenson H.263; Theora; wmv:
    Possible output video formats
    To begin with, we (...)
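    As a quick aside (not from the article), a minimal Python sketch that runs the two commands above and checks whether a given codec or container name appears in the local build's output; it assumes the ffmpeg binary is on the PATH:

# Minimal sketch: query the local ffmpeg build for its supported codecs and formats.
# Assumes "ffmpeg" is on the PATH; adjust the executable path otherwise.
import subprocess

def ffmpeg_supports(name: str, what: str = "codecs") -> bool:
    """Return True if `name` appears in `ffmpeg -codecs` or `ffmpeg -formats` output."""
    out = subprocess.run(
        ["ffmpeg", "-hide_banner", f"-{what}"],
        capture_output=True, text=True, check=True,
    ).stdout
    return any(name in line.split() for line in out.splitlines())

print(ffmpeg_supports("h264"))            # codec check
print(ffmpeg_supports("flv", "formats"))  # container/demuxer check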

  • Libraries and binaries specific to video and audio processing

    31 January 2010, by

    The following software packages and libraries are used by SPIPmotion in one way or another.
    Required binaries FFmpeg: the main encoder, able to transcode almost any type of video or audio file into web-playable formats. See this tutorial for its installation; Oggz-tools: tools for inspecting Ogg files; MediaInfo: retrieves information from most video and audio formats;
    Additional, optional binaries flvtool2: (...)
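    As a quick check (not part of the article), a small Python sketch that verifies the binaries listed above are installed and on the PATH; the exact oggz-tools executable name ("oggz-info" here) is an assumption and may vary by distribution:

# Minimal sketch: check that the binaries SPIPmotion relies on are available on the PATH.
# "oggz-info" stands in for the oggz-tools suite; the executables shipped may vary.
import shutil

required = ["ffmpeg", "mediainfo", "oggz-info"]
optional = ["flvtool2"]

for binary in required + optional:
    kind = "required" if binary in required else "optional"
    status = "found" if shutil.which(binary) else "MISSING"
    print(f"{binary:10s} ({kind}): {status}")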

  • HTML5 audio and video support

    10 April 2011

    MediaSPIP uses the HTML5 video and audio tags to play multimedia documents, taking advantage of the latest W3C innovations supported by modern browsers.
    For older browsers, the Flowplayer Flash player is used instead.
    The HTML5 player was created specifically for MediaSPIP: its appearance is fully customizable to match a chosen theme.
    These technologies make it possible to deliver video and sound both to conventional computers (...)

On other sites (3262)

  • How to make ffmpeg transcode from mjpeg to h264 in real time?

    3 August 2020, by SolskGaer

    I have an MJPEG stream that outputs pictures at 10 FPS (cellphone screenshots), and I use the following command to transcode it to an H.264 stream and play it on my laptop:

    


    ffmpeg -f mjpeg -r 20 -y -i http://127.0.0.1:53293/  -vcodec libx264 -preset veryfast -profile:v baseline -b:v 1024k -r 10 -f h264 pipe:1 | ffplay -i pipe:0


    


    but the output stream is a few seconds behind the cellphone screen. Here is the output of ffmpeg:

    


    ffplay version 4.3.1 Copyright (c) 2003-2020 the FFmpeg developers
  built with Apple clang version 11.0.3 (clang-1103.0.32.62)
  configuration: --prefix=/usr/local/Cellar/ffmpeg/4.3.1 --enable-shared --enable-pthreads --enable-version3 --enable-avresample --cc=clang --host-cflags= --host-ldflags= --enable-ffplay --enable-gnutls --enable-gpl --enable-libaom --enable-libbluray --enable-libdav1d --enable-libmp3lame --enable-libopus --enable-librav1e --enable-librubberband --enable-libsnappy --enable-libsrt --enable-libtesseract --enable-libtheora --enable-libvidstab --enable-libvorbis --enable-libvpx --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxml2 --enable-libxvid --enable-lzma --enable-libfontconfig --enable-libfreetype --enable-frei0r --enable-libass --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --enable-librtmp --enable-libspeex --enable-libsoxr --enable-videotoolbox --disable-libjack --disable-indev=jack
ffmpeg version 4.3.1 Copyright (c) 2000-2020 the FFmpeg developers
  built with Apple clang version 11.0.3 (clang-1103.0.32.62)
  configuration: --prefix=/usr/local/Cellar/ffmpeg/4.3.1 --enable-shared --enable-pthreads --enable-version3 --enable-avresample --cc=clang --host-cflags= --host-ldflags= --enable-ffplay --enable-gnutls --enable-gpl --enable-libaom --enable-libbluray --enable-libdav1d --enable-libmp3lame --enable-libopus --enable-librav1e --enable-librubberband --enable-libsnappy --enable-libsrt --enable-libtesseract --enable-libtheora --enable-libvidstab --enable-libvorbis --enable-libvpx --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxml2 --enable-libxvid --enable-lzma --enable-libfontconfig --enable-libfreetype --enable-frei0r --enable-libass --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --enable-librtmp --enable-libspeex --enable-libsoxr --enable-videotoolbox --disable-libjack --disable-indev=jack
  libavutil      56. 51.100 / 56. 51.100
  libavcodec     58. 91.100 / 58. 91.100
  libavformat    58. 45.100 / 58. 45.100
  libavdevice    58. 10.100 / 58. 10.100
  libavfilter     7. 85.100 /  7. 85.100
  libavresample   4.  0.  0 /  4.  0.  0
  libswscale      5.  7.100 /  5.  7.100
  libswresample   3.  7.100 /  3.  7.100
  libpostproc    55.  7.100 / 55.  7.100
  libavutil      56. 51.100 / 56. 51.100
  libavcodec     58. 91.100 / 58. 91.100
  libavformat    58. 45.100 / 58. 45.100
  libavdevice    58. 10.100 / 58. 10.100
  libavfilter     7. 85.100 /  7. 85.100
  libavresample   4.  0.  0 /  4.  0.  0
  libswscale      5.  7.100 /  5.  7.100
  libswresample   3.  7.100 /  3.  7.100
  libpostproc    55.  7.100 / 55.  7.100
Input #0, mjpeg, from 'http://127.0.0.1:53293/':
  Duration: N/A, bitrate: N/A
    Stream #0:0: Video: mjpeg (Baseline), yuvj420p(pc, bt470bg/unknown/unknown), 750x1334 [SAR 144:144 DAR 375:667], 20 tbr, 1200k tbn, 20 tbc
Stream mapping:
  Stream #0:0 -> #0:0 (mjpeg (native) -> h264 (libx264))
Press [q] to stop, [?] for help
[libx264 @ 0x7f8f2900da00] using SAR=1/1
[libx264 @ 0x7f8f2900da00] using cpu capabilities: MMX2 SSE2Fast SSSE3 SSE4.2 AVX FMA3 BMI2 AVX2
[libx264 @ 0x7f8f2900da00] profile Constrained Baseline, level 3.2, 4:2:0, 8-bit
Output #0, h264, to 'pipe:1':
  Metadata:
    encoder         : Lavf58.45.100
    Stream #0:0: Video: h264 (libx264), yuvj420p(pc), 750x1334 [SAR 144:144 DAR 375:667], q=-1--1, 1024 kb/s, 10 fps, 10 tbn, 10 tbc
    Metadata:
      encoder         : Lavc58.91.100 libx264
    Side data:
      cpb: bitrate max/min/avg: 0/0/1024000 buffer size: 0 vbv_delay: N/A
frame=   30 fps=5.2 q=29.0 size=      97kB time=00:00:00.10 bitrate=7911.0kbits/s dup=0 drop=26 speed=0.0172
frame=   33 fps=5.2 q=41.0 size=     190kB time=00:00:00.40 bitrate=3888.2kbits/s dup=0 drop=29 speed=0.0633
frame=   35 fps=5.1 q=40.0 size=     253kB time=00:00:00.60 bitrate=3457.8kbits/s dup=0 drop=31 speed=0.0874
frame=   38 fps=5.1 q=40.0 size=     330kB time=00:00:00.90 bitrate=3006.1kbits/s dup=0 drop=34 speed=0.122x
frame=   40 fps=5.1 q=40.0 size=     381kB time=00:00:01.10 bitrate=2838.3kbits/s dup=0 drop=36 speed=0.139x
frame=   43 fps=5.1 q=39.0 size=     460kB time=00:00:01.40 bitrate=2691.3kbits/s dup=0 drop=39 speed=0.166x
frame=   45 fps=5.0 q=39.0 size=     505kB time=00:00:01.60 bitrate=2585.9kbits/s dup=0 drop=41 speed=0.178x
frame=   48 fps=5.0 q=38.0 size=     552kB time=00:00:01.90 bitrate=2381.0kbits/s dup=0 drop=44 speed= 0.2x 
frame=   50 fps=5.0 q=33.0 size=     562kB time=00:00:02.10 bitrate=2190.7kbits/s dup=0 drop=46 speed=0.209x
frame=   53 fps=5.0 q=37.0 size=     572kB time=00:00:02.40 bitrate=1951.2kbits/s dup=0 drop=49 speed=0.227x
frame=   55 fps=5.0 q=37.0 size=     581kB time=00:00:02.60 bitrate=1830.0kbits/s dup=0 drop=51 speed=0.234x
frame=   58 fps=5.0 q=36.0 size=     591kB time=00:00:02.90 bitrate=1670.8kbits/s dup=0 drop=54 speed=0.25x 
frame=   60 fps=4.9 q=35.0 size=     598kB time=00:00:03.10 bitrate=1580.2kbits/s dup=0 drop=56 speed=0.255x
frame=   63 fps=5.0 q=35.0 size=     608kB time=00:00:03.40 bitrate=1465.2kbits/s dup=0 drop=59 speed=0.268x
frame=   65 fps=4.9 q=34.0 size=     613kB time=00:00:03.60 bitrate=1395.7kbits/s dup=0 drop=61 speed=0.273x
frame=   68 fps=5.0 q=34.0 size=     659kB time=00:00:03.90 bitrate=1384.5kbits/s dup=0 drop=64 speed=0.284x
frame=   70 fps=4.9 q=34.0 size=     682kB time=00:00:04.10 bitrate=1363.2kbits/s dup=0 drop=66 speed=0.288x
frame=   73 fps=4.9 q=35.0 size=     720kB time=00:00:04.40 bitrate=1340.1kbits/s dup=0 drop=69 speed=0.298x
frame=   75 fps=4.9 q=35.0 size=     746kB time=00:00:04.60 bitrate=1329.1kbits/s dup=0 drop=71 speed=0.301x
frame=   78 fps=4.9 q=36.0 size=     786kB time=00:00:04.90 bitrate=1314.6kbits/s dup=0 drop=74 speed=0.31x 
frame=   80 fps=4.9 q=36.0 size=     811kB time=00:00:05.10 bitrate=1303.3kbits/s dup=0 drop=76 speed=0.312x


    


    I don't think my laptop's CPU is the bottleneck for this process; here are its specs:

    


    
      Model Name: MacBook Pro
      Model Identifier: MacBookPro16,1
      Processor Name: 6-Core Intel Core i7
      Processor Speed: 2.6 GHz
      Number of Processors: 1
      Total Number of Cores: 6
      L2 Cache (per Core): 256 KB
      L3 Cache: 12 MB
      Hyper-Threading Technology: Enabled
      Memory: 16 GB


    


    but I can't track down the reason why this happens. How can I make the output stream real time? Any suggestion would be appreciated.
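    Not an answer from the original thread, but as a hedged illustration of one direction to test: the same pipeline driven from Python, with the input rate set to the source's 10 FPS (the question forces -r 20 on a 10 FPS stream) and with buffering-related options (-fflags nobuffer, -probesize, -analyzeduration, libx264 -tune zerolatency, ffplay -fflags nobuffer -flags low_delay) that are commonly tried to reduce end-to-end latency. Whether they actually help against this particular MJPEG source would need to be measured.

# Sketch only: the question's ffmpeg | ffplay pipe launched from Python with
# low-latency options. URL and bitrate are taken from the question; the flag
# choices are untested assumptions for this particular source.
import subprocess

ffmpeg = subprocess.Popen(
    ["ffmpeg",
     "-fflags", "nobuffer", "-probesize", "32", "-analyzeduration", "0",
     "-f", "mjpeg", "-r", "10", "-i", "http://127.0.0.1:53293/",
     "-vcodec", "libx264", "-preset", "veryfast", "-tune", "zerolatency",
     "-profile:v", "baseline", "-b:v", "1024k", "-r", "10",
     "-f", "h264", "pipe:1"],
    stdout=subprocess.PIPE,
)

ffplay = subprocess.Popen(
    ["ffplay", "-fflags", "nobuffer", "-flags", "low_delay", "-i", "pipe:0"],
    stdin=ffmpeg.stdout,
)
ffmpeg.stdout.close()  # let ffmpeg receive SIGPIPE if ffplay exits
ffplay.wait()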

    


  • Real-time playback of two blended videos with alpha channel and synced audio in pygame?

    22 December 2024, by Francesco Calderone

    I need to play two videos with synced sound in real time with Pygame.
Pygame does not currently support video streams, so I am using an ffmpeg subprocess.
The first video is ProRes 422 HQ. This is a background video with no alpha channel.
The second video is a ProRes 4444 overlay video with an alpha channel, and it needs to be played in real time on top of the first video (with transparency).
All of this needs synced sound, taken from the first (base) video only.

    


    I have tried many libraries, including pymovie, pyav, and opencv. The best result so far has been to use a subprocess with ffmpeg.

    


    ffmpeg -i testing/stefano_prores422_hq.mov -stream_loop -1 -i testing/key_prores4444.mov -filter_complex "[1:v]format=rgba,colorchannelmixer=aa=1.0[overlay];[0:v][overlay]overlay" -f nut pipe:1 | ffplay -

    


    When running this in the terminal and playing with ffplay, everything is perfect: the overlay looks good, there are no dropped frames, and the sound is in sync.

    


    However, feeding that to pygame via a subprocess produces either video delays and dropped frames, or audio that is out of sync.

    


    EXAMPLE ONE:

    


    # SOUND IS NOT SYNCHED - sound is played via ffplay
import pygame
import subprocess
import numpy as np
import sys

def main():
    pygame.init()
    screen_width, screen_height = 1920, 1080
    screen = pygame.display.set_mode((screen_width, screen_height))
    pygame.display.set_caption("PyGame + FFmpeg Overlay with Audio")
    clock = pygame.time.Clock()

    # LAUNCH AUDIO-ONLY SUBPROCESS
    audio_cmd = [
        "ffplay",
        "-nodisp",          # no video window
        "-autoexit",        # exit when video ends
        "-loglevel", "quiet",
        "testing/stefano_prores422_hq.mov"
    ]
    audio_process = subprocess.Popen(audio_cmd)

    # LAUNCH VIDEO-OVERLAY SUBPROCESS
    ffmpeg_command = [
        "ffmpeg",
        "-i", "testing/stefano_prores422_hq.mov",
        "-stream_loop", "-1",         # loop alpha video
        "-i", "testing/key_prores4444.mov",
        "-filter_complex",
        "[1:v]format=rgba,colorchannelmixer=aa=1.0[overlay];"  # ensure alpha channel
        "[0:v][overlay]overlay",      # overlay second input onto first
        "-f", "rawvideo",             # output raw video
        "-pix_fmt", "rgba",           # RGBA format
        "pipe:1"                      # write to STDOUT
    ]
    video_process = subprocess.Popen(
        ffmpeg_command,
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL
    )
    frame_size = screen_width * screen_height * 4  # RGBA = 4 bytes/pixel
    running = True
    while running:
        for event in pygame.event.get():
            if event.type == pygame.QUIT:
                running = False
                break

        raw_frame = video_process.stdout.read(frame_size)

        if len(raw_frame) < frame_size:
            running = False
            break
        # Convert raw bytes -> NumPy array -> PyGame surface
        frame_array = np.frombuffer(raw_frame, dtype=np.uint8)
        frame_array = frame_array.reshape((screen_height, screen_width, 4))
        frame_surface = pygame.image.frombuffer(frame_array.tobytes(), 
                                                (screen_width, screen_height), 
                                                "RGBA")
        screen.blit(frame_surface, (0, 0))
        pygame.display.flip()
        clock.tick(25)
    video_process.terminate()
    video_process.wait()
    audio_process.terminate()
    audio_process.wait()
    pygame.quit()
    sys.exit()

if __name__ == "__main__":
    main()



    


    EXAMPLE TWO

    


    # NO VIDEO OVERLAY - SOUND SYNCHED
import ffmpeg
import pygame
import sys
import numpy as np
import tempfile
import os

def extract_audio(input_file, output_file):
    """Extract audio from video file to temporary WAV file"""
    (
        ffmpeg
        .input(input_file)
        .output(output_file, acodec='pcm_s16le', ac=2, ar='44100')
        .overwrite_output()
        .run(capture_stdout=True, capture_stderr=True)
    )

def get_video_fps(input_file):
    probe = ffmpeg.probe(input_file)
    video_info = next(s for s in probe['streams'] if s['codec_type'] == 'video')
    fps_str = video_info.get('r_frame_rate', '25/1')
    num, den = map(int, fps_str.split('/'))
    return num / den

input_file = "testing/stefano_prores422_hq.mov"

# Create temporary WAV file
temp_audio = tempfile.NamedTemporaryFile(suffix='.wav', delete=False)
temp_audio.close()
extract_audio(input_file, temp_audio.name)

probe = ffmpeg.probe(input_file)
video_info = next(s for s in probe['streams'] if s['codec_type'] == 'video')
width = int(video_info['width'])
height = int(video_info['height'])
fps = get_video_fps(input_file)

process = (
    ffmpeg
    .input(input_file)
    .output('pipe:', format='rawvideo', pix_fmt='rgb24')
    .run_async(pipe_stdout=True)
)

pygame.init()
pygame.mixer.init(frequency=44100, size=-16, channels=2, buffer=4096)
clock = pygame.time.Clock()
screen = pygame.display.set_mode((width, height))

pygame.mixer.music.load(temp_audio.name)
pygame.mixer.music.play()

frame_count = 0
start_time = pygame.time.get_ticks()

while True:
    for event in pygame.event.get():
        if event.type == pygame.QUIT:
            pygame.mixer.music.stop()
            os.unlink(temp_audio.name)
            sys.exit()

    in_bytes = process.stdout.read(width * height * 3)
    if not in_bytes:
        break

    # Calculate timing for synchronization
    expected_frame_time = frame_count * (1000 / fps)
    actual_time = pygame.time.get_ticks() - start_time
    
    if actual_time < expected_frame_time:
        pygame.time.wait(int(expected_frame_time - actual_time))
    
    in_frame = (
        np.frombuffer(in_bytes, dtype="uint8")
        .reshape([height, width, 3])
    )
    out_frame = pygame.surfarray.make_surface(np.transpose(in_frame, (1, 0, 2)))
    screen.blit(out_frame, (0, 0))
    pygame.display.flip()
    
    frame_count += 1

pygame.mixer.music.stop()
process.wait()
pygame.quit()
os.unlink(temp_audio.name)


    


    I also tried using pygame.mixer and a separate MP3 audio file, but that didn't work either. Any help on how to sync the sound while keeping playback of both videos at 25 FPS would be greatly appreciated!
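    Not an answer from the original thread, but one direction that follows from the two examples above: keep Example One's overlay pipeline, extract the base video's audio to a WAV as in Example Two, play it with pygame.mixer.music, and pace frame reads against the audio clock instead of clock.tick(25). A rough, untested sketch (file names, 1920x1080 and 25 FPS are taken from the question):

# Rough sketch: Example One's ffmpeg overlay pipe paced by Example Two's audio clock.
# File names, frame size and FPS come from the question; untested against those files.
import subprocess
import sys

import numpy as np
import pygame

WIDTH, HEIGHT, FPS = 1920, 1080, 25
FRAME_SIZE = WIDTH * HEIGHT * 4  # RGBA, 4 bytes per pixel

# 1) Extract the base video's audio once (plain ffmpeg instead of ffmpeg-python).
subprocess.run(["ffmpeg", "-y", "-i", "testing/stefano_prores422_hq.mov",
                "-vn", "-acodec", "pcm_s16le", "-ac", "2", "-ar", "44100",
                "base_audio.wav"], check=True)

# 2) Start the overlay pipeline from Example One (raw RGBA frames on stdout).
video = subprocess.Popen(
    ["ffmpeg",
     "-i", "testing/stefano_prores422_hq.mov",
     "-stream_loop", "-1", "-i", "testing/key_prores4444.mov",
     "-filter_complex",
     "[1:v]format=rgba,colorchannelmixer=aa=1.0[overlay];[0:v][overlay]overlay",
     "-f", "rawvideo", "-pix_fmt", "rgba", "pipe:1"],
    stdout=subprocess.PIPE, stderr=subprocess.DEVNULL)

pygame.mixer.pre_init(frequency=44100, size=-16, channels=2)
pygame.init()
screen = pygame.display.set_mode((WIDTH, HEIGHT))
pygame.mixer.music.load("base_audio.wav")
pygame.mixer.music.play()

frame_index = 0
running = True
while running:
    for event in pygame.event.get():
        if event.type == pygame.QUIT:
            running = False

    raw = video.stdout.read(FRAME_SIZE)
    if len(raw) < FRAME_SIZE:
        break

    # Pace video against the audio position instead of a free-running tick.
    target_ms = frame_index * 1000 / FPS
    audio_ms = pygame.mixer.music.get_pos()  # ms since playback started, -1 if stopped
    if audio_ms >= 0 and target_ms > audio_ms:
        pygame.time.wait(int(target_ms - audio_ms))

    frame = np.frombuffer(raw, dtype=np.uint8).reshape((HEIGHT, WIDTH, 4))
    surface = pygame.image.frombuffer(frame.tobytes(), (WIDTH, HEIGHT), "RGBA")
    screen.blit(surface, (0, 0))
    pygame.display.flip()
    frame_index += 1

video.terminate()
pygame.quit()
sys.exit()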

    


  • Twilio Real-Time Media Streaming to WebSocket Receives Only Noise Instead of Speech

    21 February, by dannym25

    I'm setting up a Twilio Voice call with real-time media streaming to a WebSocket server for speech-to-text processing using Google Cloud Speech-to-Text. The connection is established successfully, and I receive a continuous stream of audio data from Twilio. However, when I play back the received audio, all I hear is a rapid clicking/jackhammering noise instead of the actual speech spoken during the call.

    


    Setup:

    


      

    • Twilio sends inbound audio to my WebSocket server.
    • WebSocket receives and saves the raw mulaw-encoded audio data from Twilio.
    • The audio is processed via Google Speech-to-Text for transcription.
    • When I attempt to play back the audio, it sounds like machine-gun-like noise instead of spoken words.


    


    1. Confirmed WebSocket Receives Data

    


    • The WebSocket successfully logs incoming audio chunks from Twilio:

    


    🔊 Received 379 bytes of audio from Twilio
🔊 Received 379 bytes of audio from Twilio


    


    • This suggests Twilio is sending audio data, but it's not being interpreted correctly.

    


    2. Saving and Playing Raw Audio

    


    • I save the incoming raw mulaw (8000 Hz) audio from Twilio to a file:

    


    fs.appendFileSync('twilio-audio.raw', message);


    


    • Then, I convert it to a .wav file using FFmpeg:

    


    ffmpeg -f mulaw -ar 8000 -ac 1 -i twilio-audio.raw twilio-audio.wav


    


    Problem: When I play the audio using ffplay, it contains no speech, only rapid clicking sounds.

    


    3. Ensured Correct Audio Encoding

    


    • Twilio sends mulaw 8000Hz mono format.
• Verified that my ffmpeg conversion is using the same settings.
• Tried different conversion methods:

    


    ffmpeg -f mulaw -ar 8000 -ac 1 -i twilio-audio.raw -c:a pcm_s16le twilio-audio-fixed.wav


    


    → Same issue.

    


    4. Checked Google Speech-to-Text Input Format

    


    • Google STT requires proper encoding configuration:

    


    const request = {
    config: {
        encoding: 'MULAW',
        sampleRateHertz: 8000,
        languageCode: 'en-US',
    },
    interimResults: false,
};


    


    • No errors from Google STT, but it never detects speech, likely because the input audio is just noise.

    


    5. Confirmed That Raw Audio is Not a WAV File

    


    • Since Twilio sends raw audio, I checked whether I needed to strip the header before processing.
• Tried manually extracting raw bytes, but the issue persists.

    


    Current Theory:

    


      

    • The WebSocket server might be handling Twilio’s raw audio incorrectly before saving it.
    • There might be an additional header in the Twilio stream that needs to be removed before playback.
    • Twilio’s <Stream> tag expects a WebSocket connection starting with wss:// instead of https://, and switching to wss:// partially fixed some previous connection issues.

    Code Snippets:

    Twilio Setup in TwiML Response

app.post('/voice-response', (req, res) => {
    console.log("📞 Incoming call from Twilio");

    const twiml = new twilio.twiml.VoiceResponse();
    twiml.say("Hello! Welcome to the service. How can I help you?");

    // Prevent Twilio from hanging up too early
    twiml.pause({ length: 5 });

    twiml.connect().stream({
        url: `wss://your-ngrok-url/ws`,
        track: "inbound_track"
    });

    console.log("🛠️ Twilio Stream URL:", `wss://your-ngrok-url/ws`);

    res.type('text/xml').send(twiml.toString());
});

    WebSocket Server Handling Twilio Audio Stream

wss.on('connection', (ws) => {
    console.log("🔗 WebSocket Connected! Waiting for audio input...");

    ws.on('message', (message) => {
        console.log(`🔊 Received ${message.length} bytes of audio from Twilio`);

        // Save raw audio data for debugging
        fs.appendFileSync('twilio-audio.raw', message);

        // Check if audio is non-empty but contains only noise
        if (message.length < 100) {
            console.warn("⚠️ Warning: Audio data from Twilio is very small. Might be silent.");
        }
    });

    ws.on('close', () => {
        console.log("❌ WebSocket Disconnected!");

        // Convert Twilio audio for debugging
        exec(`ffmpeg -f mulaw -ar 8000 -ac 1 -i twilio-audio.raw twilio-audio.wav`, (err) => {
            if (err) console.error("❌ FFmpeg Conversion Error:", err);
            else console.log("✅ Twilio Audio Saved as `twilio-audio.wav`");
        });
    });

    ws.on('error', (error) => console.error("⚠️ WebSocket Error:", error));
});

    Questions:

    • Why is the audio from Twilio being received as a clicking noise instead of actual speech?
    • Do I need to strip any additional metadata from the raw bytes before saving?
    • Is there a known issue with Twilio’s mulaw format when streaming audio over WebSockets?
    • How can I confirm that Google STT is receiving properly formatted audio?

    Additional Context:

    • Twilio’s <Stream> is connected and receiving data (confirmed by logs).
    • WebSocket successfully receives and saves audio, but it only plays noise.
    • Tried multiple ffmpeg conversions, Google STT configurations, and raw data inspection.
    • Still no recognizable speech in the audio output.
    Any help is greatly appreciated! 🙏
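    Not from the original question, but a minimal Python sketch of the decoding step the "Current Theory" hints at. It assumes each Twilio Media Streams WebSocket message is a JSON envelope whose media.payload field carries base64-encoded mulaw audio (worth verifying against Twilio's documentation), which would explain why appending the raw message text to twilio-audio.raw plays back as noise. The twilio-messages.log file name is hypothetical:

# Hypothetical post-processing sketch: recover mulaw audio from logged Twilio
# WebSocket messages, assuming each "media" event carries base64 audio in
# media.payload (check the Twilio Media Streams docs for the exact schema).
import base64
import json

def extract_mulaw(messages):
    """Yield raw mulaw bytes from an iterable of Twilio WebSocket message strings."""
    for text in messages:
        try:
            event = json.loads(text)
        except json.JSONDecodeError:
            continue  # not a JSON frame; skip it
        if event.get("event") == "media":
            yield base64.b64decode(event["media"]["payload"])

if __name__ == "__main__":
    # Example: one logged message per line in twilio-messages.log (hypothetical file).
    with open("twilio-messages.log") as src, open("twilio-audio.raw", "wb") as dst:
        for chunk in extract_mulaw(src):
            dst.write(chunk)
    # The result can then be converted exactly as in the question:
    #   ffmpeg -f mulaw -ar 8000 -ac 1 -i twilio-audio.raw twilio-audio.wav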
