Newest 'ffmpeg' Questions - Stack Overflow

http://stackoverflow.com/questions/tagged/ffmpeg

Articles published on the site

  • Encoding problem with x264 and not divisible by 4 resolutions

    February 23, by Valentin Maschenko

    I'm encoding frames with x264 from BGRA to yuv420p using the High profile. It works fine at most resolutions, but frames at 1366x768 come out heavily distorted. So far, I've found that issues like this can occur when the width or height is not divisible by 4. Do you know how I can fix that?

    [Image: the distorted output frame]

    Code:

    stride = this.width * 4; // packed BGRA: 4 bytes per pixel, no row padding
    
    encoder = new CodecContext(Codec.FindEncoderById(AVCodecID.H264))
    {
        Width = this.width,
        Height = this.height,
        Framerate = new AVRational(1, framerate),
        TimeBase = new AVRational(1, framerate),
        PixelFormat = AVPixelFormat.Yuv420p,
        Profile = (int)FF_PROFILE.H264High,
        MaxBFrames = 0,
        GopSize = 10,
    };
    
    encoder.Open(null, new MediaDictionary
    {
        ["crf"] = "22",
        ["tune"] = "zerolatency",
        ["preset"] = "veryfast",
        ["subme"] = "5"
    });
    
    rgbFrame.Width = width;
    rgbFrame.Height = height;
    rgbFrame.Format = (int)AVPixelFormat.Bgra;
    unsafe
    {
        fixed (byte* ptr = frame)
        {
            // note: the pin ends when this block exits, but the pointer is
            // stored and used afterwards, so `frame` must not move meanwhile
            rgbFrame.Data[0] = (nint)ptr;
        }
    }
    rgbFrame.Linesize[0] = stride;
    rgbFrame.Pts = pts++;
    
    yuvFrame.Width = width;
    yuvFrame.Height = height;
    yuvFrame.Format = (int)AVPixelFormat.Yuv420p;
    
    yuvFrame.EnsureBuffer();
    yuvFrame.MakeWritable();
    videoFrameConverter.ConvertFrame(rgbFrame, yuvFrame);
    yuvFrame.Pts = pts;
    
    var encodedFrames = encoder.EncodeFrame(yuvFrame, packetRef);
    var packet = encodedFrames.FirstOrDefault();
    var data = packet?.Data.ToArray() ?? [];
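
    Side note on the likely fix: x264 operates on 16x16 macroblocks and yuv420p needs even dimensions, so one common workaround is to pad each frame to an aligned size before conversion and crop (or scale) on the playback side. A minimal sketch of the padding step in Python/NumPy; the function name and the 16-pixel alignment are illustrative, not taken from the question's library:

    import numpy as np

    def pad_bgra(frame_bytes, width, height, align=16):
        """Pad a packed BGRA frame so width and height are multiples of `align`."""
        frame = np.frombuffer(frame_bytes, dtype=np.uint8).reshape(height, width, 4)
        padded_w = -(-width // align) * align   # round up to the next multiple
        padded_h = -(-height // align) * align
        padded = np.zeros((padded_h, padded_w, 4), dtype=np.uint8)
        padded[:height, :width] = frame         # original image in the top-left corner
        return padded.tobytes(), padded_w, padded_h

    # e.g. 1366x768 -> 1376x768: encode at the padded size, crop when displaying
    data, w, h = pad_bgra(b"\x00" * (1366 * 768 * 4), 1366, 768)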
    
  • Display stream with FFmpeg, Python and OpenCV

    February 23, by Ηλίας Κωνσταντινίδης

    Situation: I have a Basler camera connected to a Raspberry Pi, and I am trying to livestream its feed with FFmpeg to a TCP port on my Windows PC in order to monitor what's happening in front of the camera.

    Things that work: I managed to set up a Python script on the Raspberry Pi that records the frames, feeds them to a pipe, and streams them to a TCP port. From that port, I am able to display the stream using FFplay.

    My problem: FFplay is great for quickly testing whether the direction you are heading in is correct, but I want to "read" every frame from the stream, do some processing, and then display the stream with OpenCV. That, I am not yet able to do.

    Minimally represented, this is the code I use on the Raspberry Pi side of things:

    command = ['ffmpeg',
               '-y',
               '-i', '-',
               '-an',
               '-c:v', 'mpeg4',
               '-r', '50',
               '-f', 'rtsp',
               '-rtsp_transport', 'tcp',
               'rtsp://192.168.1.xxxx:5555/live.sdp']
    
    p = subprocess.Popen(command, stdin=subprocess.PIPE) 
    
    while camera.IsGrabbing():  # send images as stream until Ctrl-C
        grabResult = camera.RetrieveResult(100, pylon.TimeoutHandling_ThrowException)
        
        if grabResult.GrabSucceeded():
            image = grabResult.Array
            image = resize_compress(image)
            p.stdin.write(image)
        grabResult.Release() 
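
    As an aside, when raw frames are piped into '-i -', ffmpeg cannot infer their geometry on its own, so the input side of the command usually needs to be spelled out. A hedged variant of the command list, where the pixel format and frame size are assumptions about what resize_compress() produces:

    command = ['ffmpeg',
               '-y',
               '-f', 'rawvideo',     # stdin carries bare pixels, not a container
               '-pix_fmt', 'gray',   # assumption: 8-bit grayscale frames
               '-s', '640x480',      # assumption: must match the frame dimensions
               '-r', '50',
               '-i', '-',
               '-an',
               '-c:v', 'mpeg4',
               '-f', 'rtsp',
               '-rtsp_transport', 'tcp',
               'rtsp://192.168.1.xxxx:5555/live.sdp']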
    
    

    On my PC, if I use the following FFplay command in a terminal, it works and displays the stream in real time:

    ffplay -rtsp_flags listen rtsp://192.168.1.xxxx:5555/live.sdp?tcp

    On my PC if I use the following python script, the stream begins, but it fails in the cv2.imshow function because I am not sure how to decode it:

    import subprocess
    import cv2
    
    command = ['C:/ffmpeg/bin/ffmpeg.exe',
               '-rtsp_flags', 'listen',
               '-i', 'rtsp://192.168.1.xxxx:5555/live.sdp?tcp?', 
               '-']
    
    p1 = subprocess.Popen(command, stdin=subprocess.PIPE, stdout=subprocess.PIPE)
    
    while True:
        frame = p1.stdout.read()
        cv2.imshow('image', frame)
        cv2.waitKey(1)
    

    Does anyone know what I need to change in either of those scripts to get it to work?

    Thank you in advance for any tips.
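
    For reference, the usual pattern on the receiving side is to have ffmpeg decode to raw BGR24 on stdout and read exactly one frame's worth of bytes per iteration; a bare p1.stdout.read() just blocks until the stream ends. A sketch under the assumption of a known frame size (the 640x480 is a placeholder for the real stream dimensions):

    import subprocess

    import cv2
    import numpy as np

    WIDTH, HEIGHT = 640, 480  # assumption: must match the incoming stream

    command = ['C:/ffmpeg/bin/ffmpeg.exe',
               '-rtsp_flags', 'listen',
               '-i', 'rtsp://192.168.1.xxxx:5555/live.sdp',
               '-f', 'rawvideo',      # emit bare frames instead of a container
               '-pix_fmt', 'bgr24',   # OpenCV's native channel order
               '-']

    p1 = subprocess.Popen(command, stdout=subprocess.PIPE)

    frame_size = WIDTH * HEIGHT * 3
    while True:
        raw = p1.stdout.read(frame_size)   # exactly one frame per read
        if len(raw) < frame_size:
            break                          # stream ended
        frame = np.frombuffer(raw, np.uint8).reshape(HEIGHT, WIDTH, 3)
        cv2.imshow('image', frame)
        if cv2.waitKey(1) & 0xFF == ord('q'):
            break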

  • How to fix fps (stream? container? wrong value) in a mov file using ffmpeg [closed]

    February 22, by rgr

    Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'test.mov':
      Metadata:
        major_brand     : qt
        minor_version   : 512
        compatible_brands: qt
        encoder         : Lavf61.9.106
      Duration: 00:02:29.98, start: 0.000000, bitrate: 2622795 kb/s
  Stream #0:0[0x1]: Video: prores (HQ) (apch / 0x68637061), yuv422p10le(progressive), 3840x2160, 2622789 kb/s, 59.95 fps, 59.94 tbr, 60k tbn (default)
      Metadata:
        handler_name    : VideoHandler
        vendor_id       : FFMP
        encoder         : Lavc60.37.100 prores_ks
    

    How can I fix this value (59.95 fps) in a mov file? Because of it, Vegas loads the MOV file incorrectly (it recognizes it as 60 fps rather than 59.94 fps). The frame times (dts) are fine; only this value is a problem.

    (Is this the fps value read from the container or the stream?)

    Using "ffmpeg -t -c copy" I can cut a fragment and then this value is fixed. But how can I fix the whole file without cutting it into fragments?

  • FFMPEG says MP3 file is longer than it actually is [closed]

    February 21, by badr2001

    I have an MP3 file which is 01:04:09 long. When I use the following command:

    ffmpeg -i TestAudio_123.mp3 -ss 60 -to 120 -c:a libmp3lame -q:a 2 output.mp3
    

    I get this output in the console:

    Input #0, mp3, from 'TestAudio_123.mp3':
      Metadata:
        major_brand     : M4A
        minor_version   : 0
        compatible_brands: M4A isommp42
        voice-memo-uuid : 07BF4A32-29E8-4A28-89D5-B6676F9CB945
        title           : تسجيل جديد ٣٨
        encoder         : Lavf61.1.100
      Duration: 01:07:22.01, start: 0.023021, bitrate: 32 kb/s
      Stream #0:0: Audio: mp3 (mp3float), 48000 Hz, mono, fltp, 32 kb/s
    

    My question is: why is the reported duration longer than the actual input file's? Just to show the input file's duration:

    [Screenshot: the file's properties showing a duration of 01:04:09]
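
    A plausible explanation, offered as an assumption: for MP3, ffmpeg estimates the duration from file size and bitrate unless an accurate Xing/Info header is present, so VBR files (or files converted from another container, as the M4A metadata above suggests) can report the wrong length. Fully decoding the file yields the true duration; a sketch using ffmpeg's null muxer:

    import subprocess

    # Decode the file completely and discard the output; ffmpeg's progress
    # lines (written to stderr) then report the true decoded duration.
    result = subprocess.run(['ffmpeg', '-i', 'TestAudio_123.mp3', '-f', 'null', '-'],
                            capture_output=True, text=True)
    progress = [ln for ln in result.stderr.splitlines() if 'time=' in ln]
    print(progress[-1] if progress else result.stderr[-500:])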

  • Twilio Real-Time Media Streaming to WebSocket Receives Only Noise Instead of Speech

    February 21, by dannym25

    I'm setting up a Twilio Voice call with real-time media streaming to a WebSocket server for speech-to-text processing using Google Cloud Speech-to-Text. The connection is established successfully, and I receive a continuous stream of audio data from Twilio. However, when I play back the received audio, all I hear is a rapid clicking/jackhammering noise instead of the actual speech spoken during the call.

    Setup:

    • Twilio sends inbound audio to my WebSocket server.
    • WebSocket receives and saves the raw mulaw-encoded audio data from Twilio.
    • The audio is processed via Google Speech-to-Text for transcription.
    • When I attempt to play back the audio, it sounds like machine-gun noise instead of spoken words.

    1. Confirmed WebSocket Receives Data

    • The WebSocket successfully logs incoming audio chunks from Twilio:

    🔊 Received 379 bytes of audio from Twilio
    🔊 Received 379 bytes of audio from Twilio
    

    • This suggests Twilio is sending audio data, but it's not being interpreted correctly.

    2. Saving and Playing Raw Audio

    • I save the incoming raw mulaw (8000Hz) audio from Twilio to a file:

    fs.appendFileSync('twilio-audio.raw', message);
    

    • Then, I convert it to a .wav file using FFmpeg:

    ffmpeg -f mulaw -ar 8000 -ac 1 -i twilio-audio.raw twilio-audio.wav
    

    Problem: When I play the audio using ffplay, it contains no speech, only rapid clicking sounds.
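
    As a quick sanity check, the raw capture can also be auditioned directly, skipping the conversion step; if this clicks too, the bytes were wrong before FFmpeg ever saw them. A sketch invoking ffplay from Python, under the same mulaw/8000 Hz/mono assumptions:

    import subprocess

    # Play the raw capture directly; -f mulaw tells ffplay how to interpret it.
    subprocess.run(['ffplay', '-f', 'mulaw', '-ar', '8000', '-ac', '1',
                    'twilio-audio.raw'])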

    3. Ensured Correct Audio Encoding

    • Twilio sends mulaw 8000Hz mono format.
    • Verified that my ffmpeg conversion is using the same settings.
    • Tried different conversion methods:

    ffmpeg -f mulaw -ar 8000 -ac 1 -i twilio-audio.raw -c:a pcm_s16le twilio-audio-fixed.wav
    

    → Same issue.

    4. Checked Google Speech-to-Text Input Format

    • Google STT requires proper encoding configuration:

    const request = {
        config: {
            encoding: 'MULAW',
            sampleRateHertz: 8000,
            languageCode: 'en-US',
        },
        interimResults: false,
    };
    

    • No errors from Google STT, but it never detects speech, likely because the input audio is just noise (a one-shot check is sketched below).
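
    To isolate STT from the transport problem, one can run a one-shot recognition over the converted WAV; if the captured audio actually contains speech, this should return text. A sketch assuming the google-cloud-speech Python client and configured project credentials:

    from google.cloud import speech

    client = speech.SpeechClient()
    with open('twilio-audio.wav', 'rb') as f:
        audio = speech.RecognitionAudio(content=f.read())
    config = speech.RecognitionConfig(
        encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,  # WAV after ffmpeg
        sample_rate_hertz=8000,
        language_code='en-US',
    )
    for result in client.recognize(config=config, audio=audio).results:
        print(result.alternatives[0].transcript)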

    5. Confirmed That Raw Audio is Not a WAV File

    • Since Twilio sends raw audio, I checked whether I needed to strip the header before processing.
    • Tried manually extracting raw bytes, but the issue persists.

    Current Theory:

    • The WebSocket server might be handling Twilio’s raw audio incorrectly before saving it (see the sketch after this list).
    • There might be an additional header in the Twilio stream that needs to be removed before playback.
    • Twilio’s <Stream> tag expects a WebSocket URL starting with wss:// instead of https://, and switching to wss:// partially fixed some previous connection issues.
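
    On the first two theories: Twilio Media Streams does not send bare audio frames over the WebSocket. Each message is a JSON envelope (event types 'connected', 'start', 'media', 'stop'), and the mulaw bytes arrive base64-encoded in media.payload, so appending whole messages to twilio-audio.raw would produce exactly this kind of noise. A minimal sketch of the decode step, written in Python for brevity; the same logic applies in the Node handler:

    import base64
    import json

    def extract_mulaw(message):
        """Return the raw 8 kHz mulaw bytes carried by one Twilio WS message."""
        event = json.loads(message)
        if event.get('event') != 'media':
            return b''              # 'connected'/'start'/'stop' carry no audio
        return base64.b64decode(event['media']['payload'])

    # In the message handler, append only the decoded bytes:
    # open('twilio-audio.raw', 'ab').write(extract_mulaw(message))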

    Code Snippets:

    Twilio Setup in TwiML Response

    app.post('/voice-response', (req, res) => {
        console.log("📞 Incoming call from Twilio");
    
        const twiml = new twilio.twiml.VoiceResponse();
        twiml.say("Hello! Welcome to the service. How can I help you?");
        
        // Prevent Twilio from hanging up too early
        twiml.pause({ length: 5 });
    
        twiml.connect().stream({
            url: `wss://your-ngrok-url/ws`,
            track: "inbound_track"
        });
    
        console.log("🛠️ Twilio Stream URL:", `wss://your-ngrok-url/ws`);
        
        res.type('text/xml').send(twiml.toString());
    });
    

    WebSocket Server Handling Twilio Audio Stream

    wss.on('connection', (ws) => {
        console.log("🔗 WebSocket Connected! Waiting for audio input...");
    
        ws.on('message', (message) => {
            console.log(`🔊 Received ${message.length} bytes of audio from Twilio`);
    
            // Save raw audio data for debugging
            fs.appendFileSync('twilio-audio.raw', message);
    
            // Check if audio is non-empty but contains only noise
            if (message.length < 100) {
                console.warn("⚠️ Warning: Audio data from Twilio is very small. Might be silent.");
            }
        });
    
        ws.on('close', () => {
            console.log("❌ WebSocket Disconnected!");
            
            // Convert Twilio audio for debugging
            exec(`ffmpeg -f mulaw -ar 8000 -ac 1 -i twilio-audio.raw twilio-audio.wav`, (err) => {
                if (err) console.error("❌ FFmpeg Conversion Error:", err);
                else console.log("✅ Twilio Audio Saved as `twilio-audio.wav`");
            });
        });
    
        ws.on('error', (error) => console.error("⚠️ WebSocket Error:", error));
    });
    

    Questions:

    • Why is the audio from Twilio being received as a clicking noise instead of actual speech?
    • Do I need to strip any additional metadata from the raw bytes before saving?
    • Is there a known issue with Twilio’s mulaw format when streaming audio over WebSockets?
    • How can I confirm that Google STT is receiving properly formatted audio?

    Additional Context:

    • Twilio is connected and receiving data (confirmed by logs).
    • WebSocket successfully receives and saves audio, but it only plays noise.
    • Tried multiple ffmpeg conversions, Google STT configurations, and raw data inspection.
    • Still no recognizable speech in the audio output.

    Any help is greatly appreciated! 🙏