Newest 'ffmpeg' Questions - Stack Overflow

http://stackoverflow.com/questions/tagged/ffmpeg

Articles published on the site

  • Are there any libraries for adding effects to audio, like a phone call or inner monologue, or making a voice sound like a man/woman? [closed]

    6 March, by Mathew

    I'm trying to apply different audio effects, such as making audio sound like a phone call. Below is my current approach. As you can see, I'm using multiple filters and simple algorithms to achieve this effect, but the output quality isn't ideal.

    Since I need to implement many sound effects/filters, are there any ready-to-use libraries that could help?

    I've looked into FFmpeg filters and noticed mentions of LADSPA/LV2 plugins. Are these viable solutions? Any other suggestions would be greatly appreciated.

    public static void applySceneEffect(String inputPath, String outputPath, int sceneType) {
        LOGGER.info("apply scene effect {} to {}", sceneType, inputPath);
    
        try (FFmpegFrameGrabber grabber = new FFmpegFrameGrabber(inputPath);
             FFmpegFrameRecorder recorder = new FFmpegFrameRecorder(outputPath, grabber.getAudioChannels())) {
    
        grabber.setOption("vn", ""); // ignore the video stream; we only process audio
            grabber.start();
    
           
            recorder.setAudioCodec(avcodec.AV_CODEC_ID_PCM_S16LE); 
            recorder.setSampleRate(grabber.getSampleRate());
            recorder.setAudioChannels(grabber.getAudioChannels());
            recorder.setAudioBitrate(grabber.getAudioBitrate());
            recorder.setFormat("wav"); 
    
    
            String audioFilter = String.join(",",
                    "aresample=8000",      
                    "highpass=f=300, lowpass=f=3400",       
                    "acompressor=threshold=-15dB:ratio=4:attack=10:release=100", 
                    "volume=1.5",          
                    "aecho=0.9:0.4:10:0.6"
            );
    
            FFmpegFrameFilter f1 = new FFmpegFrameFilter(audioFilter, grabber.getAudioChannels());
            f1.setSampleRate(grabber.getSampleRate());
            f1.start();
    
            recorder.start();
    
            while (true) {
                var frame = grabber.grabFrame(true, false, true, true);
                if (frame == null) {
                    break;
                }
    
                ShortBuffer audioBuffer = (ShortBuffer) frame.samples[0];
                short[] audioData = new short[audioBuffer.remaining()];
                audioBuffer.get(audioData);
    
                applyElectricNoise(audioData, grabber.getSampleRate());
    
                audioData = applyDistortion(audioData, 1.5, 30000);
    
                audioBuffer.rewind();
                audioBuffer.put(audioData);
                audioBuffer.flip();
    
    
                f1.push(frame); 
                Frame filteredFrame;
                while ((filteredFrame = f1.pull()) != null) {
                    recorder.record(filteredFrame); 
                }
            }
    
            recorder.stop();
            recorder.release();
            grabber.stop();
            grabber.release();
        } catch (FrameGrabber.Exception | FrameRecorder.Exception | FFmpegFrameFilter.Exception e) {
            throw new RuntimeException(e);
        }
    }
    
    
    private static final double NOISE_LEVEL = 0.005; 
    private static final int NOISE_FREQUENCY = 60;  
    
    public static void applyElectricNoise(short[] audioData, int sampleRate) {
        Random random = new Random();
    
        
        for (int i = 0; i < audioData.length; i++) {
            double noise = Math.sin(2 * Math.PI * NOISE_FREQUENCY * i / sampleRate);
    
            double electricNoise = random.nextGaussian() * NOISE_LEVEL * Short.MAX_VALUE + noise;
    
            audioData[i] = (short) Math.max(Math.min(audioData[i] + electricNoise, Short.MAX_VALUE), Short.MIN_VALUE); 
        }
    }
    
    public static short[] applyTremolo(short[] audioData, int sampleRate, double frequency, double depth) {
        double phase = 0.0;
        double phaseIncrement = 2 * Math.PI * frequency / sampleRate;
    
        for (int i = 0; i < audioData.length; i++) {
            double modulator = 1.0 - depth + depth * Math.sin(phase); 
            audioData[i] = (short) (audioData[i] * modulator);
    
            phase += phaseIncrement;
            if (phase > 2 * Math.PI) {
                phase -= 2 * Math.PI;
            }
        }
        return audioData;
    }
    
    public static short[] applyDistortion(short[] audioData, double gain, double threshold) {
        for (int i = 0; i < audioData.length; i++) {
            double sample = audioData[i] * gain;
    
            if (sample > threshold) {
                sample = threshold;
            } else if (sample < -threshold) {
                sample = -threshold;
            }
    
            audioData[i] = (short) sample;
        }
        return audioData;
    }
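    A note on the quality issue: applyDistortion above is a hard clipper, and hard clipping tends to sound harsh. A tanh-based soft clipper is a common alternative; below is a quick prototype of the same idea in Python/NumPy (the function name and default parameters are illustrative, not from the question):

```python
import numpy as np

def soft_clip(samples: np.ndarray, gain: float = 1.5, limit: float = 30000.0) -> np.ndarray:
    """Tanh-based soft clipper: same idea as the hard clip above,
    but the transfer curve bends smoothly toward the limit instead
    of flattening abruptly, which sounds less harsh."""
    x = samples.astype(np.float64) * gain
    # tanh is bounded by 1, so the output never exceeds `limit`
    return (limit * np.tanh(x / limit)).astype(np.int16)
```

    The same curve could presumably be ported back into the Java loop by replacing the threshold comparisons with Math.tanh.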
    
  • Generating grey noise with FFmpeg

    5 March, by Azat Khabibulin

    I have the following sound configuration:

    sub-bass:     -inf dBFS
    low bass:     -inf dBFS
    bass:         -inf dBFS
    high bass:    -inf dBFS
    low mids:     0 dBFS
    mids:         0 dBFS
    high mids:    -inf dBFS
    low treble:   -inf dBFS
    treble:       -inf dBFS
    high treble:  -inf dBFS
    

    If you wonder what it is, you can listen to this sound here.

    I'd like to create an audio file provided this sound configuration. FFmpeg filters seem like a good fit, but are not a strict requirement. It may be any command-line tool that handles this kind of task well.

    The problem is that I don't really have the necessary background in audio theory. I cannot choose the right FFmpeg filter (other than for generic white noise), I do not know how to filter frequencies in FFmpeg, and I cannot even convert this particular lexicon ("bass", "mids", etc.) into specific numeric frequencies.
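    For what it's worth, those band names have no single standard definition, but a common rough mapping puts "low mids" at about 250–500 Hz and "mids" at about 500–2000 Hz, so the configuration above amounts to noise band-limited to roughly 250–2000 Hz. A sketch of that in Python/NumPy (the band edges and sample rate are assumptions):

```python
import numpy as np

def band_limited_noise(low_hz: float, high_hz: float,
                       duration_s: float = 1.0, sr: int = 44100,
                       seed: int = 0) -> np.ndarray:
    """White noise with everything outside [low_hz, high_hz] zeroed
    in the frequency domain (a brick-wall band-pass)."""
    rng = np.random.default_rng(seed)
    n = int(duration_s * sr)
    noise = rng.standard_normal(n)
    spectrum = np.fft.rfft(noise)
    freqs = np.fft.rfftfreq(n, d=1.0 / sr)
    spectrum[(freqs < low_hz) | (freqs > high_hz)] = 0.0
    return np.fft.irfft(spectrum, n)

# only "low mids" + "mids" (~250-2000 Hz under the assumed band edges)
signal = band_limited_noise(250.0, 2000.0)
```

    An FFmpeg-only approximation would presumably combine the anoisesrc source filter with highpass/lowpass filters at the same cutoffs, but the exact roll-off will differ from the brick-wall version above.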

  • Can't show image with opencv when importing av

    5 March, by Flojomojo

    When importing the PyAv module, I am unable to show an image with opencv using imshow()

    Code without the PyAv module (works as expected)

    import cv2
    
    img = cv2.imread("test_image.jpeg")
    cv2.imshow('image', img)
    cv2.waitKey(0)
    

    Code with the import (doesn't work, just hangs)

    import cv2
    import av
    
    img = cv2.imread("test_image.jpeg")
    cv2.imshow('image', img)
    cv2.waitKey(0)
    

    OS: Linux arch 5.18.3-arch1-1 #1 SMP PREEMPT_DYNAMIC Thu, 09 Jun 2022 16:14:10 +0000 x86_64 GNU/Linux

    Am I doing something wrong or is this a (un-)known issue?

  • How does FFmpeg determine the dispositions of an MP4 track ?

    5 March, by obskyr

    The Issue

    FFmpeg has a concept of “dispositions” – a property that describes the purpose of a stream in a media file. For example, here are the streams in a file I have lying around, with the dispositions emphasized:

      Stream #0:0[0x1](und): Audio: aac (LC) (mp4a / 0x6134706D), 44100 Hz, stereo,
    fltp, 251 kb/s (default)
          Metadata:
            creation_time   : 2021-11-10T20:14:06.000000Z
            handler_name    : Core Media Audio
            vendor_id       : [0][0][0][0]
    
      Stream #0:1[0x2](und): Video: mjpeg (Baseline) (jpeg / 0x6765706A),
    yuvj420p(pc, bt470bg/unknown/unknown), 1024x1024, 0 kb/s, 0.0006 fps, 3.08 tbr,
    600 tbn (default) (attached pic) (timed thumbnails)
          Metadata:
            creation_time   : 2021-11-10T20:14:06.000000Z
            handler_name    : Core Media Video
            vendor_id       : [0][0][0][0]
    
      Stream #0:2[0x3](und): Data: bin_data (text / 0x74786574)
          Metadata:
            creation_time   : 2021-11-10T20:14:06.000000Z
            handler_name    : Core Media Text
    
      Stream #0:3[0x0]: Video: mjpeg (Baseline), yuvj420p(pc, bt470bg/unknown/
    unknown), 1024x1024 [SAR 144:144 DAR 1:1], 90k tbr, 90k tbn (attached pic)

    However, if I make any modification to this file’s chapter markers using the C++ library MP4v2 (even just re-saving the existing ones: auto f = MP4Modify("test.m4a"); MP4Chapter_t* chapterList; uint32_t chapterCount; MP4GetChapters(f, &chapterList, &chapterCount); MP4SetChapters(f, chapterList, chapterCount); MP4Close(f);), some of these dispositions are removed:

      Stream #0:0[0x1](und): Audio: aac (LC) (mp4a / 0x6134706D), 44100 Hz, stereo,
    fltp, 251 kb/s (default)
          Metadata:
            creation_time   : 2021-11-10T20:14:06.000000Z
            handler_name    : Core Media Audio
            vendor_id       : [0][0][0][0]
    
      Stream #0:1[0x2](und): Video: mjpeg (Baseline) (jpeg / 0x6765706A),
    yuvj420p(pc, bt470bg/unknown/unknown), 1024x1024, 0 kb/s, 0.0006 fps, 3.08 tbr,
    600 tbn (default) ← “attached pic” and “timed thumbnails” removed!
          Metadata:
            creation_time   : 2021-11-10T20:14:06.000000Z
            handler_name    : Core Media Video
            vendor_id       : [0][0][0][0]
    
      Stream #0:2[0x0]: Video: mjpeg (Baseline), yuvj420p(pc, bt470bg/unknown/
    unknown), 1024x1024 [SAR 144:144 DAR 1:1], 90k tbr, 90k tbn (attached pic)
    
      Stream #0:3[0x4](und): Data: bin_data (text / 0x74786574)
      This stream was moved to the end, but that’s intended behavior. It contains chapter titles, and we just edited the chapters.
          Metadata:
            creation_time   : 2025-03-05T09:56:31.000000Z

    It also renders the file unplayable in MPC-HC (but not in VLC!), which is apparently a bug in MP4v2. I’m currently investigating that bug to report and potentially fix it, but that’s a separate issue – in my journey there, I’m wracking my brain trying to understand what it is that MP4v2 changes to make FFmpeg stop reporting the “attached pic” and “timed thumbnails” dispositions. I’ve explored the before-and-afters in MP4 Box, and I can’t for the life of me find which atom it is that differs in a relevant way.

    (I’d love to share the files, but unfortunately the contents are under copyright – if anyone knows of a way to remove the audio from an MP4 file without changing anything else, let me know and I’ll upload dummied-out versions. Without them, I can’t really ask about the issue directly. I can at least show you the files’ respective atom trees, but I’m not sure how relevant that is.)

    The Question

    I thought I’d read FFmpeg’s source code to find out how it determines dispositions for MP4 streams, but of course, FFmpeg is very complex. Could someone who’s more familiar with C and/or FFmpeg’s codebase help me sleuth out how FFmpeg determines dispositions for MP4 files (in particular, “attached pic” and “timed thumbnails”)?

    Some Thoughts…

    • I figure searching for “attached_pic” might be a good start?
    • Could the MP4 muxer movenc.c be helpful?
    • I’d imagine what we’d really like to look at is the MP4 demuxing process, as it’s during demuxing that FFmpeg determines dispositions from the data in the file. After poring over the code for hours, however, I’ve been utterly unable to find where that happens.
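    One pointer that may help narrow the search: the MP4/MOV demuxer is libavformat/mov.c (movenc.c is the muxer, so it only matters when writing), and the two dispositions appear in the source as the flags AV_DISPOSITION_ATTACHED_PIC and AV_DISPOSITION_TIMED_THUMBNAILS. A throwaway Python script to list every line in libavformat that touches them (point it at whatever your FFmpeg checkout path is):

```python
import os

FLAGS = ("AV_DISPOSITION_ATTACHED_PIC", "AV_DISPOSITION_TIMED_THUMBNAILS")

def find_disposition_sites(ffmpeg_root: str):
    """Yield (path, line_number, line) for every line under
    libavformat/ that mentions one of the disposition flags."""
    lavf = os.path.join(ffmpeg_root, "libavformat")
    for name in sorted(os.listdir(lavf)):
        if not name.endswith((".c", ".h")):
            continue
        path = os.path.join(lavf, name)
        with open(path, encoding="utf-8", errors="replace") as f:
            for i, line in enumerate(f, 1):
                if any(flag in line for flag in FLAGS):
                    yield path, i, line.rstrip()
```

    Reading the hits in mov.c (rather than movenc.c) should show exactly which boxes the demuxer inspects before setting each flag.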
  • Extracting frames from videos using ffmpeg, unpredictable behaviour [closed]

    4 March, by Alex

    I am using this ffmpeg command to generate a bunch of images captures from a video:

      ffmpegCommand([
        "-i",
        inputPath,
        "-vf",
        `select='not(mod(n,${frame_interval}))',setpts='N/(${fps}*TB)'`,
        "-s",
        `320x200`,
        "-f",
        "image2",
        outputPath,
      ]);
    

    It is the most frame-accurate method according to what I have researched on Google and SO.

    This works well when the video is around 30 fps, with a frame_interval of around 250.

    But when the video is 5 fps, the frame interval should obviously be lower, at around 70, because there are fewer frames in the video. But then I get a huge number of images.

    // example for 30fps video
    ffmpeg -i test30.mp4 -vf select='not(mod(n,250))',setpts='N/(29.97*TB)' .....
    
    // example for 5fps video
    ffmpeg -i test5.mp4 -vf select='not(mod(n,70))',setpts='N/(4.907*TB)' ......
    

    What could be wrong here?
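    Two things worth checking. First, when select drops frames, the usual way to keep the image2 muxer from writing duplicated frames is to pass -vsync vfr (or -fps_mode vfr on recent FFmpeg builds) before the output. Second, frame_interval and fps together determine the time between captures, and it helps to make that arithmetic explicit; a small Python sketch (the helper name is made up):

```python
def frame_interval_for(fps: float, seconds_between_captures: float) -> int:
    """select='not(mod(n,interval))' keeps every interval-th frame,
    so spacing captures evenly in *time* means scaling the interval
    with the source frame rate."""
    return max(1, round(fps * seconds_between_captures))

# 250 frames / 29.97 fps is one capture every ~8.34 s;
# at 4.907 fps the same spacing needs an interval of ~41,
# while interval 70 at ~5 fps is ~14 s between captures
```

    Comparing the interval implied by each fps against the one actually passed should quickly show whether the spacing, rather than the select filter itself, explains the differing image counts.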