Newest 'ffmpeg' Questions - Stack Overflow

http://stackoverflow.com/questions/tagged/ffmpeg

Articles published on the site

  • Microphone and Camera Issues After Installing FFmpeg [migrated]

    6 April, by Fang

    After installing FFmpeg using Winget on my Windows laptop, my microphone started capturing a lot of noise (it doesn't work) and my camera began flickering. The only fix that worked was performing a full system reset via USB boot; a normal reset or uninstalling the drivers didn't help. The issue occurred on Windows 10 and Windows 11 after installing FFmpeg. Note: I've installed FFmpeg on my other laptop and everything works fine there.

  • Get the maximum frequency of an audio spectrum

    6 April, by milahu

    I want to detect the cutoff frequency of the AAC audio encoder used to compress an M4A audio file.

    This cutoff frequency (or maximum frequency) is an indicator of audio quality. High-quality audio has a cutoff around 20 kHz (fullband), medium-quality audio around 14 kHz (superwideband), low-quality audio around 7 kHz (wideband), and super-low-quality audio around 3 kHz (narrowband). See also: voice frequency
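
    As a rough illustration, one way to turn a measured cutoff into one of these band labels; the boundary values below are arbitrary midpoints chosen only for illustration:

    def classify_cutoff_khz(cutoff_khz: float) -> str:
        # Illustrative thresholds roughly halfway between the nominal band
        # cutoffs mentioned above (20 / 14 / 7 / 3 kHz); adjust as needed.
        if cutoff_khz >= 17:
            return "fullband (~20 kHz, high quality)"
        if cutoff_khz >= 10:
            return "superwideband (~14 kHz, medium quality)"
        if cutoff_khz >= 5:
            return "wideband (~7 kHz, low quality)"
        return "narrowband (~3 kHz, very low quality)"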

    Example spectrum of a 2-hour movie, generated with sox, with a maximum frequency around 19.6 kHz:

    [audio spectrum with maximum frequency around 19.6 kHz]

    The program should ignore noise below a certain loudness, for example -80 dB.

    Here is a Python script generated by deepseek.com, but it returns 0.2 kHz instead of 19.6 kHz; a possible fix is sketched after the script.

    #!/usr/bin/env python3
    
    # get the maximum frequency
    # of an audio spectrum
    # as an indicator
    # of the actual audio quality
    
    # generated by deepseek.com
    
    # prompt
    """
    create a python script
    to detect the maximum frequency 
    in an m4a audio file.
    that maximum frequency is produced
    by the lowpass filter
    of the aac audio encoder.
    high-quality audio
    has a maximum frequency
    around 20 KHz (fullband),
    low-quality audio
    has a maximum frequency
    around 3 KHz (narrowband).
    use ffmpeg to decode the audio
    to pcm
    in chunks of 10 seconds.
    for each chunk:
    detect the local maximum,
    print the local maximum
    and the chunk time
    with the format
    f"t={t}sec f={f}KHz",
    update the global maximum.
    to detect the local maximum,
    remove the noise floor
    around -110dB,
    then find the maximum frequency
    in the spectrum.
    accept some command line options:
    --ss n:
    pass as "-ss n" to ffmpeg.
    --to n:
    pass as "-to n" to ffmpeg.
    both -ss and -to args
    must come before the -i arg
    for ffmpeg input seeking.
    print all frequencies in KHz.
    add a shebang line before the script,
    spaced by an empty line.
    do not recode the audio with ffmpeg.
    use ffprobe to get the input samplerate,
    usually 48KHz or 44.1KHz.
    create a python class,
    so we dont have to pass all parameters to functions.
    add a command line option to select the audio track id, by default zero.
    """
    
    #!/usr/bin/env python3
    
    import argparse
    import numpy as np
    import subprocess
    import sys
    from tempfile import NamedTemporaryFile
    
    
    class AudioAnalyzer:
        def __init__(self, input_file, audio_track=0, start_time=None, end_time=None):
            self.input_file = input_file
            self.audio_track = audio_track
            self.start_time = start_time
            self.end_time = end_time
            self.sample_rate = self._get_sample_rate()
            self.global_max_freq = 0
            self.global_max_time = 0
    
        def _get_sample_rate(self):
            cmd = [
                'ffprobe',
                '-v', 'error',
                '-select_streams', f'a:{self.audio_track}',
                '-show_entries', 'stream=sample_rate',
                '-of', 'default=noprint_wrappers=1:nokey=1',
                self.input_file
            ]
            result = subprocess.run(cmd, capture_output=True, text=True)
            return float(result.stdout.strip())
    
        def _get_ffmpeg_command(self):
            cmd = [
                'ffmpeg',
                '-hide_banner',
                '-loglevel', 'error',
            ]
            
            if self.start_time is not None:
                cmd.extend(['-ss', str(self.start_time)])
            if self.end_time is not None:
                cmd.extend(['-to', str(self.end_time)])
                
            cmd.extend([
                '-i', self.input_file,
                '-map', f'0:a:{self.audio_track}',
                '-ac', '1',  # convert to mono
                '-f', 'f32le',  # 32-bit float PCM
                '-'
            ])
            
            return cmd
    
        def analyze(self, chunk_size=10):
            ffmpeg_cmd = self._get_ffmpeg_command()
            
            with subprocess.Popen(ffmpeg_cmd, stdout=subprocess.PIPE) as process:
                chunk_samples = int(chunk_size * self.sample_rate)
                bytes_per_sample = 4  # 32-bit float
                chunk_bytes = chunk_samples * bytes_per_sample
                
                current_time = self.start_time if self.start_time is not None else 0
                
                while True:
                    raw_data = process.stdout.read(chunk_bytes)
                    if not raw_data:
                        break
                    
                    samples = np.frombuffer(raw_data, dtype=np.float32)
                    if len(samples) == 0:
                        continue
                    
                    local_max_freq = self._analyze_chunk(samples)
                    
                    print(f"t={current_time:.1f}sec f={local_max_freq:.1f}KHz")
                    
                    if local_max_freq > self.global_max_freq:
                        self.global_max_freq = local_max_freq
                        self.global_max_time = current_time
                    
                    current_time += chunk_size
    
        def _analyze_chunk(self, samples):
            # Apply Hanning window
            window = np.hanning(len(samples))
            windowed_samples = samples * window
            
            # Compute FFT
            fft = np.fft.rfft(windowed_samples)
            magnitudes = np.abs(fft)
            
            # Convert to dB
            eps = 1e-10  # avoid log(0)
            magnitudes_db = 20 * np.log10(magnitudes + eps)
            
            # Frequency bins
            freqs = np.fft.rfftfreq(len(samples), 1.0 / self.sample_rate) / 1000  # in KHz
            
            # Remove noise floor (-110dB)
            threshold = -110
            valid_indices = magnitudes_db > threshold
            valid_freqs = freqs[valid_indices]
            valid_magnitudes = magnitudes_db[valid_indices]
            
            if len(valid_freqs) == 0:
                return 0
            
            # Find frequency with maximum magnitude
            max_idx = np.argmax(valid_magnitudes)
            max_freq = valid_freqs[max_idx]
            
            return max_freq
    
    
    def main():
        parser = argparse.ArgumentParser(description='Detect maximum frequency in audio file')
        parser.add_argument('input_file', help='Input audio file (m4a)')
        parser.add_argument('--ss', type=float, help='Start time in seconds')
        parser.add_argument('--to', type=float, help='End time in seconds')
        parser.add_argument('--track', type=int, default=0, help='Audio track ID (default: 0)')
        args = parser.parse_args()
    
        analyzer = AudioAnalyzer(
            input_file=args.input_file,
            audio_track=args.track,
            start_time=args.ss,
            end_time=args.to
        )
        
        print(f"Analyzing audio file: {args.input_file}")
        print(f"Sample rate: {analyzer.sample_rate/1000:.1f} KHz")
        print(f"Audio track: {args.track}")
        if args.ss is not None:
            print(f"Start time: {args.ss} sec")
        if args.to is not None:
            print(f"End time: {args.to} sec")
        print("---")
        
        analyzer.analyze()
        
        print("---")
        print(f"Global maximum: t={analyzer.global_max_time:.1f}sec f={analyzer.global_max_freq:.1f}KHz")
        
        if analyzer.global_max_freq > 15:
            print("Quality: Fullband (high quality)")
        elif analyzer.global_max_freq > 5:
            print("Quality: Wideband (medium quality)")
        else:
            print("Quality: Narrowband (low quality)")
    
    
    if __name__ == '__main__':
        main()
    

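    The 0.2 kHz result is most likely because _analyze_chunk returns the frequency of the loudest bin (np.argmax over the magnitudes), which for typical program material is a low frequency, rather than the highest frequency still above the noise floor; the -110 dB threshold is also applied to unnormalized FFT magnitudes, so it typically removes very little. A minimal sketch of an alternative chunk analyzer, using a relative threshold (the 80 dB dynamic range is an assumption, not taken from the question):

    def _analyze_chunk(self, samples):
        # Window and transform as before
        window = np.hanning(len(samples))
        fft = np.fft.rfft(samples * window)
        magnitudes_db = 20 * np.log10(np.abs(fft) + 1e-10)

        # Normalize so the loudest bin sits at 0 dB; an absolute threshold
        # on unnormalized FFT magnitudes is not meaningful
        magnitudes_db -= magnitudes_db.max()

        freqs = np.fft.rfftfreq(len(samples), 1.0 / self.sample_rate) / 1000  # kHz

        # Keep bins within 80 dB of the peak (assumption) and report the
        # HIGHEST such frequency, not the loudest one. Note that a nearly
        # silent chunk will report the Nyquist frequency here; guard if needed.
        above_floor = np.nonzero(magnitudes_db > -80)[0]
        if len(above_floor) == 0:
            return 0.0
        return freqs[above_floor[-1]]
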
    Similar question: How to find the max frequency at a certain db in a fft signal


    edited by kesh

    Here is an example PSD indicating fullband quality, with a PSD drop-off around 20 kHz.

    [PSD plot]

  • Command-line streaming webcam with audio from Ubuntu server in WebM format

    6 April, by mjtb

    I am trying to stream video and audio from my webcam connected to my headless Ubuntu server (running Maverick 10.10). I want to be able to stream in WebM format (VP8 video + Vorbis audio). Bandwidth is limited, so the stream must stay below 1 Mbps.

    I have tried using FFmpeg. I am able to record WebM video from the webcam with the following:

    ffmpeg -s 640x360 \
    -f video4linux2 -i /dev/video0 -isync -vcodec libvpx -vb 768000 -r 10 -vsync 1 \
    -f alsa -ac 1 -i hw:1,0 -acodec libvorbis -ab 32000 -ar 11025 \
    -f webm /var/www/telemed/test.webm 
    

    However, despite experimenting with all manner of vsync and async options, I get either out-of-sync audio or Benny Hill-style fast-forward video with matching fast audio. I have also been unable to get this actually working with ffserver (by replacing the test.webm path and filename with the relevant feed filename).

    The objective is to get a live audio + video feed that is viewable in a modern browser, within a tight bandwidth budget, using only open-source components. (None of that MP3-format legal chaff.)

    My questions are therefore: How would you go about streaming WebM from a webcam on Linux with in-sync audio? What software would you use?

    Have you succeeded in encoding webm from a webcam with in-sync audio via FFmpeg? If so, what command did you issue?

    Is it worth persevering with FFmpeg + FFserver, or are there other more suitable command-line tools around (e.g. VLC which doesn't seem too well built for encoding)?

    Is something like GStreamer + Flumotion configurable from the command line? If so, where do I find the command-line documentation? The Flumotion docs are rather light on command-line details.

    Thanks in advance!
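
    One direction worth trying is a single ffmpeg process with explicit A/V sync options, sketched below as a Python wrapper around the ffmpeg CLI. The device names, bitrates, sample rate and sync flags are assumptions to adapt, and the option spelling assumes a reasonably recent ffmpeg build:

    import subprocess

    # Hypothetical single-process capture + encode to WebM; not a known-good
    # recipe, just the shape of the command to experiment with.
    cmd = [
        "ffmpeg",
        "-f", "video4linux2", "-framerate", "10", "-video_size", "640x360",
        "-i", "/dev/video0",
        "-f", "alsa", "-ac", "1", "-i", "hw:1,0",
        "-c:v", "libvpx", "-b:v", "700k",
        "-c:a", "libvorbis", "-b:a", "32k", "-ar", "22050",
        "-vsync", "1",   # duplicate/drop video frames to hold a constant rate
        "-async", "1",   # correct only the initial audio timestamp offset
        "-f", "webm", "/var/www/telemed/test.webm",
    ]
    subprocess.run(cmd, check=True)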

  • Generate thumbnail for text file

    6 April, by Sophivorus

    Suppose a user uploads a .txt or .php file and I want to generate a .png thumbnail for it. Is there a simple way of doing this that doesn't require me to open the file and write its contents into a new .png myself? I have ImageMagick and FFmpeg available, so there must be a way to take advantage of that, but I've been searching a lot with no luck yet.

    Thanks in advance.
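
    One possible approach is ImageMagick's caption: input format, which renders the contents of a text file onto a fixed-size canvas with automatic word wrapping. A minimal Python sketch follows; the size, colors and file names are placeholder assumptions (on ImageMagick 7 the binary is magick rather than convert):

    import subprocess

    def text_thumbnail(txt_path: str, png_path: str, size: str = "320x240") -> None:
        # Render the file's text into a fixed-size PNG; ImageMagick reads the
        # caption text from the file via the @ prefix and wraps it to fit.
        subprocess.run(
            [
                "convert",
                "-background", "white",
                "-fill", "black",
                "-size", size,
                f"caption:@{txt_path}",
                png_path,
            ],
            check=True,
        )

    # Example (hypothetical file names):
    # text_thumbnail("upload.txt", "upload-thumb.png")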

  • ffmpeg: error while loading shared libraries: libopenh264.so.5 [closed]

    6 April, by ESZ

    I am using ffmpeg and getting this error:

    ffmpeg: error while loading shared libraries: libopenh264.so.5: cannot open shared object file: No such file or directory

    I have already checked that the library exists, and it does. I added it to /etc/ld.so.conf as mentioned in this previous question, but it still doesn't work.
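
    A few standard checks, sketched in Python below; the usual culprit is that editing /etc/ld.so.conf only takes effect after the linker cache is rebuilt with ldconfig. The paths and grep patterns are assumptions:

    import subprocess

    # Is libopenh264 registered in the dynamic linker cache?
    subprocess.run("ldconfig -p | grep openh264", shell=True)

    # Which libraries does the ffmpeg binary resolve, and which are "not found"?
    subprocess.run("ldd $(which ffmpeg) | grep -Ei 'openh264|not found'", shell=True)

    # After adding the library's directory to /etc/ld.so.conf (or a file under
    # /etc/ld.so.conf.d/), rebuild the cache, e.g.:
    #   sudo ldconfig
    # Alternatively, export LD_LIBRARY_PATH=/path/to/lib before running ffmpeg.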