Newest 'ffmpeg' Questions - Stack Overflow

http://stackoverflow.com/questions/tagged/ffmpeg

Articles published on the site

  • m3u8 playback issue with VLC player and ffmpeg [closed]

    2 April, by YUZ

    I’m creating a timelapse video from a folder of images using FFmpeg, generating two HLS (m3u8) playlists: one at 1080p and another at 2K resolution. The playlists and .ts segments appear to be generated correctly, but when I play the m3u8 files in media players like VLC or PotPlayer, the video does not play seamlessly. Instead, it plays in a segment-by-segment manner (e.g., it stops after each segment and doesn’t automatically continue to the next one). I expect the entire video to play continuously without interruptions. What could be the issue?

    I ran these commands on cmd:

    for %i in (*.jpeg) do echo file '%cd%\%i' >> C:\Users\stitch_img\test\input_list.txt
    
    ffmpeg -y -f concat -safe 0 -i "C:\Users\stitch_img\test\input_list.txt" -vf "scale=1920x1080" -c:v libx264 -r 16 -hls_time 4 -hls_playlist_type vod -hls_segment_filename "C:\Users\stitch_img\test\1080_video_%03d.ts" "C:\Users\stitch_img\test\1080_playlist.m3u8"
    
    ffmpeg -y -f concat -safe 0 -i "C:\Users\stitch_img\test\input_list.txt" -vf "scale=2560x1440" -c:v libx264 -r 16 -hls_time 4 -hls_playlist_type vod -hls_segment_filename "C:\Users\stitch_img\test\2k_video_%03d.ts" "C:\Users\stitch_img\test\2k_playlist.m3u8"
    
    echo #EXTM3U > "C:\Users\stitch_img\test\master_playlist.m3u8"
    echo #EXT-X-STREAM-INF:BANDWIDTH=5000000,RESOLUTION=1920x1080 >> "C:\Users\stitch_img\test\master_playlist.m3u8"
    echo 1080_playlist.m3u8 >> "C:\Users\stitch_img\test\master_playlist.m3u8"
    echo #EXT-X-STREAM-INF:BANDWIDTH=8000000,RESOLUTION=2560x1440 >> "C:\Users\stitch_img\test\master_playlist.m3u8"
    echo 2k_playlist.m3u8 >> "C:\Users\stitch_img\test\master_playlist.m3u8"
    

    What I’ve Tried:

    • Verified that the .ts files are playable individually (e.g., opening 1080p_000.ts in VLC works fine).

    • Ensured the m3u8 files are structured correctly with #EXT-X-ENDLIST (indicating a VOD playlist).

    • Used HTTP playback to rule out local path resolution issues, but the issue persists.
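
    One thing worth noting about the cmd commands above: `echo text >> file` writes everything up to the redirect operator, including the space before `>>`, so the hand-built master playlist lines end up with trailing spaces that some players reject. A minimal Python sketch (output path is hypothetical) that writes the master playlist without stray whitespace:

```python
# Sketch: build the master playlist without the trailing spaces that
# cmd's `echo line >> file` leaves before each redirect operator.
# The output file name is hypothetical; adjust to your folder.
lines = [
    "#EXTM3U",
    "#EXT-X-STREAM-INF:BANDWIDTH=5000000,RESOLUTION=1920x1080",
    "1080_playlist.m3u8",
    "#EXT-X-STREAM-INF:BANDWIDTH=8000000,RESOLUTION=2560x1440",
    "2k_playlist.m3u8",
]
with open("master_playlist.m3u8", "w", newline="\n") as f:
    f.write("\n".join(lines) + "\n")
```

    Each `#EXT-X-STREAM-INF` tag must be immediately followed by its variant-playlist URI on its own line, with no trailing whitespace on either line.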

    Master m3u8:

    #EXTM3U
    #EXT-X-STREAM-INF:BANDWIDTH=5000000,RESOLUTION=1920x1080
    1080_playlist.m3u8
    #EXT-X-STREAM-INF:BANDWIDTH=8000000,RESOLUTION=2560x1440
    2k_playlist.m3u8
    

    m3u8_2k:

    #EXTM3U
    #EXT-X-VERSION:3
    #EXT-X-TARGETDURATION:4
    #EXT-X-MEDIA-SEQUENCE:0
    #EXT-X-PLAYLIST-TYPE:VOD
    #EXTINF:4.125000,
    2k_video_000.ts
    #EXTINF:3.312500,
    2k_video_001.ts
    #EXT-X-ENDLIST
    

    m3u8_1080p:

    #EXTM3U
    #EXT-X-VERSION:3
    #EXT-X-TARGETDURATION:4
    #EXT-X-MEDIA-SEQUENCE:0
    #EXT-X-PLAYLIST-TYPE:VOD
    #EXTINF:4.125000,
    1080_video_000.ts
    #EXTINF:3.312500,
    1080_video_001.ts
    #EXT-X-ENDLIST
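
    A quick way to rule out malformed playlist lines (stray whitespace after a tag's colon or after a segment URI is a common reason some players stall at segment boundaries) is a small check script; a sketch, with a hypothetical sample playlist:

```python
# Sketch: flag whitespace problems in an HLS playlist, such as a space
# after a tag's colon or trailing whitespace after a segment URI.
def playlist_issues(lines):
    issues = []
    for n, line in enumerate(lines, 1):
        if line != line.rstrip():
            issues.append(f"line {n}: trailing whitespace")
        if line.startswith("#EXT") and ": " in line:
            issues.append(f"line {n}: space after colon")
    return issues

# hypothetical sample with two kinds of problems
sample = ["#EXTM3U", "#EXT-X-TARGETDURATION: 4 ", "video_000.ts "]
print(playlist_issues(sample))
```

    Run it over each generated .m3u8 (read with `open(path).read().splitlines()`) before testing in a player.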
    
  • How to Adjust Google TTS SSML to Match Original SRT Timing ?

    2 April, by Alexandre Silkin

    I have an .srt file where each speech segment is supposed to last a specific duration (e.g., 4 seconds). However, when I generate the speech using Google Text-to-Speech (TTS) with SSML, the resulting audio plays the same segment in a shorter time (e.g., 3 seconds).

    I want to adjust the speech rate dynamically in SSML so that each segment matches its original timing. My idea is to use ffmpeg to extract the actual duration of each generated speech segment, then calculate the speech rate percentage as:

        speech_rate = generated_duration / original_duration

    This percentage would then be applied in SSML using the prosody tag's rate attribute, like: <prosody rate="75%">Text to be spoken</prosody>

    How can I accurately measure the duration of each segment using ffmpeg, and what is the best way to apply the correct speech rate in SSML to match the original .srt timing?
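
    The measurement step described above can be sketched with ffprobe (the function names are mine, not from the post):

```python
# Sketch: measure a generated speech segment's duration with ffprobe
# and derive the SSML prosody rate needed to match the SRT timing.
import subprocess

def media_duration_seconds(path):
    """Duration of a media file in seconds, read via ffprobe."""
    out = subprocess.run(
        ["ffprobe", "-v", "error",
         "-show_entries", "format=duration",
         "-of", "default=noprint_wrappers=1:nokey=1", path],
        capture_output=True, text=True, check=True,
    )
    return float(out.stdout.strip())

def prosody_rate(generated_s, original_s):
    """Rate percentage that stretches/squeezes speech toward the target."""
    # e.g. 3 s generated vs a 4 s target gives "75%": slower speech
    # takes longer, filling the original segment duration
    return f"{generated_s / original_s * 100:.0f}%"
```

    `media_duration_seconds` shells out to ffprobe, so it needs ffmpeg installed and on PATH; `prosody_rate` is plain arithmetic on the two durations.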

    I tried the duration attribute, and my SSML write should look like this:

        f.write(f'\t{break_until_start}{text}\n')

    Code writing the SSML:

    text = value['text']
    start_time_ms = int(value['start_ms'])  # Start time in milliseconds
    previous_end_ms = int(subsDict.get(str(int(key) - 1), {}).get('end_ms', 0))  # Get the previous end time
    gap_to_fill = max(0, start_time_ms - previous_end_ms)

    text = text.replace("&", "&amp;").replace('"', "&quot;").replace("'", "&apos;").replace("<", "&lt;").replace(">", "&gt;")

    # <break> element reconstructed from context: a pause covering the gap
    break_until_start = f'<break time="{gap_to_fill}ms"/>' if gap_to_fill > 0 else ''

    f.write(f'\t{break_until_start}{text}\n')
    f.write('\n')
  • Errors while decoding with OpenH264 in FFmpeg

    2 April, by Paul_ghost

    I am trying to decode and then re-encode an RTP video-stream. I need to use OpenH264 codec by Cisco. I used this codec with the FFmpeg tool:

    ffmpeg -hide_banner \
       -protocol_whitelist file,rtp,udp \
       -err_detect ignore_err \
       -c:v libopenh264 \
       -i /src/stream.sdp \
       -c:v libopenh264 \
       -profile:v high \
       $OUTPUT_FILE &
    

    The stream has a little packet loss (around 2-3% in the middle of the stream), and when packet loss occurs, the codec prints a lot of errors in the logs:

    [sdp @ 0x5bfcad506740] max delay reached. need to consume packet 
    [sdp @ 0x5bfcad506740] RTP: missed 1 packets
    [libopenh264 @ 0x5bfcad560cc0] [OpenH264] this = 0x0x5bfcad6905e0, Warning:DecodeCurrentAccessUnit() failed (468766) in frame: 20 uiDId: 0 uiQId: 0
    [libopenh264 @ 0x5bfcad560cc0] DecodeFrame failed
    [vist#0:0/h264 @ 0x5bfcad58c340] [dec:libopenh264 @ 0x5bfcad55fdc0] Error submitting packet to decoder: Unknown error occurred
    [libopenh264 @ 0x5bfcad560cc0] [OpenH264] this = 0x0x5bfcad6905e0, Warning:referencing pictures lost due frame gaps exist, prev_frame_num: 19, curr_frame_num: 21
    [libopenh264 @ 0x5bfcad560cc0] DecodeFrame failed
    [vist#0:0/h264 @ 0x5bfcad58c340] [dec:libopenh264 @ 0x5bfcad55fdc0] Error submitting packet to decoder: Unknown error occurred
    [libopenh264 @ 0x5bfcad560cc0] DecodeFrame failed
    [vist#0:0/h264 @ 0x5bfcad58c340] [dec:libopenh264 @ 0x5bfcad55fdc0] Error submitting packet to decoder: Unknown error occurred
    

    Because of this, I get an output with a lot of frame freezes. Is it possible to make OpenH264 decode as many frames as possible, even if artifacts occur?

    P.S. I have already tried FFmpeg's built-in H.264 decoder, and it works well: it decodes as many frames as possible and outputs a video with some artifacts. But I need the same result with OpenH264.
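
    For comparison, the working variant from the P.S. can be expressed as a command builder; this is a sketch of that workaround, not a fix for OpenH264 itself:

```python
# Sketch of the P.S. workaround: drop the input-side "-c:v libopenh264"
# so FFmpeg's built-in h264 decoder (which conceals missing references)
# handles the lossy RTP stream, while keeping libopenh264 for encoding.
def build_ffmpeg_cmd(sdp_path, output_file):
    return [
        "ffmpeg", "-hide_banner",
        "-protocol_whitelist", "file,rtp,udp",
        "-err_detect", "ignore_err",
        # no decoder override before -i: the native h264 decoder is used
        "-i", sdp_path,
        "-c:v", "libopenh264",  # encoder choice unchanged
        "-profile:v", "high",
        output_file,
    ]

print(" ".join(build_ffmpeg_cmd("/src/stream.sdp", "out.mp4")))
```

    The only change from the question's command line is the missing input-side decoder flag; everything else is taken verbatim from the post.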

  • How can I set a specific duration for my speech using Google Text-to-Speech

    2 April, by Alexandre Silkin

    I went through the documentation of Google Text to Speech SSML. https://developers.google.com/assistant/actions/reference/ssml#prosody

    So there is a tag called <prosody> which, as per the W3 specification, can accept an attribute called duration: a value in seconds or milliseconds for the desired time to take to read the contained text.

    So <prosody duration="3s">Hello, How are you?</prosody> should take 3 seconds for Google Text-to-Speech to speak. But when I try it here https://cloud.google.com/text-to-speech/ , it's not working, and I also tried it in the REST API.
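
    Since support for the duration attribute varies by engine, a rate-based fallback can be sketched as follows (the function name is mine, not from the post):

```python
# Hypothetical fallback: scale the speaking rate instead of pinning an
# absolute duration, since not every TTS engine honors `duration`.
def ssml_with_rate(text, rate_percent):
    return f'<speak><prosody rate="{rate_percent}%">{text}</prosody></speak>'

print(ssml_with_rate("Hello, How are you?", 75))
```

    A rate below 100% slows the speech down, lengthening the segment toward the target duration; the percentage would come from comparing the generated and desired durations.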

    How can I get the duration of each speech segment generated by Google Text-to-Speech SSML, which is a little different from the original .srt it was generated from? I was looking for a way to do this with ffmpeg, so I could then compute (tts_generated_speech / original_duration) to get the speech_rate_percentage, which I could apply to each speech segment so that it matches the original duration.

    Original post: Is there a way to make Google Text to Speech, speak text for a desired duration?