Advanced search

Media (1)

Keyword: - Tags -/bug

Other articles (58)

  • Authorizations overridden by plugins

    27 April 2010, by

    Mediaspip core
    autoriser_auteur_modifier() so that visitors are able to edit their information on the authors page

  • Publishing on MédiaSpip

    13 June 2013

    Can I post content from an iPad tablet?
    Yes, if your installed MédiaSpip is at version 0.2 or later. If needed, contact the administrator of your MédiaSpip to find out.

  • HTML5 audio and video support

    10 April 2011

    MediaSPIP uses the HTML5 video and audio tags to play multimedia documents, taking advantage of the latest W3C innovations supported by modern browsers.
    For older browsers, the Flowplayer Flash player is used instead.
    The HTML5 player was created specifically for MediaSPIP: it is fully customizable graphically to match a chosen theme.
    These technologies make it possible to deliver video and sound both to conventional computers (...)

On other sites (10710)

  • FFMPEG, DrawText Issue in Live Stream

    7 December 2022, by Kenneth

    I am using the following command to create an H.264 stream whose text overlay is read from a text file (the example data is fake). I am sending the stream to an RTSP server that then lets clients connect, and I am connecting from VLC to view it.

    


    See Update 1 below: this only happens for the live stream. If I output to a file, the text looks correct.

    


    OS: Windows 10

    


    ffmpeg -f lavfi -re -i color=size=1280x720:rate=1:color=black ^
  -vf drawtext="fontsize=16:fontfile=C\\:/Windows/fonts/consola.ttf:fontcolor=white:textfile='livetext.txt':x=50:y=50: reload=1" ^
  -c:v libx264 -preset ultrafast -tune zerolatency -x264-params keyint=10:min-keyint=10 ^
  -f rtsp rtsp://127.0.0.1:60000/sorting


    


    The issue I am having is that the text shown in the video seems to be limited to 10 rows; on a fresh restart I get even fewer. I don't see anything in the documentation about a limit on the text length.

    


    I have tried different -preset and -tune options. Nothing improves this issue.

    


    Are there settings I should adjust to fix this?
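
    One thing worth ruling out (my assumption, not a confirmed cause): the drawtext documentation warns that a textfile read with reload enabled must be replaced atomically, otherwise it may be read partially or even fail. A minimal Python sketch of such an atomic update, assuming livetext.txt is produced by a separate process:

import os
import tempfile

def write_text_atomically(path, text):
    # Write to a temporary file in the target's directory, then rename it
    # over the target. os.replace() is atomic, so drawtext's reload never
    # observes a half-written file. On Windows the rename can fail with
    # PermissionError at the instant ffmpeg has the file open; retrying
    # shortly afterwards is a common workaround.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            f.write(text)
        os.replace(tmp_path, path)
    except Exception:
        os.unlink(tmp_path)
        raise

# Hypothetical usage: refresh the overlay once per reload interval.
# write_text_atomically("livetext.txt", "row 1\nrow 2\nrow 3\n")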

    



    


    Console output:

    


    ..\ffmpeg\ffmpeg -f lavfi -re -i color=size=1280x720:rate=5:color=black -vf drawtext="fontsize=20:fontfile=C\\:/Windows/fonts/consola.ttf:fontcolor=white:textfile='livetext.txt':x=50:y=50: reload=5"  -c:v libx264 -preset ultrafast -tune zerolatency -x264-params keyint=10:min-keyint=10  -f rtsp rtsp://127.0.0.1:60000/sorting
ffmpeg version 2022-12-04-git-6c814093d8-essentials_build-www.gyan.dev Copyright (c) 2000-2022 the FFmpeg developers
  built with gcc 12.1.0 (Rev2, Built by MSYS2 project)
  configuration: --enable-gpl --enable-version3 --enable-static --disable-w32threads --disable-autodetect --enable-fontconfig --enable-iconv --enable-gnutls --enable-libxml2 --enable-gmp --enable-bzlib --enable-lzma --enable-zlib --enable-libsrt --enable-libssh --enable-libzmq --enable-avisynth --enable-sdl2 --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxvid --enable-libaom --enable-libopenjpeg --enable-libvpx --enable-mediafoundation --enable-libass --enable-libfreetype --enable-libfribidi --enable-libvidstab --enable-libvmaf --enable-libzimg --enable-amf --enable-cuda-llvm --enable-cuvid --enable-ffnvcodec --enable-nvdec --enable-nvenc --enable-d3d11va --enable-dxva2 --enable-libvpl --enable-libgme --enable-libopenmpt --enable-libopencore-amrwb --enable-libmp3lame --enable-libtheora --enable-libvo-amrwbenc --enable-libgsm --enable-libopencore-amrnb --enable-libopus --enable-libspeex --enable-libvorbis --enable-librubberband
  libavutil      57. 43.100 / 57. 43.100
  libavcodec     59. 54.100 / 59. 54.100
  libavformat    59. 34.102 / 59. 34.102
  libavdevice    59.  8.101 / 59.  8.101
  libavfilter     8. 51.100 /  8. 51.100
  libswscale      6.  8.112 /  6.  8.112
  libswresample   4.  9.100 /  4.  9.100
  libpostproc    56.  7.100 / 56.  7.100
Input #0, lavfi, from 'color=size=1280x720:rate=5:color=black':
  Duration: N/A, start: 0.000000, bitrate: N/A
  Stream #0:0: Video: wrapped_avframe, yuv420p, 1280x720 [SAR 1:1 DAR 16:9], 5 fps, 5 tbr, 5 tbn
Stream mapping:
  Stream #0:0 -> #0:0 (wrapped_avframe (native) -> h264 (libx264))
Press [q] to stop, [?] for help
[libx264 @ 000002402dcca140] using SAR=1/1
[libx264 @ 000002402dcca140] using cpu capabilities: MMX2 SSE2Fast SSSE3 SSE4.2 AVX FMA3 BMI2 AVX2
[libx264 @ 000002402dcca140] profile Constrained Baseline, level 3.1, 4:2:0, 8-bit
[libx264 @ 000002402dcca140] 264 - core 164 r3101 b093bbe - H.264/MPEG-4 AVC codec - Copyleft 2003-2022 - http://www.videolan.org/x264.html - options: cabac=0 ref=1 deblock=0:0:0 analyse=0:0 me=dia subme=0 psy=1 psy_rd=1.00:0.00 mixed_ref=0 me_range=16 chroma_me=1 trellis=0 8x8dct=0 cqm=0 deadzone=21,11 fast_pskip=1 chroma_qp_offset=0 threads=11 lookahead_threads=11 sliced_threads=1 slices=11 nr=0 decimate=1 interlaced=0 bluray_compat=0 constrained_intra=0 bframes=0 weightp=0 keyint=10 keyint_min=6 scenecut=0 intra_refresh=0 rc=crf mbtree=0 crf=23.0 qcomp=0.60 qpmin=0 qpmax=69 qpstep=4 ip_ratio=1.40 aq=0
Output #0, rtsp, to 'rtsp://127.0.0.1:60000/sorting':
  Metadata:
    encoder         : Lavf59.34.102
  Stream #0:0: Video: h264, yuv420p(progressive), 1280x720 [SAR 1:1 DAR 16:9], q=2-31, 5 fps, 90k tbn
    Metadata:
      encoder         : Lavc59.54.100 libx264
    Side data:
      cpb: bitrate max/min/avg: 0/0/0 buffer size: 0 vbv_delay: N/A
frame=  485 fps=5.0 q=11.0 size=N/A time=00:01:36.80 bitrate=N/A speed=   1x     0x


    


    Update 1:

    


    If I output to a file, the text is shown correctly (see the screenshot below). If I take out the -re argument (which paces reading at the input's native frame rate), I get 150 fps, so processing power does not seem to be the issue.

    


    Screenshot from mp4 output file

    


  • Android ExoPlayer: how do I add the FFmpeg extension to my project?

    7 February 2023, by Malo

    I have an Android application containing an ExoPlayer instance. Some UDP video streams play without sound, so I want to add the FFmpeg extension to my project. I am working on a Windows system and need to follow the instructions below:

    


    https://github.com/google/ExoPlayer/blob/release-v2/extensions/ffmpeg/README.md


    


    So the first step is to set the following shell variable:
cd ""
FFMPEG_MODULE_PATH="$(pwd)/extensions/ffmpeg/src/main"

    


    I downloaded Git so I can use its Bash shell as my terminal, so what is pwd?
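
    For reference: pwd is the standard Unix shell command that prints the current working directory, and "$(pwd)" expands to that path, so FFMPEG_MODULE_PATH just becomes an absolute path inside the ExoPlayer checkout. A quick Python equivalent of that expansion:

import os

# "$(pwd)" in Bash expands to the shell's current working directory;
# os.getcwd() is the Python equivalent. Run this from the ExoPlayer
# checkout to see the path the README's variable would hold.
ffmpeg_module_path = os.path.join(os.getcwd(), "extensions", "ffmpeg", "src", "main")
print(ffmpeg_module_path)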

    


    Plus, the next step says to set the host platform (use "darwin-x86_64" for Mac OS X):
HOST_PLATFORM="linux-x86_64"

    What should this variable be on Windows?
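
    For context, HOST_PLATFORM names the directory of the NDK's prebuilt host toolchain, and the NDK ships linux-x86_64, darwin-x86_64, and windows-x86_64 variants. Whether the ExoPlayer build script works unmodified with the Windows toolchain is an assumption on my part (building under WSL, where "linux-x86_64" applies, is a common alternative). A sketch of picking the value by platform:

import sys

# Map Python's sys.platform to the NDK prebuilt-toolchain directory names.
# "windows-x86_64" does ship with the NDK, but using it with the ExoPlayer
# FFmpeg build script is untested here (an assumption).
NDK_HOST_PLATFORMS = {
    "linux": "linux-x86_64",
    "darwin": "darwin-x86_64",
    "win32": "windows-x86_64",
}
print(NDK_HOST_PLATFORMS[sys.platform])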

    


    Please help: I am confused about how to build this library manually on Windows, and it is not straightforward at all.

    


  • Computer crashing when using Python tools in the same script

    5 February 2023, by SL1997

    I am attempting to use the speech recognition toolkit VOSK and the speaker diarization package Resemblyzer to transcribe audio and then identify the speakers in the audio.

    


    Tools:

    


    https://github.com/alphacep/vosk-api
    
https://github.com/resemble-ai/Resemblyzer

    


    I can do both things individually, but I run into issues when trying to do both in a single Python script.

    


    I used the following guide when setting up the diarization system:

    


    https://medium.com/saarthi-ai/who-spoke-when-build-your-own-speaker-diarization-module-from-scratch-e7d725ee279

    


    Computer specs are as follows:

    


    Intel(R) Core(TM) i3-7100 CPU @ 3.90GHz, 3912 Mhz, 2 Core(s), 4 Logical Processor(s)
    
32GB RAM

    


    The following is my code. I am not too sure whether using threading is appropriate or whether I even implemented it correctly. How can I best optimize this code so that it achieves the results I am looking for without crashing?

    


from vosk import Model, KaldiRecognizer
from pydub import AudioSegment
import json
import sys
import os
import subprocess
import datetime
from resemblyzer import preprocess_wav, VoiceEncoder
from pathlib import Path
from resemblyzer.hparams import sampling_rate
from spectralcluster import SpectralClusterer
import threading
import queue
import gc



def recognition(queue, audio, FRAME_RATE):

    model = Model("Vosk_Models/vosk-model-small-en-us-0.15")

    rec = KaldiRecognizer(model, FRAME_RATE)
    rec.SetWords(True)

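    # Feed the entire decoded audio buffer to the recognizer in one call;
    # for long recordings this holds the whole waveform in memory at once.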
    rec.AcceptWaveform(audio.raw_data)
    result = rec.Result()

    transcript = json.loads(result)#["text"]

    #return transcript
    queue.put(transcript)



def diarization(queue, audio):

    wav = preprocess_wav(audio)
    encoder = VoiceEncoder("cpu")
    _, cont_embeds, wav_splits = encoder.embed_utterance(wav, return_partials=True, rate=16)
    print(cont_embeds.shape)

    clusterer = SpectralClusterer(
        min_clusters=2,
        max_clusters=100,
        p_percentile=0.90,
        gaussian_blur_sigma=1)

    labels = clusterer.predict(cont_embeds)

    def create_labelling(labels, wav_splits):

        times = [((s.start + s.stop) / 2) / sampling_rate for s in wav_splits]
        labelling = []
        start_time = 0

        for i, time in enumerate(times):
            if i > 0 and labels[i] != labels[i - 1]:
                temp = [str(labels[i - 1]), start_time, time]
                labelling.append(tuple(temp))
                start_time = time
            if i == len(times) - 1:
                temp = [str(labels[i]), start_time, time]
                labelling.append(tuple(temp))

        return labelling

    #return
    labelling = create_labelling(labels, wav_splits)
    queue.put(labelling)



def identify_speaker(queue1, queue2):

    transcript = queue1.get()
    labelling = queue2.get()

    for speaker in labelling:

        speakerID = speaker[0]
        speakerStart = speaker[1]
        speakerEnd = speaker[2]

        result = transcript['result']
        words = [r['word'] for r in result if speakerStart < r['start'] < speakerEnd]
        #return
        print("Speaker",speakerID,":",' '.join(words), "\n")





def main():

    queue1 = queue.Queue()
    queue2 = queue.Queue()

    FRAME_RATE = 16000
    CHANNELS = 1

    podcast = AudioSegment.from_mp3("Podcast_Audio/Film-Release-Clip.mp3")
    podcast = podcast.set_channels(CHANNELS)
    podcast = podcast.set_frame_rate(FRAME_RATE)

    first_thread = threading.Thread(target=recognition, args=(queue1, podcast, FRAME_RATE))
    second_thread = threading.Thread(target=diarization, args=(queue2, podcast))
    third_thread = threading.Thread(target=identify_speaker, args=(queue1, queue2))

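    # Note: each thread is join()ed immediately after start(), so the three
    # stages actually run one after another rather than concurrently.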
    first_thread.start()
    first_thread.join()
    gc.collect()

    second_thread.start()
    second_thread.join()
    gc.collect()

    third_thread.start()
    third_thread.join()
    gc.collect()

    # transcript = recognition(podcast,FRAME_RATE)
    #
    # labelling = diarization(podcast)
    #
    # print(identify_speaker(transcript, labelling))


if __name__ == '__main__':
    main()


    


    When I say crash, I mean everything freezes: I have to hold down the power button on the desktop and turn it back on again. No blue/blank screen; it is just frozen in my IDE, looking at my code. Any help in resolving this issue would be greatly appreciated.
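
    Since each thread above is joined immediately after it starts, the three stages already run sequentially. One commonly suggested restructuring, shown below as a sketch only (under the unconfirmed assumption that the freeze comes from memory pressure when both models live in one process), is to run each heavy stage in its own short-lived worker process so its memory is returned to the OS when the stage finishes. The transcribe/diarize placeholders stand in for the question's recognition and diarization bodies:

from concurrent.futures import ProcessPoolExecutor


def transcribe(audio_path, frame_rate):
    # Placeholder for the VOSK recognition step from the question. Return
    # the transcript instead of putting it on a Queue: return values pickle
    # cleanly across process boundaries, while Queue objects do not.
    return {"result": []}


def diarize(audio_path):
    # Placeholder for the Resemblyzer diarization step; returns the labelling.
    return []


def run_in_fresh_process(fn, *args):
    # Each executor owns a single worker process that exits on shutdown, so
    # a stage's allocations are fully released before the next stage starts.
    with ProcessPoolExecutor(max_workers=1) as pool:
        return pool.submit(fn, *args).result()


def main():
    audio_path = "Podcast_Audio/Film-Release-Clip.mp3"
    transcript = run_in_fresh_process(transcribe, audio_path, 16000)
    labelling = run_in_fresh_process(diarize, audio_path)
    # identify_speaker() from the question can then consume both results.


if __name__ == "__main__":
    main()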