Other articles (1)

  • Submit bugs and patches

    13 April 2011

    Unfortunately, software is never perfect.
    If you think you have found a bug, report it using our ticket system. Please help us fix it by providing the following information: the browser you are using, including the exact version; as precise an explanation of the problem as possible; if possible, the steps that led to the problem; and a link to the site or page in question.
    If you think you have solved the bug, open a ticket and attach a corrective patch to it.
    You may also (...)

On other sites (235)

  • Has anyone used the speech-driven animation, and can you make it work?

    16 August 2020, by hopw Jan

    I'm talking about this repo. I installed all the dependencies but I can't make it work. Any help is highly appreciated ( :

    


    I'm running python 3.7.5.

    


    This is my code :

    


    import sda
    import scipy.io.wavfile as wav
    from PIL import Image

    va = sda.VideoAnimator(gpu=0, model_path="crema")  # Instantiate the animator
    fs, audio_clip = wav.read("example/audio.wav")
    still_frame = Image.open("example/image.bmp")
    vid, aud = va(frame, audio_clip, fs=fs)
    va.save_video(vid, aud, "generated.mp4")


    


    Sadly it doesn't seem to work and it gives me this error :

    


    Warning (from warnings module):
      File "C:\Users\Alex\AppData\Local\Programs\Python\Python37\lib\site-packages\pydub\utils.py", line 170
        warn("Couldn't find ffmpeg or avconv - defaulting to ffmpeg, but may not work", RuntimeWarning)
    RuntimeWarning: Couldn't find ffmpeg or avconv - defaulting to ffmpeg, but may not work
    Traceback (most recent call last):
      File "C:\Users\Alex\Desktop\test\test.py", line 8, in <module>
        vid, aud = va(frame, audio_clip, fs=fs)
    NameError: name 'frame' is not defined

    I've spent about 2 hours on this and can't get anywhere; I'm out of ideas. If you take the time to help me, thank you from the bottom of my heart.
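
    The output above reports two separate things. The RuntimeWarning comes from pydub, which could not find an ffmpeg or avconv executable on the PATH. The NameError comes from the script itself, which defines still_frame but passes frame to the animator. A minimal sketch of both checks follows; find_ffmpeg and animate are hypothetical helper names, not part of sda, and the sda calls are copied from the question rather than verified against the library.

```python
import shutil


def find_ffmpeg():
    """Return the path of an ffmpeg or avconv executable on PATH, or None."""
    for name in ("ffmpeg", "avconv"):
        path = shutil.which(name)
        if path:
            return path
    return None


def animate(image_path, audio_path, out_path, model_path="crema", gpu=0):
    """Script from the question with the variable name fixed (needs sda installed)."""
    # Imported lazily so that find_ffmpeg() can be used without sda installed.
    import sda
    import scipy.io.wavfile as wav
    from PIL import Image

    va = sda.VideoAnimator(gpu=gpu, model_path=model_path)
    fs, audio_clip = wav.read(audio_path)
    still_frame = Image.open(image_path)
    # Pass the variable that was actually defined: still_frame, not frame.
    vid, aud = va(still_frame, audio_clip, fs=fs)
    va.save_video(vid, aud, out_path)
```

    If find_ffmpeg() returns None, installing ffmpeg and adding its folder to the PATH should silence the warning; after that, animate("example/image.bmp", "example/audio.wav", "generated.mp4") corresponds to the original script.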


  • Convert MediaRecorder blobs to a type that Google Speech-to-Text can transcribe

    5 January 2021, by Manesha Ramesh

    I am making an app where the user's browser records the user speaking and sends the audio to a server, which then passes it on to the Google Speech-to-Text API. I am using MediaRecorder to get 1-second blobs, which are sent to the server. On the server side, I forward these blobs to the Speech-to-Text API. However, I am getting empty transcriptions.

    I know what the issue is. MediaRecorder's default MIME type is audio/webm; codecs=opus, which is not accepted by Google's Speech-to-Text API. After doing some research, I realized I need to use ffmpeg to convert the blobs to LINEAR16. However, ffmpeg only accepts audio FILES and I want to be able to convert BLOBS. Then I can send the converted blobs to the API.
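
    ffmpeg can read its input from standard input and write its output to standard output through the pipe: protocol, so in-memory data does not strictly need a temporary file. A minimal sketch, assuming an ffmpeg executable on the PATH and a server-side helper written in Python rather than Node.js; build_ffmpeg_cmd and to_linear16 are hypothetical names, and the 16000 Hz default mirrors the sampleRateHertz value used in the request config.

```python
import subprocess


def build_ffmpeg_cmd(sample_rate=16000):
    """Build an ffmpeg command that reads encoded audio from stdin and
    writes raw 16-bit little-endian mono PCM (LINEAR16) to stdout."""
    return [
        "ffmpeg",
        "-loglevel", "error",
        "-i", "pipe:0",            # read the input from stdin
        "-f", "s16le",             # raw PCM output, no WAV header
        "-acodec", "pcm_s16le",
        "-ac", "1",                # mono
        "-ar", str(sample_rate),   # resample to the rate named in the request
        "pipe:1",                  # write the result to stdout
    ]


def to_linear16(encoded_bytes, sample_rate=16000):
    """Convert an in-memory WebM/Opus recording to LINEAR16 bytes.

    Requires an ffmpeg executable on PATH; raises CalledProcessError
    if ffmpeg cannot decode the input.
    """
    result = subprocess.run(
        build_ffmpeg_cmd(sample_rate),
        input=encoded_bytes,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        check=True,
    )
    return result.stdout
```

    This only works on data that forms a complete WebM stream. MediaRecorder chunks after the first one usually do not carry the container header, so the chunks would likely need to be concatenated in order, or fed into one long-running ffmpeg process, rather than converted one by one.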

    server.js

    wsserver.on('connection', socket => {
        console.log("Listening on port 3002")
        audio = {
            content: null
        }
        socket.on('message', function(message){
            // const buffer = new Int16Array(message, 0, Math.floor(data.byteLength / 2));
            // console.log(`received from a client: ${new Uint8Array(message)}`);
            // console.log(message);
            audio.content = message.toString('base64')
            console.log(audio.content);
            livetranscriber.createRequest(audio).then(request => {
                livetranscriber.recognizeStream(request);
            });
        });
    });

    livetranscriber

    module.exports = {
        createRequest: function(audio){
            const encoding = 'LINEAR16';
            const sampleRateHertz = 16000;
            const languageCode = 'en-US';
            return new Promise((resolve, reject, err) =>{
                if (err){
                    reject(err)
                }
                else{
                    const request = {
                        audio: audio,
                        config: {
                            encoding: encoding,
                            sampleRateHertz: sampleRateHertz,
                            languageCode: languageCode,
                        },
                        interimResults: false, // If you want interim results, set this to true
                    };
                    resolve(request);
                }
            });
        },
        recognizeStream: async function(request){
            const [response] = await client.recognize(request)
            const transcription = response.results
                .map(result => result.alternatives[0].transcript)
                .join('\n');
            console.log(`Transcription: ${transcription}`);
            // console.log(message);
            // message.pipe(recognizeStream);
        },
    }

    client

    recorder.ondataavailable = function(e) {
        console.log('Data', e.data);

        var ws = new WebSocket('ws://localhost:3002/websocket');
        ws.onopen = function() {
            console.log("opening connection");

            // const stream = websocketStream(ws)
            // const duplex = WebSocket.createWebSocketStream(ws, { encoding: 'utf8' });
            var blob = new Blob(e, { 'type' : 'audio/wav; base64' });
            ws.send(blob.data);
            // e.data).pipe(stream);
            // console.log(e.data);
            console.log("Sent the message")
        };

        // chunks.push(e.data);
        // socket.emit('data', e.data);
    }

  • Google Speech API returns an empty result for some FLAC files but not for others, although they have the same codec and sample rate

    15 March 2021, by Chad

    Below is the code I used to request the transcription.

    import io
    import os
    from google.cloud import speech_v1p1beta1 as speech


    def transcribe_file(speech_file):
        """Transcribe the given audio file."""

        client = speech.SpeechClient()

        # Default to FLAC; switch to LINEAR16 for .wav input
        encoding = speech.RecognitionConfig.AudioEncoding.FLAC
        if os.path.splitext(speech_file)[1] == ".wav":
            encoding = speech.RecognitionConfig.AudioEncoding.LINEAR16
        with io.open(speech_file, "rb") as audio_file:
            content = audio_file.read()

        audio = speech.RecognitionAudio(content=content)
        config = speech.RecognitionConfig(
            encoding=encoding,
            sample_rate_hertz=32000,
            language_code="ja-JP",
            max_alternatives=3,
            enable_word_time_offsets=True,
            enable_automatic_punctuation=True,
            enable_word_confidence=True,
        )

        response = client.recognize(config=config, audio=audio)
        #print(speech_file, "Recognition Done")
        return response

    As I wrote in the title, the response contains an empty list of results for some files but not for others. They all have the same sample rate and codec (32000 Hz, FLAC).
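
    An empty result can be detected before any further processing. A minimal sketch, assuming only that the response exposes results, each with a list of alternatives carrying a transcript; the SimpleNamespace objects are stand-ins for a real response, and top_transcripts is a hypothetical helper name.

```python
from types import SimpleNamespace


def top_transcripts(response):
    """Return the first alternative's transcript for every result.

    An empty list means the service returned no results for the file.
    """
    return [
        result.alternatives[0].transcript
        for result in response.results
        if result.alternatives
    ]


# Stand-in objects shaped like a recognize() response (illustration only).
empty = SimpleNamespace(results=[])
filled = SimpleNamespace(
    results=[SimpleNamespace(alternatives=[SimpleNamespace(transcript="hello")])]
)

print(top_transcripts(empty))
print(top_transcripts(filled))
```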

    Below is the result of ffprobe -i "AUDIOFILE" -show_streams for one file from each case.

    The left one is the file that returns an empty result. The only difference is the duration of the file.

    How can I get non-empty results?

    Result of ffprobe

    Edit:

    Result of ffprobe with -show_streams and -show_format

    Something not captured in one screen

    Sadly, re-mux didn't work.

    I used ffmpeg-git-20210225

    ffprobe result of the broken one

    ./ffprobe -show_streams -show_format broken.flac
    ffprobe version N-56320-ge937457b7b-static https://johnvansickle.com/ffmpeg/  Copyright (c) 2007-2021 the FFmpeg developers
      built with gcc 8 (Debian 8.3.0-6)
      configuration: --enable-gpl --enable-version3 --enable-static --disable-debug --disable-ffplay --disable-indev=sndio --disable-outdev=sndio --cc=gcc --enable-fontconfig --enable-frei0r --enable-gnutls --enable-gmp --enable-libgme --enable-gray --enable-libaom --enable-libfribidi --enable-libass --enable-libvmaf --enable-libfreetype --enable-libmp3lame --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --enable-librubberband --enable-libsoxr --enable-libspeex --enable-libsrt --enable-libvorbis --enable-libopus --enable-libtheora --enable-libvidstab --enable-libvo-amrwbenc --enable-libvpx --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxml2 --enable-libdav1d --enable-libxvid --enable-libzvbi --enable-libzimg
      libavutil      56. 66.100 / 56. 66.100
      libavcodec     58.125.101 / 58.125.101
      libavformat    58. 68.100 / 58. 68.100
      libavdevice    58. 12.100 / 58. 12.100
      libavfilter     7.107.100 /  7.107.100
      libswscale      5.  8.100 /  5.  8.100
      libswresample   3.  8.100 /  3.  8.100
      libpostproc    55.  8.100 / 55.  8.100
    Input #0, flac, from 'broken.flac':
      Metadata:
        encoder         : Lavf58.45.100
      Duration: 00:00:00.90, start: 0.000000, bitrate: 342 kb/s
      Stream #0:0: Audio: flac, 32000 Hz, mono, s16
    [STREAM]
    index=0
    codec_name=flac
    codec_long_name=FLAC (Free Lossless Audio Codec)
    profile=unknown
    codec_type=audio
    codec_tag_string=[0][0][0][0]
    codec_tag=0x0000
    sample_fmt=s16
    sample_rate=32000
    channels=1
    channel_layout=mono
    bits_per_sample=0
    id=N/A
    r_frame_rate=0/0
    avg_frame_rate=0/0
    time_base=1/32000
    start_pts=0
    start_time=0.000000
    duration_ts=28672
    duration=0.896000
    bit_rate=N/A
    max_bit_rate=N/A
    bits_per_raw_sample=16
    nb_frames=N/A
    nb_read_frames=N/A
    nb_read_packets=N/A
    DISPOSITION:default=0
    DISPOSITION:dub=0
    DISPOSITION:original=0
    DISPOSITION:comment=0
    DISPOSITION:lyrics=0
    DISPOSITION:karaoke=0
    DISPOSITION:forced=0
    DISPOSITION:hearing_impaired=0
    DISPOSITION:visual_impaired=0
    DISPOSITION:clean_effects=0
    DISPOSITION:attached_pic=0
    DISPOSITION:timed_thumbnails=0
    [/STREAM]
    [FORMAT]
    filename=broken.flac
    nb_streams=1
    nb_programs=0
    format_name=flac
    format_long_name=raw FLAC
    start_time=0.000000
    duration=0.896000
    size=38362
    bit_rate=342517
    probe_score=100
    TAG:encoder=Lavf58.45.100
    [/FORMAT]

    ffprobe result of the non-broken one

    ./ffprobe -show_streams -show_format non_broken.flac
    ffprobe version N-56320-ge937457b7b-static https://johnvansickle.com/ffmpeg/  Copyright (c) 2007-2021 the FFmpeg developers
      built with gcc 8 (Debian 8.3.0-6)
      configuration: --enable-gpl --enable-version3 --enable-static --disable-debug --disable-ffplay --disable-indev=sndio --disable-outdev=sndio --cc=gcc --enable-fontconfig --enable-frei0r --enable-gnutls --enable-gmp --enable-libgme --enable-gray --enable-libaom --enable-libfribidi --enable-libass --enable-libvmaf --enable-libfreetype --enable-libmp3lame --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --enable-librubberband --enable-libsoxr --enable-libspeex --enable-libsrt --enable-libvorbis --enable-libopus --enable-libtheora --enable-libvidstab --enable-libvo-amrwbenc --enable-libvpx --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxml2 --enable-libdav1d --enable-libxvid --enable-libzvbi --enable-libzimg
      libavutil      56. 66.100 / 56. 66.100
      libavcodec     58.125.101 / 58.125.101
      libavformat    58. 68.100 / 58. 68.100
      libavdevice    58. 12.100 / 58. 12.100
      libavfilter     7.107.100 /  7.107.100
      libswscale      5.  8.100 /  5.  8.100
      libswresample   3.  8.100 /  3.  8.100
      libpostproc    55.  8.100 / 55.  8.100
    Input #0, flac, from 'non_broken.flac':
      Metadata:
        encoder         : Lavf58.45.100
      Duration: 00:00:00.86, start: 0.000000, bitrate: 358 kb/s
      Stream #0:0: Audio: flac, 32000 Hz, mono, s16
    [STREAM]
    index=0
    codec_name=flac
    codec_long_name=FLAC (Free Lossless Audio Codec)
    profile=unknown
    codec_type=audio
    codec_tag_string=[0][0][0][0]
    codec_tag=0x0000
    sample_fmt=s16
    sample_rate=32000
    channels=1
    channel_layout=mono
    bits_per_sample=0
    id=N/A
    r_frame_rate=0/0
    avg_frame_rate=0/0
    time_base=1/32000
    start_pts=0
    start_time=0.000000
    duration_ts=27648
    duration=0.864000
    bit_rate=N/A
    max_bit_rate=N/A
    bits_per_raw_sample=16
    nb_frames=N/A
    nb_read_frames=N/A
    nb_read_packets=N/A
    DISPOSITION:default=0
    DISPOSITION:dub=0
    DISPOSITION:original=0
    DISPOSITION:comment=0
    DISPOSITION:lyrics=0
    DISPOSITION:karaoke=0
    DISPOSITION:forced=0
    DISPOSITION:hearing_impaired=0
    DISPOSITION:visual_impaired=0
    DISPOSITION:clean_effects=0
    DISPOSITION:attached_pic=0
    DISPOSITION:timed_thumbnails=0
    [/STREAM]
    [FORMAT]
    filename=non_broken.flac
    nb_streams=1
    nb_programs=0
    format_name=flac
    format_long_name=raw FLAC
    start_time=0.000000
    duration=0.864000
    size=38701
    bit_rate=358342
    probe_score=100
    TAG:encoder=Lavf58.45.100
    [/FORMAT]
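
    The duration and bit_rate fields in the two ffprobe dumps follow directly from duration_ts, time_base and size, which makes it easy to confirm that the two files differ only in length. A small worked check using the values from the dumps; stream_stats is a hypothetical helper name.

```python
from fractions import Fraction


def stream_stats(duration_ts, time_base, size_bytes):
    """Derive duration (seconds) and average bit rate (bit/s, truncated)
    from the raw ffprobe fields."""
    duration = duration_ts * time_base          # exact Fraction of a second
    bit_rate = int(size_bytes * 8 / duration)   # truncate like ffprobe's integer field
    return float(duration), bit_rate


time_base = Fraction(1, 32000)
broken = stream_stats(28672, time_base, 38362)
non_broken = stream_stats(27648, time_base, 38701)

print("broken:    ", broken)
print("non_broken:", non_broken)
print("difference in samples:", 28672 - 27648)
```

    Both files are under one second long (0.896 s versus 0.864 s), a difference of 1024 samples at 32000 Hz.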


    And the result of ffmpeg -f lavfi -i sine=d=0.864:r=32000 output.flac


    ffmpeg version 3.4.8-0ubuntu0.2 Copyright (c) 2000-2020 the FFmpeg developers
      built with gcc 7 (Ubuntu 7.5.0-3ubuntu1~18.04)
      configuration: --prefix=/usr --extra-version=0ubuntu0.2 --toolchain=hardened --libdir=/usr/lib/x86_64-linux-gnu --incdir=/usr/include/x86_64-linux-gnu --enable-gpl --disable-stripping --enable-avresample --enable-avisynth --enable-gnutls --enable-ladspa --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libcdio --enable-libflite --enable-libfontconfig --enable-libfreetype --enable-libfribidi --enable-libgme --enable-libgsm --enable-libmp3lame --enable-libmysofa --enable-libopenjpeg --enable-libopenmpt --enable-libopus --enable-libpulse --enable-librubberband --enable-librsvg --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libssh --enable-libtheora --enable-libtwolame --enable-libvorbis --enable-libvpx --enable-libwavpack --enable-libwebp --enable-libx265 --enable-libxml2 --enable-libxvid --enable-libzmq --enable-libzvbi --enable-omx --enable-openal --enable-opengl --enable-sdl2 --enable-libdc1394 --enable-libdrm --enable-libiec61883 --enable-chromaprint --enable-frei0r --enable-libopencv --enable-libx264 --enable-shared
      WARNING: library configuration mismatch
      avcodec     configuration: --prefix=/usr --extra-version=0ubuntu0.2 --toolchain=hardened --libdir=/usr/lib/x86_64-linux-gnu --incdir=/usr/include/x86_64-linux-gnu --enable-gpl --disable-stripping --enable-avresample --enable-avisynth --enable-gnutls --enable-ladspa --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libcdio --enable-libflite --enable-libfontconfig --enable-libfreetype --enable-libfribidi --enable-libgme --enable-libgsm --enable-libmp3lame --enable-libmysofa --enable-libopenjpeg --enable-libopenmpt --enable-libopus --enable-libpulse --enable-librubberband --enable-librsvg --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libssh --enable-libtheora --enable-libtwolame --enable-libvorbis --enable-libvpx --enable-libwavpack --enable-libwebp --enable-libx265 --enable-libxml2 --enable-libxvid --enable-libzmq --enable-libzvbi --enable-omx --enable-openal --enable-opengl --enable-sdl2 --enable-libdc1394 --enable-libdrm --enable-libiec61883 --enable-chromaprint --enable-frei0r --enable-libopencv --enable-libx264 --enable-shared --enable-version3 --disable-doc --disable-programs --enable-libopencore_amrnb --enable-libopencore_amrwb --enable-libtesseract --enable-libvo_amrwbenc
      libavutil      55. 78.100 / 55. 78.100
      libavcodec     57.107.100 / 57.107.100
      libavformat    57. 83.100 / 57. 83.100
      libavdevice    57. 10.100 / 57. 10.100
      libavfilter     6.107.100 /  6.107.100
      libavresample   3.  7.  0 /  3.  7.  0
      libswscale      4.  8.100 /  4.  8.100
      libswresample   2.  9.100 /  2.  9.100
      libpostproc    54.  7.100 / 54.  7.100
    Input #0, lavfi, from 'sine=d=0.864:r=32000':
      Duration: N/A, start: 0.000000, bitrate: 512 kb/s
        Stream #0:0: Audio: pcm_s16le, 32000 Hz, mono, s16, 512 kb/s
    File 'output.flac' already exists. Overwrite ? [y/N] y
    Stream mapping:
      Stream #0:0 -> #0:0 (pcm_s16le (native) -> flac (native))
    Press [q] to stop, [?] for help
    Output #0, flac, to 'output.flac':
      Metadata:
        encoder         : Lavf57.83.100
        Stream #0:0: Audio: flac, 32000 Hz, mono, s16, 128 kb/s
        Metadata:
          encoder         : Lavc57.107.100 flac
    [Parsed_sine_0 @ 0x55c317ddda00] EOF timestamp not reliable
    size=      16kB time=00:00:00.86 bitrate= 154.0kbits/s speed= 205x
    video:0kB audio:8kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 99.364586%