Recherche avancée
Autres articles (1)
-
Submit bugs and patches
13 avril 2011Unfortunately a software is never perfect.
If you think you have found a bug, report it using our ticket system. Please to help us to fix it by providing the following information : the browser you are using, including the exact version as precise an explanation as possible of the problem if possible, the steps taken resulting in the problem a link to the site / page in question
If you think you have solved the bug, fill in a ticket and attach to it a corrective patch.
You may also (...)
Sur d’autres sites (235)
-
Has anyone used the speech driven animation and can you make it work ?
16 août 2020, par hopw JanI'm talking about this repo. I installed all the dependencies but I can't make it work. Any help is highly appreciated ( :


I'm running python 3.7.5.


This is my code :


import sda
import scipy.io.wavfile as wav
from PIL import Image

va = sda.VideoAnimator(gpu=0, model_path="crema")# Instantiate the animator
fs, audio_clip = wav.read("example/audio.wav")
still_frame = Image.open("example/image.bmp")
vid, aud = va(frame, audio_clip, fs=fs)
va.save_video(vid, aud, "generated.mp4")



Sadly it doesn't seem to work and it gives me this error :


Warning (from warnings module):
 File "C:\Users\Alex\AppData\Local\Programs\Python\Python37\lib\site-packages\pydub\utils.py", line 170
 warn("Couldn't find ffmpeg or avconv - defaulting to ffmpeg, but may not work", RuntimeWarning)
RuntimeWarning: Couldn't find ffmpeg or avconv - defaulting to ffmpeg, but may not work
Traceback (most recent call last):
 File "C:\Users\Alex\Desktop\test\test.py", line 8, in <module>
 vid, aud = va(frame, audio_clip, fs=fs)
NameError: name 'frame' is not defined
</module>


Spent about 2 hours and I can't do anything, I'm out of ideas.
If you take the time to help me thank you from the bottom of my heart.


-
Convert mediarecorder blobs to a type that google speech to text can transcribe
5 janvier 2021, par Manesha RameshI am making an app where the user browser records the user speaking and sends it to the server which then passes it on to the Google speech to the text interface. I am using mediaRecorder to get 1-second blobs which are sent to a server. On the server-side, I send these blobs over to the Google speech to the text interface. However, I am getting an empty transcriptions.



I know what the issue is. Mediarecorder's default Mime Type id audio/WebM codec=opus, which is not accepted by google's speech to text API. After doing some research, I realize I need to use ffmpeg to convert blobs to LInear16. However, ffmpeg only accepts audio FILES and I want to be able to convert BLOBS. Then I can send the resulting converted blobs over to the API interface.



server.js



wsserver.on('connection', socket => {
 console.log("Listening on port 3002")
 audio = {
 content: null
 }
 socket.on('message',function(message){
 // const buffer = new Int16Array(message, 0, Math.floor(data.byteLength / 2));
 // console.log(`received from a client: ${new Uint8Array(message)}`);
 // console.log(message);
 audio.content = message.toString('base64')
 console.log(audio.content);
 livetranscriber.createRequest(audio).then(request => {
 livetranscriber.recognizeStream(request);
 });


 });
});




livetranscriber



module.exports = {
 createRequest: function(audio){
 const encoding = 'LINEAR16';
const sampleRateHertz = 16000;
const languageCode = 'en-US';
 return new Promise((resolve, reject, err) =>{
 if (err){
 reject(err)
 }
 else{
 const request = {
 audio: audio,
 config: {
 encoding: encoding,
 sampleRateHertz: sampleRateHertz,
 languageCode: languageCode,
 },
 interimResults: false, // If you want interim results, set this to true
 };
 resolve(request);
 }
 });

 },
 recognizeStream: async function(request){
 const [response] = await client.recognize(request)
 const transcription = response.results
 .map(result => result.alternatives[0].transcript)
 .join('\n');
 console.log(`Transcription: ${transcription}`);
 // console.log(message);
 // message.pipe(recognizeStream);
 },

}




client



recorder.ondataavailable = function(e) {
 console.log('Data', e.data);

 var ws = new WebSocket('ws://localhost:3002/websocket');
 ws.onopen = function() {
 console.log("opening connection");

 // const stream = websocketStream(ws)
 // const duplex = WebSocket.createWebSocketStream(ws, { encoding: 'utf8' });
 var blob = new Blob(e, { 'type' : 'audio/wav; base64' });
 ws.send(blob.data);
 // e.data).pipe(stream); 
 // console.log(e.data);
 console.log("Sent the message")
 };

 // chunks.push(e.data);
 // socket.emit('data', e.data);
 }



-
Google Speech API returns empty result for some FLAC files, and not for the others although they have same codec and sample rate
15 mars 2021, par ChadBelow code is what I used to make request for transcription.


import io
from google.cloud import speech_v1p1beta1 as speech
def transcribe_file(speech_file):
 """Transcribe the given audio file."""

 client = speech.SpeechClient()

 encoding = speech.RecognitionConfig.AudioEncoding.FLAC
 if os.path.splitext(speech_file)[1] == ".wav":
 encoding = speech.RecognitionConfig.AudioEncoding.LINEAR16
 with io.open(speech_file, "rb") as audio_file:
 content = audio_file.read()

 audio = speech.RecognitionAudio(content=content)
 config = speech.RecognitionConfig(
 encoding=speech.RecognitionConfig.AudioEncoding.FLAC,
 sample_rate_hertz=32000,
 language_code="ja-JP",
 max_alternatives=3,
 enable_word_time_offsets=True,
 enable_automatic_punctuation=True,
 enable_word_confidence=True,
 )

 response = client.recognize(config=config, audio=audio)
 #print(speech_file, "Recognition Done")
 return response



As I wrote in title, the results of response has empty list for some files, and not for some files.
They have same sample rate and codec(32000, FLAC)


Below is the result of
ffprobe -i "AUDIOFILE" -show_streams
for one of each cases.

Left one is empty one. The only difference is duration of file.


How can I get non empty results ?




Edit :


Result of ffprobe show stream show format


Something not captured in one screen


Sadly, re-mux didn't work.


I used ffmpeg-git-20210225


ffbrobe result of broken one


./ffprobe -show_streams -show_format broken.flac 
ffprobe version N-56320-ge937457b7b-static https://johnvansickle.com/ffmpeg/ Copyright (c) 2007-2021 the FFmpeg developers
 built with gcc 8 (Debian 8.3.0-6)
 configuration: --enable-gpl --enable-version3 --enable-static --disable-debug --disable-ffplay --disable-indev=sndio --disable-outdev=sndio --cc=gcc --enable-fontconfig --enable-frei0r --enable-gnutls --enable-gmp --enable-libgme --enable-gray --enable-libaom --enable-libfribidi --enable-libass --enable-libvmaf --enable-libfreetype --enable-libmp3lame --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --enable-librubberband --enable-libsoxr --enable-libspeex --enable-libsrt --enable-libvorbis --enable-libopus --enable-libtheora --enable-libvidstab --enable-libvo-amrwbenc --enable-libvpx --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxml2 --enable-libdav1d --enable-libxvid --enable-libzvbi --enable-libzimg
 libavutil 56. 66.100 / 56. 66.100
 libavcodec 58.125.101 / 58.125.101
 libavformat 58. 68.100 / 58. 68.100
 libavdevice 58. 12.100 / 58. 12.100
 libavfilter 7.107.100 / 7.107.100
 libswscale 5. 8.100 / 5. 8.100
 libswresample 3. 8.100 / 3. 8.100
 libpostproc 55. 8.100 / 55. 8.100
Input #0, flac, from 'broken.flac':
 Metadata:
 encoder : Lavf58.45.100
 Duration: 00:00:00.90, start: 0.000000, bitrate: 342 kb/s
 Stream #0:0: Audio: flac, 32000 Hz, mono, s16
[STREAM]
index=0
codec_name=flac
codec_long_name=FLAC (Free Lossless Audio Codec)
profile=unknown
codec_type=audio
codec_tag_string=[0][0][0][0]
codec_tag=0x0000
sample_fmt=s16
sample_rate=32000
channels=1
channel_layout=mono
bits_per_sample=0
id=N/A
r_frame_rate=0/0
avg_frame_rate=0/0
time_base=1/32000
start_pts=0
start_time=0.000000
duration_ts=28672
duration=0.896000
bit_rate=N/A
max_bit_rate=N/A
bits_per_raw_sample=16
nb_frames=N/A
nb_read_frames=N/A
nb_read_packets=N/A
DISPOSITION:default=0
DISPOSITION:dub=0
DISPOSITION:original=0
DISPOSITION:comment=0
DISPOSITION:lyrics=0
DISPOSITION:karaoke=0
DISPOSITION:forced=0
DISPOSITION:hearing_impaired=0
DISPOSITION:visual_impaired=0
DISPOSITION:clean_effects=0
DISPOSITION:attached_pic=0
DISPOSITION:timed_thumbnails=0
[/STREAM]
[FORMAT]
filename=broken.flac
nb_streams=1
nb_programs=0
format_name=flac
format_long_name=raw FLAC
start_time=0.000000
duration=0.896000
size=38362
bit_rate=342517
probe_score=100
TAG:encoder=Lavf58.45.100
[/FORMAT]



ffprobe result of non_broken one


./ffprobe -show_streams -show_format non_broken.flac 
ffprobe version N-56320-ge937457b7b-static https://johnvansickle.com/ffmpeg/ Copyright (c) 2007-2021 the FFmpeg developers
 built with gcc 8 (Debian 8.3.0-6)
 configuration: --enable-gpl --enable-version3 --enable-static --disable-debug --disable-ffplay --disable-indev=sndio --disable-outdev=sndio --cc=gcc --enable-fontconfig --enable-frei0r --enable-gnutls --enable-gmp --enable-libgme --enable-gray --enable-libaom --enable-libfribidi --enable-libass --enable-libvmaf --enable-libfreetype --enable-libmp3lame --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --enable-librubberband --enable-libsoxr --enable-libspeex --enable-libsrt --enable-libvorbis --enable-libopus --enable-libtheora --enable-libvidstab --enable-libvo-amrwbenc --enable-libvpx --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxml2 --enable-libdav1d --enable-libxvid --enable-libzvbi --enable-libzimg
 libavutil 56. 66.100 / 56. 66.100
 libavcodec 58.125.101 / 58.125.101
 libavformat 58. 68.100 / 58. 68.100
 libavdevice 58. 12.100 / 58. 12.100
 libavfilter 7.107.100 / 7.107.100
 libswscale 5. 8.100 / 5. 8.100
 libswresample 3. 8.100 / 3. 8.100
 libpostproc 55. 8.100 / 55. 8.100
Input #0, flac, from 'non_broken.flac':
 Metadata:
 encoder : Lavf58.45.100
 Duration: 00:00:00.86, start: 0.000000, bitrate: 358 kb/s
 Stream #0:0: Audio: flac, 32000 Hz, mono, s16
[STREAM]
index=0
codec_name=flac
codec_long_name=FLAC (Free Lossless Audio Codec)
profile=unknown
codec_type=audio
codec_tag_string=[0][0][0][0]
codec_tag=0x0000
sample_fmt=s16
sample_rate=32000
channels=1
channel_layout=mono
bits_per_sample=0
id=N/A
r_frame_rate=0/0
avg_frame_rate=0/0
time_base=1/32000
start_pts=0
start_time=0.000000
duration_ts=27648
duration=0.864000
bit_rate=N/A
max_bit_rate=N/A
bits_per_raw_sample=16
nb_frames=N/A
nb_read_frames=N/A
nb_read_packets=N/A
DISPOSITION:default=0
DISPOSITION:dub=0
DISPOSITION:original=0
DISPOSITION:comment=0
DISPOSITION:lyrics=0
DISPOSITION:karaoke=0
DISPOSITION:forced=0
DISPOSITION:hearing_impaired=0
DISPOSITION:visual_impaired=0
DISPOSITION:clean_effects=0
DISPOSITION:attached_pic=0
DISPOSITION:timed_thumbnails=0
[/STREAM]
[FORMAT]
filename=non_broken.flac
nb_streams=1
nb_programs=0
format_name=flac
format_long_name=raw FLAC
start_time=0.000000
duration=0.864000
size=38701
bit_rate=358342
probe_score=100
TAG:encoder=Lavf58.45.100
[/FORMAT]



And the result of
ffmpeg -f lavfi -i sine=d=0.864:r=32000 output.flac


ffmpeg version 3.4.8-0ubuntu0.2 Copyright (c) 2000-2020 the FFmpeg developers
 built with gcc 7 (Ubuntu 7.5.0-3ubuntu1~18.04)
 configuration: --prefix=/usr --extra-version=0ubuntu0.2 --toolchain=hardened --libdir=/usr/lib/x86_64-linux-gnu --incdir=/usr/include/x86_64-linux-gnu --enable-gpl --disable-stripping --enable-avresample --enable-avisynth --enable-gnutls --enable-ladspa --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libcdio --enable-libflite --enable-libfontconfig --enable-libfreetype --enable-libfribidi --enable-libgme --enable-libgsm --enable-libmp3lame --enable-libmysofa --enable-libopenjpeg --enable-libopenmpt --enable-libopus --enable-libpulse --enable-librubberband --enable-librsvg --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libssh --enable-libtheora --enable-libtwolame --enable-libvorbis --enable-libvpx --enable-libwavpack --enable-libwebp --enable-libx265 --enable-libxml2 --enable-libxvid --enable-libzmq --enable-libzvbi --enable-omx --enable-openal --enable-opengl --enable-sdl2 --enable-libdc1394 --enable-libdrm --enable-libiec61883 --enable-chromaprint --enable-frei0r --enable-libopencv --enable-libx264 --enable-shared
 WARNING: library configuration mismatch
 avcodec configuration: --prefix=/usr --extra-version=0ubuntu0.2 --toolchain=hardened --libdir=/usr/lib/x86_64-linux-gnu --incdir=/usr/include/x86_64-linux-gnu --enable-gpl --disable-stripping --enable-avresample --enable-avisynth --enable-gnutls --enable-ladspa --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libcdio --enable-libflite --enable-libfontconfig --enable-libfreetype --enable-libfribidi --enable-libgme --enable-libgsm --enable-libmp3lame --enable-libmysofa --enable-libopenjpeg --enable-libopenmpt --enable-libopus --enable-libpulse --enable-librubberband --enable-librsvg --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libssh --enable-libtheora --enable-libtwolame --enable-libvorbis --enable-libvpx --enable-libwavpack --enable-libwebp --enable-libx265 --enable-libxml2 --enable-libxvid --enable-libzmq --enable-libzvbi --enable-omx --enable-openal --enable-opengl --enable-sdl2 --enable-libdc1394 --enable-libdrm --enable-libiec61883 --enable-chromaprint --enable-frei0r --enable-libopencv --enable-libx264 --enable-shared --enable-version3 --disable-doc --disable-programs --enable-libopencore_amrnb --enable-libopencore_amrwb --enable-libtesseract --enable-libvo_amrwbenc
 libavutil 55. 78.100 / 55. 78.100
 libavcodec 57.107.100 / 57.107.100
 libavformat 57. 83.100 / 57. 83.100
 libavdevice 57. 10.100 / 57. 10.100
 libavfilter 6.107.100 / 6.107.100
 libavresample 3. 7. 0 / 3. 7. 0
 libswscale 4. 8.100 / 4. 8.100
 libswresample 2. 9.100 / 2. 9.100
 libpostproc 54. 7.100 / 54. 7.100
Input #0, lavfi, from 'sine=d=0.864:r=32000':
 Duration: N/A, start: 0.000000, bitrate: 512 kb/s
 Stream #0:0: Audio: pcm_s16le, 32000 Hz, mono, s16, 512 kb/s
File 'output.flac' already exists. Overwrite ? [y/N] y
Stream mapping:
 Stream #0:0 -> #0:0 (pcm_s16le (native) -> flac (native))
Press [q] to stop, [?] for help
Output #0, flac, to 'output.flac':
 Metadata:
 encoder : Lavf57.83.100
 Stream #0:0: Audio: flac, 32000 Hz, mono, s16, 128 kb/s
 Metadata:
 encoder : Lavc57.107.100 flac
[Parsed_sine_0 @ 0x55c317ddda00] EOF timestamp not reliable
size= 16kB time=00:00:00.86 bitrate= 154.0kbits/s speed= 205x 
video:0kB audio:8kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 99.364586%