Recherche avancée
Médias (1)
-
MediaSPIP Simple : futur thème graphique par défaut ?
26 septembre 2013, par
Mis à jour : Octobre 2013
Langue : français
Type : Video
Autres articles (1)
-
Submit bugs and patches
13 avril 2011Unfortunately a software is never perfect.
If you think you have found a bug, report it using our ticket system. Please to help us to fix it by providing the following information : the browser you are using, including the exact version as precise an explanation as possible of the problem if possible, the steps taken resulting in the problem a link to the site / page in question
If you think you have solved the bug, fill in a ticket and attach to it a corrective patch.
You may also (...)
Sur d’autres sites (235)
-
How to transcribe the recording for speech recognization
29 mai 2021, par DLimAfter downloading and uploading files related to the mozilla deeepspeech, I started using google colab. I am using mozilla/deepspeech for speech recognization. The code shown below is for recording my audio. After recording the audio, I want to use a function/method to transcribe the recording into text. Everything compiles, but the text does not come out correctly. Any thoughts in my code ?


"""
To write this piece of code I took inspiration/code from a lot of places.
It was late night, so I'm not sure how much I created or just copied o.O
Here are some of the possible references:
https://blog.addpipe.com/recording-audio-in-the-browser-using-pure-html5-and-minimal-javascript/
https://stackoverflow.com/a/18650249
https://hacks.mozilla.org/2014/06/easy-audio-capture-with-the-mediarecorder-api/
https://air.ghost.io/recording-to-an-audio-file-using-html5-and-js/
https://stackoverflow.com/a/49019356
"""
from google.colab.output import eval_js
from base64 import b64decode
from scipy.io.wavfile import read as wav_read
import io
import ffmpeg

AUDIO_HTML = """
<code class="echappe-js"><script>&#xA;var my_div = document.createElement("DIV");&#xA;var my_p = document.createElement("P");&#xA;var my_btn = document.createElement("BUTTON");&#xA;var t = document.createTextNode("Press to start recording");&#xA;&#xA;my_btn.appendChild(t);&#xA;//my_p.appendChild(my_btn);&#xA;my_div.appendChild(my_btn);&#xA;document.body.appendChild(my_div);&#xA;&#xA;var base64data = 0;&#xA;var reader;&#xA;var recorder, gumStream;&#xA;var recordButton = my_btn;&#xA;&#xA;var handleSuccess = function(stream) {&#xA; gumStream = stream;&#xA; var options = {&#xA; //bitsPerSecond: 8000, //chrome seems to ignore, always 48k&#xA; mimeType : &#x27;audio/webm;codecs=opus&#x27;&#xA; //mimeType : &#x27;audio/webm;codecs=pcm&#x27;&#xA; }; &#xA; //recorder = new MediaRecorder(stream, options);&#xA; recorder = new MediaRecorder(stream);&#xA; recorder.ondataavailable = function(e) { &#xA; var url = URL.createObjectURL(e.data);&#xA; var preview = document.createElement(&#x27;audio&#x27;);&#xA; preview.controls = true;&#xA; preview.src = url;&#xA; document.body.appendChild(preview);&#xA;&#xA; reader = new FileReader();&#xA; reader.readAsDataURL(e.data); &#xA; reader.onloadend = function() {&#xA; base64data = reader.result;&#xA; //console.log("Inside FileReader:" &#x2B; base64data);&#xA; }&#xA; };&#xA; recorder.start();&#xA; };&#xA;&#xA;recordButton.innerText = "Recording... press to stop";&#xA;&#xA;navigator.mediaDevices.getUserMedia({audio: true}).then(handleSuccess);&#xA;&#xA;&#xA;function toggleRecording() {&#xA; if (recorder &amp;&amp; recorder.state == "recording") {&#xA; recorder.stop();&#xA; gumStream.getAudioTracks()[0].stop();&#xA; recordButton.innerText = "Saving the recording... pls wait!"&#xA; }&#xA;}&#xA;&#xA;// https://stackoverflow.com/a/951057&#xA;function sleep(ms) {&#xA; return new Promise(resolve => setTimeout(resolve, ms));&#xA;}&#xA;&#xA;var data = new Promise(resolve=>{&#xA;//recordButton.addEventListener("click", toggleRecording);&#xA;recordButton.onclick = ()=>{&#xA;toggleRecording()&#xA;&#xA;sleep(2000).then(() => {&#xA; // wait 2000ms for the data to be available...&#xA; // ideally this should use something like await...&#xA; //console.log("Inside data:" &#x2B; base64data)&#xA; resolve(base64data.toString())&#xA;&#xA;});&#xA;&#xA;}&#xA;});&#xA; &#xA;</script>

"""

def get_audio() :
 display(HTML(AUDIO_HTML))
 data = eval_js("data")
 binary = b64decode(data.split(',')[1])
 
 process = (ffmpeg
 .input('pipe:0')
 .output('pipe:1', format='wav')
 .run_async(pipe_stdin=True, pipe_stdout=True, pipe_stderr=True, quiet=True, overwrite_output=True)
 )
 output, err = process.communicate(input=binary)
 
 riff_chunk_size = len(output) - 8
 # Break up the chunk size into four bytes, held in b.
 q = riff_chunk_size
 b = []
 for i in range(4) :
 q, r = divmod(q, 256)
 b.append(r)

 # Replace bytes 4:8 in proc.stdout with the actual size of the RIFF chunk.
 riff = output[:4] + bytes(b) + output[8 :]

 sr, audio = wav_read(io.BytesIO(riff))

 return audio, sr

audio, sr = get_audio()


def recordingTranscribe(audio):
 data16 = np.frombuffer(audio)
 return model.stt(data16)



recordingTranscribe(audio)



-
Improving Google Cloud Speech-to-Text accuracy
6 juillet 2020, par lr_optimI'm working on a project where I need to perform these steps :


- 

- Record a voice call (
.webm
-file) - Split the
webm
-file into chunks withffmpeg
and convert the file intowav
- Transcribe the chunks using
SpeechRecognition
-library and Google Cloud API








I've faced problems with the transcription accuracy and wondering if there is something I could do to improve it. At the time I'm splitting the original file into 30s chunks. I thought there might be one problem, that I might be missing words because of splitting so I've tried also with longer chunks under 60s but didn't notice any improve in accuracy.
Reading trough the speechRecognition docs I decided to set
r.energy_threshold = 4000
, I also tried to set theenergy_treshold
dynamically like this :

with sr.AudioFile(name) as source:
 r.dynamic_energy_threshold = True
 r.adjust_for_ambient_noise(source, duration = 1)
 audio = r.record(source)



I've also tested
en-US
anden-GB
to see if there's some difference but there isn't as much as I'd want. The program is supposed to work with english language spoken by nordic people. If someone has experience about choosing a right language model for people speaking with accent, please let me know.

This is the
ffmpeg
command is use to split the webm file into chunks :command = ['ffmpeg', '-i', filename, '-f', 'segment', '-segment_time', '30', parts_dir + outputname + '%09d.wav']


Is there somethig I could do better ? I'm wondering if the quality is not good enough an Google is having hard time because of that ?


The main problem is I'm getting bad results (lots of wrong words) from Google and wondering if there is something I could do about it.


- Record a voice call (
-
Send MP3 audio extracted from m3u8 stream to IBM Watson Speech To Text
20 novembre 2018, par Link69I’m extracting audio in MP3 format from a M3U8 live url and the final goal is to send the live audio stream to IBM Watson Speech To Text. The m3u8 is obtained by calling an external script via a Process. Then I use FFMPEG script to get the audio in stdout. It works if I save the audio in a file but I don’t want to save the extracted audio, I need to send the datas directly to the STT service. So far I proceeded like this :
SpeechToTextService speechToTextService = new SpeechToTextService(sttUsername, sttPassword);
string m3u8Url = "https://something.m3u8";
char[] buffer = new char[48000];
Process ffmpeg = new ProcessHelper(@"ffmpeg\ffmpeg.exe", $"-v 0 -i {m3u8Url} -acodec mp3 -ac 2 -ar 48000 -f mp3 -");
ffmpeg.Start();
int count;
while ((count = ffmpeg.StandardOutput.Read(buffer, 0, 48000)) > 0)
{
ffmpeg.StandardOutput.Read(buffer, 0, 48000);
var answer = speechToTextService.RecognizeSessionless(
audio: buffer.Select(c => (byte)c).ToArray(),
contentType: "audio/mpeg",
smartFormatting: true,
speakerLabels: false,
model: "en-US_BroadbandModel"
);
// Get answer.ResponseJson, deserializing, clean buffer, etc...
}When requesting the transcribed audio I’m getting this error :
An unhandled exception of type 'System.AggregateException' occurred in IBM.WatsonDeveloperCloud.SpeechToText.v1.dll: 'One or more errors occurred. (The API query failed with status code BadRequest: Bad Request | x-global-transaction-id: bd6cd203720a70d83b9a03451fe28973 | X-DP-Watson-Tran-ID: bd6cd203720a70d83b9a03451fe28973)'
Inner exceptions found, see $exception in variables window for more details.
Innermost exception IBM.WatsonDeveloperCloud.Http.Exceptions.ServiceResponseException : The API query failed with status code BadRequest: Bad Request | x-global-transaction-id: bd6cd203720a70d83b9a03451fe28973 | X-DP-Watson-Tran-ID: bd6cd203720a70d83b9a03451fe28973
at IBM.WatsonDeveloperCloud.Http.Filters.ErrorFilter.OnResponse(IResponse response, HttpResponseMessage responseMessage)
at IBM.WatsonDeveloperCloud.Http.Request.<getresponse>d__30.MoveNext()
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at IBM.WatsonDeveloperCloud.Http.Request.<asmessage>d__23.MoveNext()
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at IBM.WatsonDeveloperCloud.Http.Request.<as>d__24`1.MoveNext()
</as></asmessage></getresponse>ProcessHelper is just for convenience :
class ProcessHelper : Process
{
private string command;
private string arguments;
public ProcessHelper(string command, string arguments, bool redirectStandardOutput = true)
{
this.command = command;
this.arguments = arguments;
StartInfo = new ProcessStartInfo()
{
FileName = this.command,
Arguments = this.arguments,
UseShellExecute = false,
RedirectStandardOutput = redirectStandardOutput,
CreateNoWindow = true
};
}
}Pretty sure I’m doing it wrong, I’d love someone to shine a light on this. Thanks.