Recherche avancée

Recherche
Choix de la période de publication
Date minimale :

Date maximale :

Type de date :
Choix de la langue
Choix du type de média
Choix de la rubrique
Choix de la licence de publication
Choix de l’auteur

Médias (1)

Mot : - Tags -/theme

Autres articles (1)

Submit bugs and patches

13 avril 2011

Unfortunately a software is never perfect.
If you think you have found a bug, report it using our ticket system. Please to help us to fix it by providing the following information : the browser you are using, including the exact version as precise an explanation as possible of the problem if possible, the steps taken resulting in the problem a link to the site / page in question
If you think you have solved the bug, fill in a ticket and attach to it a corrective patch.
You may also (...)

Sur d’autres sites (235)

How to transcribe the recording for speech recognization

29 mai 2021, par DLim

After downloading and uploading files related to the mozilla deeepspeech, I started using google colab. I am using mozilla/deepspeech for speech recognization. The code shown below is for recording my audio. After recording the audio, I want to use a function/method to transcribe the recording into text. Everything compiles, but the text does not come out correctly. Any thoughts in my code ?

"""&#xA;To write this piece of code I took inspiration/code from a lot of places.&#xA;It was late night, so I&#x27;m not sure how much I created or just copied o.O&#xA;Here are some of the possible references:&#xA;https://blog.addpipe.com/recording-audio-in-the-browser-using-pure-html5-and-minimal-javascript/&#xA;https://stackoverflow.com/a/18650249&#xA;https://hacks.mozilla.org/2014/06/easy-audio-capture-with-the-mediarecorder-api/&#xA;https://air.ghost.io/recording-to-an-audio-file-using-html5-and-js/&#xA;https://stackoverflow.com/a/49019356&#xA;"""&#xA;from google.colab.output import eval_js&#xA;from base64 import b64decode&#xA;from scipy.io.wavfile import read as wav_read&#xA;import io&#xA;import ffmpeg&#xA;&#xA;AUDIO_HTML = """&#xA;<code class="echappe-js">&lt;script&gt;&amp;#xA;var my_div = document.createElement(&quot;DIV&quot;);&amp;#xA;var my_p = document.createElement(&quot;P&quot;);&amp;#xA;var my_btn = document.createElement(&quot;BUTTON&quot;);&amp;#xA;var t = document.createTextNode(&quot;Press to start recording&quot;);&amp;#xA;&amp;#xA;my_btn.appendChild(t);&amp;#xA;//my_p.appendChild(my_btn);&amp;#xA;my_div.appendChild(my_btn);&amp;#xA;document.body.appendChild(my_div);&amp;#xA;&amp;#xA;var base64data = 0;&amp;#xA;var reader;&amp;#xA;var recorder, gumStream;&amp;#xA;var recordButton = my_btn;&amp;#xA;&amp;#xA;var handleSuccess = function(stream) {&amp;#xA;  gumStream = stream;&amp;#xA;  var options = {&amp;#xA;    //bitsPerSecond: 8000, //chrome seems to ignore, always 48k&amp;#xA;    mimeType : &amp;#x27;audio/webm;codecs=opus&amp;#x27;&amp;#xA;    //mimeType : &amp;#x27;audio/webm;codecs=pcm&amp;#x27;&amp;#xA;  };            &amp;#xA;  //recorder = new MediaRecorder(stream, options);&amp;#xA;  recorder = new MediaRecorder(stream);&amp;#xA;  recorder.ondataavailable = function(e) {            &amp;#xA;    var url = URL.createObjectURL(e.data);&amp;#xA;    var preview = document.createElement(&amp;#x27;audio&amp;#x27;);&amp;#xA;    preview.controls = true;&amp;#xA;    preview.src = url;&amp;#xA;    document.body.appendChild(preview);&amp;#xA;&amp;#xA;    reader = new FileReader();&amp;#xA;    reader.readAsDataURL(e.data); &amp;#xA;    reader.onloadend = function() {&amp;#xA;      base64data = reader.result;&amp;#xA;      //console.log(&quot;Inside FileReader:&quot; &amp;#x2B; base64data);&amp;#xA;    }&amp;#xA;  };&amp;#xA;  recorder.start();&amp;#xA;  };&amp;#xA;&amp;#xA;recordButton.innerText = &quot;Recording... press to stop&quot;;&amp;#xA;&amp;#xA;navigator.mediaDevices.getUserMedia({audio: true}).then(handleSuccess);&amp;#xA;&amp;#xA;&amp;#xA;function toggleRecording() {&amp;#xA;  if (recorder &amp;amp;&amp;amp; recorder.state == &quot;recording&quot;) {&amp;#xA;      recorder.stop();&amp;#xA;      gumStream.getAudioTracks()[0].stop();&amp;#xA;      recordButton.innerText = &quot;Saving the recording... pls wait!&quot;&amp;#xA;  }&amp;#xA;}&amp;#xA;&amp;#xA;// https://stackoverflow.com/a/951057&amp;#xA;function sleep(ms) {&amp;#xA;  return new Promise(resolve =&gt; setTimeout(resolve, ms));&amp;#xA;}&amp;#xA;&amp;#xA;var data = new Promise(resolve=&gt;{&amp;#xA;//recordButton.addEventListener(&quot;click&quot;, toggleRecording);&amp;#xA;recordButton.onclick = ()=&gt;{&amp;#xA;toggleRecording()&amp;#xA;&amp;#xA;sleep(2000).then(() =&gt; {&amp;#xA;  // wait 2000ms for the data to be available...&amp;#xA;  // ideally this should use something like await...&amp;#xA;  //console.log(&quot;Inside data:&quot; &amp;#x2B; base64data)&amp;#xA;  resolve(base64data.toString())&amp;#xA;&amp;#xA;});&amp;#xA;&amp;#xA;}&amp;#xA;});&amp;#xA;      &amp;#xA;&lt;/script&gt;&#xA;"""&#xA;&#xA;def get_audio() :&#xA;  display(HTML(AUDIO_HTML))&#xA;  data = eval_js("data")&#xA;  binary = b64decode(data.split(',')[1])&#xA;  &#xA;  process = (ffmpeg&#xA;    .input('pipe:0')&#xA;    .output('pipe:1', format='wav')&#xA;    .run_async(pipe_stdin=True, pipe_stdout=True, pipe_stderr=True, quiet=True, overwrite_output=True)&#xA;  )&#xA;  output, err = process.communicate(input=binary)&#xA;  &#xA;  riff_chunk_size = len(output) - 8&#xA;  # Break up the chunk size into four bytes, held in b.&#xA;  q = riff_chunk_size&#xA;  b = []&#xA;  for i in range(4) :&#xA;      q, r = divmod(q, 256)&#xA;      b.append(r)&#xA;&#xA;  # Replace bytes 4:8 in proc.stdout with the actual size of the RIFF chunk.&#xA;  riff = output[:4] + bytes(b) + output[8 :]&#xA;&#xA;  sr, audio = wav_read(io.BytesIO(riff))&#xA;&#xA;  return audio, sr&#xA;&#xA;audio, sr = get_audio()&#xA;

def recordingTranscribe(audio):&#xA;  data16 = np.frombuffer(audio)&#xA;  return model.stt(data16)&#xA;

recordingTranscribe(audio)&#xA;

Improving Google Cloud Speech-to-Text accuracy

6 juillet 2020, par lr_optim
I'm working on a project where I need to perform these steps :



1. Record a voice call (.webm -file)
2. Split the webm -file into chunks with ffmpeg and convert the file into wav
3. Transcribe the chunks using SpeechRecognition -library and Google Cloud API



I've faced problems with the transcription accuracy and wondering if there is something I could do to improve it. At the time I'm splitting the original file into 30s chunks. I thought there might be one problem, that I might be missing words because of splitting so I've tried also with longer chunks under 60s but didn't notice any improve in accuracy.
Reading trough the speechRecognition docs I decided to set r.energy_threshold = 4000, I also tried to set the energy_treshold dynamically like this :



```
with sr.AudioFile(name) as source:&#xA;    r.dynamic_energy_threshold = True&#xA;    r.adjust_for_ambient_noise(source, duration = 1)&#xA;    audio = r.record(source)&#xA;
```



I've also tested en-US and en-GB to see if there's some difference but there isn't as much as I'd want. The program is supposed to work with english language spoken by nordic people. If someone has experience about choosing a right language model for people speaking with accent, please let me know.




This is the ffmpeg command is use to split the webm file into chunks : command = ['ffmpeg', '-i', filename, '-f', 'segment', '-segment_time', '30', parts_dir + outputname + '%09d.wav']




Is there somethig I could do better ? I'm wondering if the quality is not good enough an Google is having hard time because of that ?




The main problem is I'm getting bad results (lots of wrong words) from Google and wondering if there is something I could do about it.

Send MP3 audio extracted from m3u8 stream to IBM Watson Speech To Text

20 novembre 2018, par Link69

I’m extracting audio in MP3 format from a M3U8 live url and the final goal is to send the live audio stream to IBM Watson Speech To Text. The m3u8 is obtained by calling an external script via a Process. Then I use FFMPEG script to get the audio in stdout. It works if I save the audio in a file but I don’t want to save the extracted audio, I need to send the datas directly to the STT service. So far I proceeded like this :

SpeechToTextService speechToTextService = new SpeechToTextService(sttUsername, sttPassword);

string m3u8Url = "https://something.m3u8";

char[] buffer = new char[48000];

Process ffmpeg = new ProcessHelper(@"ffmpeg\ffmpeg.exe", $"-v 0 -i {m3u8Url} -acodec mp3 -ac 2 -ar 48000 -f mp3 -");



ffmpeg.Start();

int count;

while ((count = ffmpeg.StandardOutput.Read(buffer, 0, 48000)) > 0)

{

    ffmpeg.StandardOutput.Read(buffer, 0, 48000);

    var answer = speechToTextService.RecognizeSessionless(

        audio: buffer.Select(c => (byte)c).ToArray(),

        contentType: "audio/mpeg",

        smartFormatting: true,

        speakerLabels: false,

        model: "en-US_BroadbandModel"

    );

    // Get answer.ResponseJson, deserializing, clean buffer, etc...

}

When requesting the transcribed audio I’m getting this error :

An unhandled exception of type 'System.AggregateException' occurred in IBM.WatsonDeveloperCloud.SpeechToText.v1.dll: 'One or more errors occurred. (The API query failed with status code BadRequest: Bad Request | x-global-transaction-id: bd6cd203720a70d83b9a03451fe28973 | X-DP-Watson-Tran-ID: bd6cd203720a70d83b9a03451fe28973)'

 Inner exceptions found, see $exception in variables window for more details.

 Innermost exception     IBM.WatsonDeveloperCloud.Http.Exceptions.ServiceResponseException : The API query failed with status code BadRequest: Bad Request | x-global-transaction-id: bd6cd203720a70d83b9a03451fe28973 | X-DP-Watson-Tran-ID: bd6cd203720a70d83b9a03451fe28973

   at IBM.WatsonDeveloperCloud.Http.Filters.ErrorFilter.OnResponse(IResponse response, HttpResponseMessage responseMessage)

   at IBM.WatsonDeveloperCloud.Http.Request.<getresponse>d__30.MoveNext()

   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()

   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)

   at IBM.WatsonDeveloperCloud.Http.Request.<asmessage>d__23.MoveNext()

   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()

   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)

   at IBM.WatsonDeveloperCloud.Http.Request.<as>d__24`1.MoveNext()

</as></asmessage></getresponse>

ProcessHelper is just for convenience :

class ProcessHelper : Process

{

    private string command;

    private string arguments;

    public ProcessHelper(string command, string arguments, bool redirectStandardOutput = true)

    {

        this.command = command;

        this.arguments = arguments;

        StartInfo = new ProcessStartInfo()

        {

            FileName = this.command,

            Arguments = this.arguments,

            UseShellExecute = false,

            RedirectStandardOutput = redirectStandardOutput,

            CreateNoWindow = true

        };

    }

}

Pretty sure I’m doing it wrong, I’d love someone to shine a light on this. Thanks.

1 | ... | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | ... | 79

Recherche avancée

Médias (1)

MediaSPIP Simple : futur thème graphique par défaut ?

Autres articles (1)

Submit bugs and patches

Sur d’autres sites (235)

How to transcribe the recording for speech recognization

Improving Google Cloud Speech-to-Text accuracy

Send MP3 audio extracted from m3u8 stream to IBM Watson Speech To Text

Se connecter

Navigation

Syndication

Boussole SPIP