Recherche avancée

Médias (1)

Mot : - Tags -/theme

Autres articles (1)

  • Submit bugs and patches

    13 avril 2011

    Unfortunately a software is never perfect.
    If you think you have found a bug, report it using our ticket system. Please to help us to fix it by providing the following information : the browser you are using, including the exact version as precise an explanation as possible of the problem if possible, the steps taken resulting in the problem a link to the site / page in question
    If you think you have solved the bug, fill in a ticket and attach to it a corrective patch.
    You may also (...)

Sur d’autres sites (235)

  • What role does bit rate play in the accuracy of Google Speech To Text transcription ?

    11 novembre 2020, par Jash Shah

    I am helping a client convert a video file using ffmpeg and they originally used -b:a 64k while transcoding their video to audio at a sampling rate (-ar 44100 argument in ffmpeg) of 44100. Their objective is that they want to generate the most accurate transcriptions using the Google Cloud Speech To Text API.

    


    While combing through their documentation I did not find anything on how bit rate impacts the accuracy of the transcription. So my question is thus - would using a higher bit rate such as 128k help me in getting better transcriptions or does it not matter ?

    


  • Watson NarrowBand Speech to Text not accepting ogg file

    19 janvier 2017, par Bob Dill

    NodeJS app using ffmpeg to create ogg files from mp3 & mp4. If the source file is broadband, Watson Speech to Text accepts the file with no issues. If the source file is narrow band, Watson Speech to Text fails to read the ogg file. I’ve tested the output from ffmpeg and the narrowband ogg file has the same audio content (e.g. I can listen to it and hear the same people) as the mp3 file. Yes, in advance, I am changing the call to Watson to correctly specify the model and content_type. Code follows :

    exports.createTranscript = function(req, res, next)
    { var _name = getNameBase(req.body.movie);
     var _type = getType(req.body.movie);
     var _voice = (_type == "mp4") ? "en-US_BroadbandModel" : "en-US_NarrowbandModel" ;
     var _contentType = (_type == "mp4") ? "audio/ogg" : "audio/basic" ;
     var _audio = process.cwd()+"/HTML/movies/"+_name+'ogg';
     var transcriptFile = process.cwd()+"/HTML/movies/"+_name+'json';

     speech_to_text.createSession({model: _voice}, function(error, session) {
       if (error) {console.log('error:', error);}
       else
         {
           var params = { content_type: _contentType, continuous: true,
            audio: fs.createReadStream(_audio),
             session_id: session.session_id
             };
             speech_to_text.recognize(params, function(error, transcript) {
               if (error) {console.log('error:', error);}
               else
                 { fs.writeFile(transcriptFile, JSON.stringify(transcript), function(err) {if (err) {console.log(err);}});
                   res.send(transcript);
                 }
             });
         }
     });
    }

    _type is either mp3 (narrowband from phone recording) or mp4 (broadband)
    model: _voice has been traced to ensure correct setting
    content_type: _contentType has been traced to ensure correct setting

    Any ogg file submitted to Speech to Text with narrowband settings fails with Error: No speech detected for 30s. Tested with both real narrowband files and asking Watson to read a broadband ogg file (created from mp4) as narrowband. Same error message. What am I missing ?

  • Improving accuracy of Google Cloud Speech API

    17 août 2018, par Shaikat Haque

    I am currently recording audio from a web page on my Mac OS computer and running it through the cloud speech api to produce a transcript. However, the results aren’t that accurate and there are chunks of missing words in the results.

    Are there any steps that would help me yield more accurate results ?

    Here are the steps I am taking to convert audio to text :

    1. Use Soundflower to channel audio output from my soundcard to mic in.
    2. Play audio from website
    3. Use quickTime player to record audio which is saved as a .m4a file.
    4. Use the command line tool ffmpeg to convert the .m4a file to a
      .flac, and also combine 2 audio channels (stereo) to 1 audio channel (mono).
    5. Upload the .flac file to Google Cloud Storage. The file has a sample rate of 44100Hz and has 24 bits per sample.
    6. Use the longRunningRecognize api via the node.js client library,
      pointing to the file in Google cloud storage.