
On other sites (233)

  • using speech diarization results in speech recognition API

    10 September 2021, by FIRE

    I'm trying to understand more about speech diarization and speech recognition. I started following this tutorial and I was able to get tuples from the audio labelling.

    


    According to the tutorial, you can use the Google speech API: you send the audio segments to Google's API and they get transcribed. That is exactly where I'm stuck!

    


    According to the tutorial, all you have to do is:

    


      

    1. Get a Google / IBM Watson speech-to-text API (done)


    


    (I have done this step and got a Watson API key and URL!)

    


    1. For each tuple element ‘ele’ in your labelling file, extract ele[0] as the speaker label, ele[1] as the start time and ele[2] as the end time.

    


    (I didn't understand this step at all... I tried this, but I'm not quite sure if this is what they mean:)

    


    
for ele in labelling:
    speaker_label = ele[0]
    start_time = ele[1]
    end_time = ele[2]



    


    2. Trim your original audio file from start time to end time. You can use ffmpeg for this task.

    


    (This step depends on step 1, but I also don't understand it, as I have no idea how to use ffmpeg or how to apply it to this project.)
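
    Here is my rough guess at what the trimming could look like, using the ffmpeg-python bindings I already import in my full code below; trim_segment and the file names are just placeholders I made up, and I'm not sure this is what the tutorial intends:


import ffmpeg

def trim_segment(input_path, output_path, start_time, end_time):
    # Cut the [start_time, end_time] window out of the original audio
    # (-ss sets the start, -t the duration) and write it to a new WAV file.
    (
        ffmpeg
        .input(input_path, ss=start_time, t=end_time - start_time)
        .output(output_path, format='wav')
        .overwrite_output()
        .run(quiet=True)
    )

# For example, for one tuple from the labelling loop above:
# trim_segment('Audio files/testForTheOthers.wav', 'segment_0.wav', start_time, end_time)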

    


    3. Pass the trimmed audio file obtained in the previous step to Google's API / IBM Watson's API, which will return the text transcript of this audio segment.

    


    (I just need to understand how to pass the segmented audio to the API and what that code would look like.)
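
    This is how I imagine the Watson call for one trimmed segment could look, using the speech_to_text client I set up in my full code below; 'segment_0.wav' is just a placeholder file name, and I'm only guessing that the response is JSON with 'results' and 'alternatives' fields:


# Send one trimmed segment to Watson and collect its transcript.
with open('segment_0.wav', 'rb') as audio_file:
    response = speech_to_text.recognize(
        audio=audio_file,
        content_type='audio/wav'
    ).get_result()

# Join the best alternative of every returned chunk into one string.
transcript = ' '.join(
    result['alternatives'][0]['transcript']
    for result in response['results']
)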

    


    4. Write the transcript along with the speaker label to a text file and save it.
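
    For step 4 I am picturing something as simple as this ('diarized_transcript.txt' is just a name I made up):


# Append each speaker label and its transcript to one running text file.
with open('diarized_transcript.txt', 'a') as out:
    out.write(f"{speaker_label}: {transcript}\n")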

    


    Any help would be more than appreciated!

    


    My full code:

    


    from resemblyzer import preprocess_wav, VoiceEncoder
from pathlib import Path

from resemblyzer.audio import sampling_rate

from spectralcluster import SpectralClusterer

import ffmpeg

from ibm_watson import SpeechToTextV1
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator

# IBM Watson setup (not used yet, as the transcription part is not implemented)
authenticator = IAMAuthenticator('Key here')
speech_to_text = SpeechToTextV1(
    authenticator=authenticator
)


speech_to_text.set_service_url(
    'URL HERE')

#-------------------------------------------------------

# From the tutorial: load the audio file and preprocess it

# give the file path to your audio file
audio_file_path = 'Audio files/testForTheOthers.wav'
wav_fpath = Path(audio_file_path)

wav = preprocess_wav(wav_fpath)
encoder = VoiceEncoder("cpu")
_, cont_embeds, wav_splits = encoder.embed_utterance(wav, return_partials=True, rate=16)
print(cont_embeds.shape)



#-----------------------------------------------------------------------


# From the tutorial: the clustering part
# (the p_percentile=0.90 and gaussian_blur_sigma=1 arguments from the tutorial
#  are removed here because they raised errors for me)

clusterer = SpectralClusterer(
    min_clusters=2,
    max_clusters=100,
)

labels = clusterer.predict(cont_embeds)
#-----------------------------------------------------------------------



# From the tutorial: turn the cluster labels into (speaker, start time, end time) tuples


def create_labelling(labels, wav_splits):
    from resemblyzer.audio import sampling_rate
    times = [((s.start + s.stop) / 2) / sampling_rate for s in wav_splits]
    labelling = []
    start_time = 0

    for i, time in enumerate(times):
        if i > 0 and labels[i] != labels[i - 1]:
            temp = [str(labels[i - 1]), start_time, time]
            labelling.append(tuple(temp))
            start_time = time
        if i == len(times) - 1:
            temp = [str(labels[i]), start_time, time]
            labelling.append(tuple(temp))

    return labelling


labelling = create_labelling(labels, wav_splits)


print(labelling)
#----------------------

# My attempt at implementing step 1

for ele in labelling:
    speaker_label = ele[0]
    start_time = ele[1]
    end_time = ele[2]


#-----------------------------------------------------------------------------

# After this part you are supposed to implement the rest of the tutorial,
# but this is where I'm stuck




    


  • Permission denied/ffmpeg error with Speech Recognition library

    7 June 2023, by sally carlund

    I'm trying to use the speech recognition library to set up a voice assistant, but I keep getting a permission denied error. I suspect the problem is that the file isn't being saved in a location Python can access, but I'm not entirely sure how to change the directory where the file is stored when using this library.

    


    Code:

    


import speech_recognition as sr

r = sr.Recognizer()

with sr.Microphone() as source:
    print("Say something!")
    try:
        audio = r.listen(source, timeout=5)
    except sr.WaitTimeoutError:
        print("You didn't say anything")

    print("Did you say " + r.recognize_whisper(audio))


    


    Traceback:

    


Traceback (most recent call last):
  File "C:\Users\s\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\whisper\audio.py", line 42, in load_audio
    ffmpeg.input(file, threads=0)
  File "C:\Users\s\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\ffmpeg\_run.py", line 325, in run
    raise Error('ffmpeg', out, err)
ffmpeg._run.Error: ffmpeg error (see stderr output for detail)

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "c:\Users\s\projects\FinalProject\debugging.py", line 12, in <module>
    print("Did you say " + r.recognize_whisper(audio))
  libswresample   4.  9.100 /  4.  9.100  libpostproc    56.  7.100 / 56.  7.100
C:\Users\s\AppData\Local\Temp\tmpt5niak28.wav: Permission denied


    I've used this library successfully before, so I'm not sure what the issue is. Any suggestions?
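
    One idea I had was to save the captured audio to a file in my project folder myself and run Whisper on that path directly, instead of letting the library create a temp file, but I'm not sure whether this is the right way to do it ("capture.wav" is just a placeholder name):


import speech_recognition as sr
import whisper

r = sr.Recognizer()

with sr.Microphone() as source:
    print("Say something!")
    audio = r.listen(source, timeout=5)

# Save the captured audio somewhere I know I can write to.
with open("capture.wav", "wb") as f:
    f.write(audio.get_wav_data())

# Transcribe that file with Whisper directly.
model = whisper.load_model("base")
result = model.transcribe("capture.wav")
print("Did you say " + result["text"])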


  • Speech recognition with python-telegram-bot without downloading an audio file

    25 June 2022, by linz

    I'm developing a Telegram bot in which the user sends a voice message, the bot transcribes it and sends back what was said as text. For that I am using the python-telegram-bot library and the speech_recognition library with the Google engine. My problem is that the voice messages sent by the users are .mp3, but in order to transcribe them I need to convert them to .wav, and to do that I have to download the file sent to the bot. Is there a way to avoid that? I understand this is neither an efficient nor a safe way to do it, since many active users at once will cause race conditions, and it takes a lot of space.


def voice_handler(update, context):
    bot = context.bot
    file = bot.getFile(update.message.voice.file_id)
    file.download('voice.mp3')
    filename = "voice.wav"

    # convert mp3 to wav file
    subprocess.call(['ffmpeg', '-i', 'voice.mp3',
                     'voice.wav', '-y'])

    # initialize the recognizer
    r = sr.Recognizer()

    # open the file
    with sr.AudioFile(filename) as source:

        # listen for the data (load audio to memory)
        audio_data = r.record(source)
        # recognize (convert from speech to text)
        text = r.recognize_google(audio_data, language='ar-AR')


def main() -> None:
    updater.dispatcher.add_handler(MessageHandler(Filters.voice, voice_handler))
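
    Is something along these lines what I should be doing instead? This is my rough attempt at piping the download straight through ffmpeg without writing any files; I'm not sure the details are right (whether download_as_bytearray exists in my python-telegram-bot version, and the 16 kHz / 16-bit values are guesses):


import subprocess
import speech_recognition as sr

def voice_handler(update, context):
    bot = context.bot
    file = bot.getFile(update.message.voice.file_id)

    # Keep the downloaded bytes in memory instead of writing voice.mp3 to disk.
    voice_bytes = bytes(file.download_as_bytearray())

    # Pipe the bytes through ffmpeg and read raw 16 kHz mono 16-bit PCM back,
    # so no intermediate .wav file is created either.
    proc = subprocess.run(
        ['ffmpeg', '-i', 'pipe:0', '-f', 's16le', '-ar', '16000', '-ac', '1', 'pipe:1'],
        input=voice_bytes, capture_output=True, check=True
    )

    # Wrap the raw PCM so speech_recognition can use it (16000 Hz, 2 bytes per sample).
    audio_data = sr.AudioData(proc.stdout, 16000, 2)

    r = sr.Recognizer()
    text = r.recognize_google(audio_data, language='ar-AR')
    update.message.reply_text(text)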
