-
using speech diarization results in speech recognition API
10 September 2021, by FIRE
I'm trying to understand more about speaker diarization and speech recognition. I started following this tutorial and was able to get tuples from the audio labelling.


According to the tutorial, you can send the audio segments to Google's speech API and they will get transcribed, and that is exactly where I'm stuck!


According to the tutorial, all you have to do is:


- Get a Google / IBM Watson speech-to-text API (done)


(I have done this step and got a Watson API key and URL!)


1. For each tuple element 'ele' in your labelling file, extract ele[0] as the speaker label, ele[1] as the start time and ele[2] as the end time.


(I didn't understand this step at all... I tried this, but I'm not quite sure if this is what they mean)



for ele in labelling:
    speaker_label = ele[0]
    start_time = ele[1]
    end_time = ele[2]




2. Trim your original audio file from the start time to the end time. You can use ffmpeg for this task.


(This step depends on step 1, but I also don't understand it, as I have no idea how to use ffmpeg or how to apply it to this project)


3. Pass the trimmed audio file obtained in the previous step to Google's / IBM Watson's API, which will return the text transcript of this audio segment.


(I just need to understand the context, i.e. what the code that passes the segmented audio would look like)


4. Write the transcript along with the speaker label to a text file and save it.
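Steps 2–4 can be sketched roughly as follows, assuming the `labelling` list of `(speaker, start, end)` tuples from step 1 and an already-authenticated Watson `speech_to_text` client. The helper names (`ffmpeg_trim_cmd`, `transcribe_segments`) and the segment/output file names are made up for illustration; this is a sketch, not a tested pipeline:

```python
import subprocess


def ffmpeg_trim_cmd(src, start, end, dst):
    # -ss/-to take seconds; -y overwrites the output file if it exists
    return ["ffmpeg", "-y", "-i", src,
            "-ss", str(start), "-to", str(end), dst]


def transcribe_segments(labelling, src, speech_to_text, out_path="transcript.txt"):
    with open(out_path, "w") as out:
        for i, (speaker, start, end) in enumerate(labelling):
            seg = f"segment_{i}.wav"
            # Step 2: trim the original audio from start to end with ffmpeg
            subprocess.run(ffmpeg_trim_cmd(src, start, end, seg), check=True)
            # Step 3: send the trimmed segment to Watson speech-to-text
            with open(seg, "rb") as audio:
                resp = speech_to_text.recognize(
                    audio=audio, content_type="audio/wav").get_result()
            text = " ".join(r["alternatives"][0]["transcript"]
                            for r in resp["results"])
            # Step 4: write the speaker label and the transcript
            out.write(f"{speaker}: {text}\n")
```

Each `(speaker, start, end)` tuple produces one trimmed WAV file and one line of output; swapping in the Google API would only change the step-3 call.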


Any help would be more than appreciated!


My full code:


from resemblyzer import preprocess_wav, VoiceEncoder
from pathlib import Path

from resemblyzer.audio import sampling_rate

from spectralcluster import SpectralClusterer

import ffmpeg

from ibm_watson import SpeechToTextV1
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator

# IBM Watson components (not used yet, as this part isn't implemented)
authenticator = IAMAuthenticator('Key here')
speech_to_text = SpeechToTextV1(
    authenticator=authenticator
)

speech_to_text.set_service_url('URL HERE')

#-------------------------------------------------------

# From the tutorial: load and preprocess the audio file

# give the file path to your audio file
audio_file_path = 'Audio files/testForTheOthers.wav'
wav_fpath = Path(audio_file_path)

wav = preprocess_wav(wav_fpath)
encoder = VoiceEncoder("cpu")
_, cont_embeds, wav_splits = encoder.embed_utterance(wav, return_partials=True, rate=16)
print(cont_embeds.shape)



#-----------------------------------------------------------------------


# From the tutorial: the clustering part
# (some parts of the code gave me errors, which is why they are not included;
# the arguments p_percentile=0.90, gaussian_blur_sigma=1 were removed)

clusterer = SpectralClusterer(
    min_clusters=2,
    max_clusters=100,
)

labels = clusterer.predict(cont_embeds)
#-----------------------------------------------------------------------



# From the tutorial: build the (speaker, start, end) labelling


def create_labelling(labels, wav_splits):
    from resemblyzer.audio import sampling_rate
    times = [((s.start + s.stop) / 2) / sampling_rate for s in wav_splits]
    labelling = []
    start_time = 0

    for i, time in enumerate(times):
        if i > 0 and labels[i] != labels[i - 1]:
            temp = [str(labels[i - 1]), start_time, time]
            labelling.append(tuple(temp))
            start_time = time
        if i == len(times) - 1:
            temp = [str(labels[i]), start_time, time]
            labelling.append(tuple(temp))

    return labelling


labelling = create_labelling(labels, wav_splits)


print(labelling)
#----------------------

# My attempt at implementing step 1

for ele in labelling:
    speaker_label = ele[0]
    start_time = ele[1]
    end_time = ele[2]


#-----------------------------------------------------------------------------

# After this part you are supposed to implement the rest of the tutorial,
# but I'm stuck





-
Permission denied/ffmpeg error with Speech Recognition library
7 June 2023, by sally carlund
I'm trying to use the Speech Recognition library to set up a voice assistant, but I keep getting a permission denied error. I suppose the file isn't saved at a location Python can access, but I'm not entirely sure how to change the directory where this library stores the file.


Code:


import speech_recognition as sr

r = sr.Recognizer()

with sr.Microphone() as source:
    print("Say something!")
    try:
        audio = r.listen(source, timeout=5)
        # only transcribe if we actually captured audio
        print("Did you say " + r.recognize_whisper(audio))
    except sr.WaitTimeoutError:
        print("You didn't say anything")



Traceback:


Traceback (most recent call last):
 File "C:\Users\s\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\whisper\audio.py", line 42, in load_audio
 ffmpeg.input(file, threads=0)
 File "C:\Users\s\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\ffmpeg\_run.py", line 325, in run
 raise Error('ffmpeg', out, err)
ffmpeg._run.Error: ffmpeg error (see stderr output for detail)

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
 File "c:\Users\s\projects\FinalProject\debugging.py", line 12, in <module>
 print("Did you say " + r.recognize_whisper(audio))
 libswresample 4. 9.100 / 4. 9.100 libpostproc 56. 7.100 / 56. 7.100
C:\Users\s\AppData\Local\Temp\tmpt5niak28.wav: Permission denied


I've used this library successfully before, so I'm not sure what the issue is. Any suggestions?
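Judging by the traceback, a likely cause is that `recognize_whisper` writes the captured audio to a temporary file that Python still holds open when whisper's ffmpeg tries to read it, which on Windows fails with exactly this "Permission denied". One possible workaround, sketched below, is to write the WAV to a path you control (and close it) before transcribing with whisper directly. The helpers `dump_wav` and `transcribe` are hypothetical names, and the sketch assumes the `openai-whisper` package is installed:

```python
import os
import tempfile


def dump_wav(audio):
    # Write the captured audio to a file that is closed before ffmpeg
    # opens it, sidestepping the Windows lock on a still-open temp file.
    path = os.path.join(tempfile.gettempdir(), "sr_capture.wav")
    with open(path, "wb") as f:
        f.write(audio.get_wav_data())
    return path


def transcribe(audio, model_name="base"):
    import whisper  # assumes the openai-whisper package is installed
    path = dump_wav(audio)
    try:
        return whisper.load_model(model_name).transcribe(path)["text"]
    finally:
        os.remove(path)
```

Here `audio` is the `AudioData` object returned by `r.listen(...)`, so you would call `transcribe(audio)` instead of `r.recognize_whisper(audio)`.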


-
Speech recognition with python-telegram-bot without downloading an audio file
25 June 2022, by linz
I'm developing a Telegram bot in which the user sends a voice message, the bot transcribes it and sends back what was said as text.
For that I am using the python-telegram-bot library and the speech_recognition library with the Google engine.
My problem is that the voice messages sent by the users are .mp3, but in order to transcribe them I need to convert them to .wav, and to do that I have to download the file sent to the bot.
Is there a way to avoid that? I understand this is neither efficient nor safe, since many active users at once will cause race conditions, and it takes a lot of disk space.



def voice_handler(update, context):
    bot = context.bot
    file = bot.getFile(update.message.voice.file_id)
    file.download('voice.mp3')
    filename = "voice.wav"

    # convert mp3 to wav file
    subprocess.call(['ffmpeg', '-i', 'voice.mp3', 'voice.wav', '-y'])

    # initialize the recognizer
    r = sr.Recognizer()

    # open the file
    with sr.AudioFile(filename) as source:
        # listen for the data (load audio to memory)
        audio_data = r.record(source)
        # recognize (convert from speech to text)
        text = r.recognize_google(audio_data, language='ar-AR')


def main() -> None:
    updater.dispatcher.add_handler(MessageHandler(Filters.voice, voice_handler))
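One way to avoid writing files to disk at all is to keep everything in memory: download the voice note into a BytesIO buffer and pipe it through ffmpeg's stdin/stdout. This is only a sketch, assuming python-telegram-bot v13 (whose `File.download` accepts an `out=` file object) and that ffmpeg is on PATH; note that Telegram voice notes are actually OGG/Opus rather than mp3, which ffmpeg detects automatically from the stream:

```python
import io
import subprocess

# ffmpeg reads the voice note from stdin (pipe:0) and writes WAV to stdout (pipe:1)
FFMPEG_PIPE_CMD = ["ffmpeg", "-i", "pipe:0", "-f", "wav", "pipe:1"]


def to_wav_bytes(voice_bytes):
    proc = subprocess.run(FFMPEG_PIPE_CMD, input=voice_bytes,
                          stdout=subprocess.PIPE, check=True)
    return proc.stdout


def voice_handler(update, context):
    import speech_recognition as sr  # imported here so the helper above stays stdlib-only
    buf = io.BytesIO()
    tg_file = context.bot.getFile(update.message.voice.file_id)
    tg_file.download(out=buf)                 # into memory, not onto disk
    wav = io.BytesIO(to_wav_bytes(buf.getvalue()))
    r = sr.Recognizer()
    with sr.AudioFile(wav) as source:         # AudioFile accepts file-like objects
        audio_data = r.record(source)
    text = r.recognize_google(audio_data, language="ar-AR")
    update.message.reply_text(text)
```

Since nothing ever touches the filesystem, concurrent users no longer race on a shared `voice.mp3`/`voice.wav` pair; each update gets its own buffers.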