I'm setting up a Twilio Voice call with real-time media streaming to a WebSocket server for speech-to-text processing using Google Cloud Speech-to-Text. The connection is established successfully, and I receive a continuous stream of audio data from Twilio. However, when I play back the received audio, all I hear is a rapid clicking/jackhammering noise instead of the actual speech spoken during the call.
Setup:
Twilio sends inbound audio to my WebSocket server.
The WebSocket server receives and saves the raw mulaw-encoded audio data from Twilio.
The audio is processed via (...)
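One thing worth ruling out with this kind of clicking noise is the decoding step itself. Twilio Media Streams delivers audio as JSON messages whose `media.payload` field is base64-encoded 8-bit mu-law at 8000 Hz, mono. If the base64 text is saved without decoding, or the mu-law bytes are played back as 16-bit PCM, the result is noise rather than speech. The following is a minimal sketch of a decode path, not the code from my server: the function names (`ulaw_to_linear`, `media_message_to_pcm16`, `write_wav`) are hypothetical, and it assumes each WebSocket message arrives as a JSON string.

```python
import base64
import json
import struct
import wave


def ulaw_to_linear(u: int) -> int:
    """Expand one G.711 mu-law byte to a signed 16-bit PCM sample."""
    u = ~u & 0xFF                      # mu-law bytes are stored bit-inverted
    sign = u & 0x80
    exponent = (u >> 4) & 0x07
    mantissa = u & 0x0F
    sample = (((mantissa << 3) + 0x84) << exponent) - 0x84
    return -sample if sign else sample


def media_message_to_pcm16(message: str) -> bytes:
    """Return little-endian PCM16 for a 'media' message, or b'' otherwise."""
    event = json.loads(message)
    if event.get("event") != "media":
        return b""                     # skip 'connected', 'start', 'stop', etc.
    ulaw = base64.b64decode(event["media"]["payload"])
    samples = [ulaw_to_linear(b) for b in ulaw]
    return struct.pack(f"<{len(samples)}h", *samples)


def write_wav(path: str, pcm16: bytes, sample_rate: int = 8000) -> None:
    """Write mono 16-bit PCM to a WAV file at the given sample rate."""
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(sample_rate)
        wav.writeframes(pcm16)
```

Concatenating the output of `media_message_to_pcm16` for every received message and passing it to `write_wav` should give a file that plays back as intelligible speech if the stream itself is intact. If it still sounds wrong, the problem is more likely in how the messages are received or stored than in the encoding.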