In the era of voice-enabled devices like Google Assistant and Amazon Alexa, it's quite clear that in the near future voice-enabled services will be supported, to a greater or lesser degree, in every aspect of our daily lives.
Because voice offers better interactivity and easier accessibility, it'll be a game-changer for the next generation. There are already smart houses where every single thing in the house can talk to you and respond to your commands. A voice-enabled device needs neither a GUI nor visual content; the only real concern is speed, and a voice interface can respond faster than most other kinds of interaction.
There are many libraries and APIs you can use to get started with your voice bot, such as:
- Microsoft Bing Speech
- Google Web Speech API
- Google Cloud Speech
- IBM Speech to Text
- Wit.ai
We are going to use the Google Web Speech API through the SpeechRecognition library. It's easy to use because a default API key is hard-coded into the SpeechRecognition library, so you can get started without any configuration or authentication process.
The catch is that this default key has a daily limit of 50 requests, and there is no way to raise that limit. That makes it a good API for experiments only. For production or live scenarios, you'll have to purchase a paid plan for one of the APIs listed above.
Every voice-enabled device follows a three-step process:
- Speech to text: In this phase, we let our bot understand what we are saying. We provide either an audio file or a direct stream from our mic, and the bot converts this sound signal into text using the Google Web Speech API.
- Processing: After converting your voice into text, the bot processes that text and responds just as a text-based bot would. The processing can be anything from searching the web for a song to setting an alarm or reminder.
- Text to speech: Once the bot has finished processing and its output is ready, the last step is to give the user that response in voice form, which can be done with the gTTS (Google Text-to-Speech) library.
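The three steps above can be sketched as a small pipeline. This is only an illustration: `run_voice_pipeline` and the three stand-in stage functions are hypothetical names, and the stand-ins just pass bytes and strings around so the sketch runs without a microphone or network access. In the real bot, the first and last stages would call the speech recognition and text-to-speech libraries.

```python
from typing import Callable


def run_voice_pipeline(
    audio: bytes,
    speech_to_text: Callable[[bytes], str],
    process: Callable[[str], str],
    text_to_speech: Callable[[str], bytes],
) -> bytes:
    """Run one request through the three stages and return the reply audio."""
    text = speech_to_text(audio)   # step 1: audio -> text
    reply = process(text)          # step 2: text -> reply text
    return text_to_speech(reply)   # step 3: reply text -> audio


# Stand-in stages (hypothetical): they only convert between bytes and str.
def fake_speech_to_text(audio: bytes) -> str:
    return audio.decode("utf-8")


def echo_bot(text: str) -> str:
    return "Did you mean, " + text + " ?"


def fake_text_to_speech(reply: str) -> bytes:
    return reply.encode("utf-8")


reply_audio = run_voice_pipeline(
    b"set an alarm", fake_speech_to_text, echo_bot, fake_text_to_speech
)
print(reply_audio)
```

Keeping the stages as separate functions means any one of them can be swapped, for example to try a different speech-to-text service, without touching the other two.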
So, let's get started with developing your first voice-enabled bot.
Dependencies:
- SpeechRecognition library
pip install SpeechRecognition
- Pyaudio
pip install pyaudio
- Flask
pip install Flask
- Flask-SocketIO
pip install flask-socketio
- Flask-CORS
pip install flask-cors
- gTTS
pip install gTTS
script.py

```python
import io
import json

from flask import Flask, Response, jsonify, redirect, request
from flask_cors import CORS
from flask_socketio import SocketIO
from gtts import gTTS
import speech_recognition as sr

import ss

app = Flask(__name__)
socketio = SocketIO(app)
CORS(app)


# Redirect http to https on CloudFoundry
@app.before_request
def before_request():
    fwd = request.headers.get('x-forwarded-proto')
    if fwd == "http":
        url = request.url.replace('http://', 'https://', 1)
        return redirect(url, code=301)
    # No forwarding header, or already https: continue with the request.
    return None


@app.route('/')
def Welcome():
    return app.send_static_file('index.html')


@app.route('/api/conversation', methods=['POST', 'GET'])
def getConvResponse():
    convText = request.form.get('convText')
    convContext = request.form.get('context', "{}")
    jsonContext = json.loads(convContext)  # parsed but not used in this simple example
    if convText:
        response = "Did you mean, " + convText + " ?"
    else:
        response = "Hello There"
    responseDetails = {'responseText': response, 'context': response}
    return jsonify(results=responseDetails)


@app.route('/api/text-to-speech', methods=['POST'])
def getSpeechFromText():
    inputText = request.form.get('text')
    if not inputText:
        print("Empty response")
        # Speak a fallback sentence so the endpoint always returns audio.
        inputText = "I have no response to that."

    def generate():
        audioOut = gTTS(text=inputText, lang='en', slow=False)
        audioOut.save("welcome.mp3")
        with open("welcome.mp3", 'rb') as f:
            yield f.read()

    # gTTS produces MP3 data, so it is served as audio/mpeg.
    return Response(response=generate(), mimetype="audio/mpeg")


@app.route('/api/speech-to-text', methods=['POST'])
def getTextFromSpeech():
    recognizer = sr.Recognizer()
    f = request.files['audio_data']
    print(f, type(f))
    # Copy the upload into an in-memory file that sr.AudioFile can read.
    file_obj = io.BytesIO()
    file_obj.write(f.read())
    file_obj.seek(0)
    mic = sr.AudioFile(file_obj)
    response = ss.recognize_speech_from_mic(recognizer, mic)
    print('\nSuccess : {}\nError : {}\n\nText from Speech\n{}\n\n{}'
          .format(response['success'], response['error'],
                  '-' * 17, response['transcription']))
    # The transcription is None when recognition fails; return an empty string then.
    return Response(response=response['transcription'] or '', mimetype='text/plain')


port = 5000

if __name__ == "__main__":
    socketio.run(app, host='0.0.0.0', port=int(port))
```

ss.py

```python
import speech_recognition as sr


def recognize_speech_from_mic(recognizer, microphone):
    # Read the whole audio source into memory.
    with microphone as source:
        audio = recognizer.record(source)

    response = {
        "success": True,
        "error": None,
        "transcription": None
    }

    try:
        response["transcription"] = recognizer.recognize_google(audio)
    except sr.RequestError:
        # API was unreachable or unresponsive
        response["success"] = False
        response["error"] = "API unavailable/unresponsive"
    except sr.UnknownValueError:
        # speech was unintelligible
        response["error"] = "Unable to recognize speech"

    return response
```
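The speech-to-text endpoint wraps the uploaded data in `sr.AudioFile`, which reads WAV, AIFF, and FLAC audio, so the recorder on the front end needs to send one of those formats. As a rough sketch for trying the upload path without a microphone, the snippet below builds a small WAV file in memory using only the standard library. The helper name `make_silent_wav` and the 16 kHz mono settings are assumptions made for this example, not something the tutorial's code requires.

```python
import io
import wave


def make_silent_wav(seconds: float = 1.0, sample_rate: int = 16000) -> bytes:
    """Return a mono, 16-bit PCM WAV file that contains only silence."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(1)            # mono
        wav_file.setsampwidth(2)            # 2 bytes per sample = 16-bit
        wav_file.setframerate(sample_rate)
        # Each silent frame is two zero bytes.
        wav_file.writeframes(b"\x00\x00" * int(seconds * sample_rate))
    return buffer.getvalue()


wav_bytes = make_silent_wav()
print(len(wav_bytes), wav_bytes[:4])
```

The resulting bytes could be posted as the `audio_data` file field. Since the clip holds no speech, it is only useful for checking that the upload and decoding steps work.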
Run the script.py file and it'll start your server on port 5000. You'll need to call the endpoints defined above from your front end, i.e. your HTML and JavaScript.
Let's walk through the code.
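For a quick check without a browser, the same endpoints can also be called from a small Python client. The sketch below only builds the form-encoded POST request for `/api/conversation`. It assumes the server is running locally on port 5000, and `build_conversation_request` is a hypothetical helper name. The field names `convText` and `context` are the ones the server code reads.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumption: the Flask server is running on this machine on port 5000.
BASE_URL = "http://localhost:5000"


def build_conversation_request(text: str, context: str = "{}") -> Request:
    """Build a form-encoded POST request for the conversation endpoint."""
    form = urlencode({"convText": text, "context": context}).encode("utf-8")
    return Request(BASE_URL + "/api/conversation", data=form, method="POST")


req = build_conversation_request("play a song")
print(req.get_method(), req.full_url)
print(req.data)

# To actually send it (requires the server to be running):
# from urllib.request import urlopen
# print(urlopen(req).read().decode("utf-8"))
```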
- getConvResponse: This function handles the conversation step. It reads the recognized text and the conversation context from the request and returns the bot's reply as JSON for your HTML front end to display. In this simple example it just echoes the text back.
- getSpeechFromText: This function converts your processed text output into voice output.
- getTextFromSpeech: This is the most important function. It receives the voice input from the web recorder and converts it to text using the SpeechRecognition library. The front end then passes this text to getConvResponse to save the context and process it.
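In the example server, getConvResponse parses the incoming context but does not update it. As one possible way to carry context from turn to turn, the sketch below keeps a small JSON object that the front end would send back with every request. The function name `next_turn` and the `turns` and `last_text` fields are hypothetical and not part of the tutorial's code.

```python
import json


def next_turn(conv_text: str, context_json: str = "{}") -> dict:
    """Return the reply and an updated context for one conversation turn."""
    context = json.loads(context_json)
    # Hypothetical field: count how many turns this conversation has had.
    context["turns"] = context.get("turns", 0) + 1
    if conv_text:
        reply = "Did you mean, " + conv_text + " ?"
        # Hypothetical field: remember the last thing the user said.
        context["last_text"] = conv_text
    else:
        reply = "Hello There"
    return {"responseText": reply, "context": json.dumps(context)}


first = next_turn("")
second = next_turn("set a reminder", first["context"])
print(second["responseText"])
print(second["context"])
```

Because the context travels with each request, the server itself stays stateless, which suits a simple Flask app like this one.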
In this tutorial, we developed a simple voice bot to show you how voice recognition works. You can use it in your live project by adding more functionality. Feel free to contact us with any queries or to learn more about the other services we provide in Voice Assistant App development.
The post Speech Recognition & response in web or mobile directly without Alexa/Google home dependency appeared first on Lets Nurture - An IT Company Nurturing Ideas into Reality.