Wait a Minute! What Did You Say?
It Is Easy to Wreck A Nice Beach
It Is Easy to Recognise Speech
We have all seen digital voice assistants released on public devices: Alexa, Siri, Cortana — you get it, name a word ending with an ‘a’ or an ‘i’! This does not mean voice recognition is easy. It isn’t. Accuracy in speech recognition remains a challenge, even though machine learning has introduced breakthrough improvements.
Digital Voice Assistants to the Rescue
Google, Amazon (Alexa), and Apple (Siri) each use their own cloud API to recognise speech and then process your request.
Timeline, from most recent:
- On August 31, 2017, Amazon and Microsoft teamed up their AI voice assistants in a new partnership deal.
- On August 20, 2017, Microsoft announced that its conversational speech recognition system has reached a 5.1% error rate.
- On December 19, 2014, researchers at Baidu announced they had reached a 6.56% error rate and published Deep Speech: Scaling up end-to-end speech recognition.
- Before December 2014, state-of-the-art techniques could only achieve a 16% error rate!
You’re probably thinking:
I like accuracy, but isn’t it possible to run this operation locally on a device and achieve the same performance level?
The error rate, or WER (Word Error Rate), measures the proportion of words that are wrongly converted from audio to text. The web giants use their huge data sets to train voice recognition models on real-life data. Much of this data is free to access: consider TEDx, for example, with 1,700 video recordings, or the audio streams you can find in YouTube content.
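WER is commonly computed as the word-level edit distance (substitutions, deletions, and insertions) divided by the number of words in the reference transcript. A minimal sketch in Python, using the pun from the subtitle above as the test case:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level edit distance / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # Levenshtein distance over words, computed with dynamic programming.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # deleting i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # inserting j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitution = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(substitution, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(ref)][len(hyp)] / len(ref)

# The subtitle's pun, quantified: 2 substitutions plus 2 insertions give
# 4 edits over 6 reference words, a WER of roughly 0.67.
print(wer("it is easy to recognise speech",
          "it is easy to wreck a nice beach"))
```

A perfect transcription scores 0.0; note that WER can exceed 1.0 when the hypothesis inserts many extra words.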
Training requires terabytes of data and is computationally intensive. Warning: don’t try this at home — your room will heat up quickly! A nicer and more energy-efficient approach is to run this operation in a properly cooled data centre. You will save money by using pay-as-you-go hardware resources, billed by the hour and released when not in use.
On the other hand, inference uses a trained model with new data. Despite their memory and processor constraints, smartphones and other embedded devices can run inference locally if they can load a machine learning model and have the right software support. With Apple’s Core ML release, this capability is now accessible to every iOS developer.
Add 3 Digital Voice Functions in 2020
The digital voice assistant field is moving fast, with more integration in apps and devices as we have seen with Microsoft and Alexa.
The applications foreseen with voice processing capabilities include:
- Searching for moments. As our video libraries grow, can we search them efficiently by keyword, sentence, or even sentiment?
- On the B2B side, voice and AI can introduce significant changes in customer relationships. Salesforce is running a demo to include sentiment analysis in their Einstein solution.
- Accessing customer complaint descriptions and searching this knowledge base is today a manual and error-prone process that drives operating costs up. What would you change in your organisation if you could apply search and AI capabilities to the feedback given by your customers?
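To make the first idea above concrete: once audio is transcribed with timestamps, “searching for moments” reduces to a text search over transcript segments. A minimal sketch in Python, where `Segment` and `find_moments` are illustrative names, not a real API:

```python
# Hypothetical sketch: search a transcribed video library for moments
# where a keyword is spoken, returning their timestamps.
from dataclasses import dataclass

@dataclass
class Segment:
    start: float  # seconds into the recording
    text: str     # transcribed speech for this segment

def find_moments(segments, keyword):
    """Return the start times of all segments mentioning the keyword."""
    kw = keyword.lower()
    return [s.start for s in segments if kw in s.text.lower()]

library = [
    Segment(0.0,  "welcome to the talk"),
    Segment(12.5, "machine learning changed speech recognition"),
    Segment(40.2, "questions from the audience"),
]
print(find_moments(library, "speech"))  # [12.5]
```

A production system would index the segments (e.g. with a full-text search engine) rather than scan them linearly, but the principle is the same: speech-to-text turns audio search into text search.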
The voice of the customer is essential for capturing client expectations, preferences, and aversions. Your operations teams can benefit from advanced voice and AI features.
You can add a sixth sense to your VoIP services. Learn more: ivoip.io