It Is Easy to Recognise Speech

Wait a Minute! What Did You Say?

It Is Easy to Wreck a Nice Beach

Or

It Is Easy to Recognise Speech

We have all seen digital voice assistants released on consumer devices: Alexa, Siri, Cortana (you get the idea: name a word ending in an 'a' or an 'i'!). This does not mean voice recognition is easy. It isn't. Accuracy in speech recognition remains a challenge, even though machine learning has brought breakthrough improvements.

Digital Voice Assistants to the Rescue

Google, Amazon (Alexa), and Apple (Siri) each use their own cloud API to recognise speech and then process your request.

[Timeline of digital voice assistant releases, from most recent]

You’re probably thinking:

I like accuracy, but isn’t it possible to run this operation locally on a device and achieve the same performance level?

The error rate, or WER (Word Error Rate), is the proportion of words that are wrongly converted from audio to text: substitutions, deletions, and insertions counted against the reference transcript. The web giants use their huge data sets to train voice recognition models on real-life data. This data is often free to access: consider TEDx, for example, with 1,700 video recordings, or the audio streams you can find in YouTube content.
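As an illustration, here is a minimal Python sketch of how WER can be computed with a word-level edit distance (Levenshtein); the two sample sentences are taken from the pun above.

    def wer(reference: str, hypothesis: str) -> float:
        """Word Error Rate: (substitutions + deletions + insertions) / reference length."""
        ref, hyp = reference.split(), hypothesis.split()
        # Dynamic-programming edit distance over words.
        d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
        for i in range(len(ref) + 1):
            d[i][0] = i
        for j in range(len(hyp) + 1):
            d[0][j] = j
        for i in range(1, len(ref) + 1):
            for j in range(1, len(hyp) + 1):
                cost = 0 if ref[i - 1] == hyp[j - 1] else 1
                d[i][j] = min(d[i - 1][j] + 1,         # deletion
                              d[i][j - 1] + 1,         # insertion
                              d[i - 1][j - 1] + cost)  # substitution
        return d[len(ref)][len(hyp)] / len(ref)

    print(wer("it is easy to recognise speech",
              "it is easy to wreck a nice beach"))  # ~0.67: 2 substitutions + 2 insertions over 6 words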

Don’t Try This At Home

Training requires terabytes of data and is computationally intensive. Warning: don't try this at home, or your room will heat up quickly! A nicer, more energy-efficient approach is to run this operation in a properly cooled data center. You will save money by using pay-as-you-go hardware resources, billed by the hour and released when not in use.

Inference, on the other hand, runs a trained model on new data. Within their memory and processor constraints, smartphones and other embedded devices can run inference locally, provided they can load a machine learning model and have the right software support. With Apple's Core ML release, this capability is now accessible to every iOS developer.
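As an illustration, here is a minimal Python sketch of preparing a model for on-device inference with Apple's coremltools package. It assumes coremltools 4+ and PyTorch; the tiny model, input shape, and file name are hypothetical placeholders.

    import torch
    import coremltools as ct

    # Hypothetical, already-trained PyTorch model for a small audio task.
    model = torch.nn.Sequential(torch.nn.Linear(16000, 128),
                                torch.nn.ReLU(),
                                torch.nn.Linear(128, 10))
    model.eval()

    # Trace the model so the converter can inspect its graph.
    example_input = torch.rand(1, 16000)  # e.g. one second of 16 kHz audio
    traced = torch.jit.trace(model, example_input)

    # Convert to the Core ML format; the resulting .mlmodel file can be added
    # to an Xcode project and run on-device through the Core ML runtime.
    mlmodel = ct.convert(traced,
                         inputs=[ct.TensorType(name="audio", shape=example_input.shape)])
    mlmodel.save("SpeechClassifier.mlmodel")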

Learn more: read our previous article about training and inference.

Add 3 Digital Voice Functions in 2020

The digital voice assistant field is moving fast, with more integration into apps and devices, as we have seen with Microsoft and Alexa.

The applications we foresee for voice processing capabilities include:

  • Searching for moments. As our video libraries grow, can we search them efficiently by keyword, sentence, or sentiment? (A minimal sketch follows this list.)
  • On the B2B side, voice and AI can introduce significant changes in customer relationships. Salesforce is running a demo that adds sentiment analysis to its Einstein solution.
  • Getting access to customer complaint descriptions and searching that knowledge base efficiently is today a manual, error-prone process that drives operating costs up. What would you change in your organisation if you had search and AI capabilities on the feedback given by your customers?
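Here is a minimal Python sketch of keyword-based moment search, assuming you already have word-level transcripts with timestamps; the data format and sample values are hypothetical.

    # Hypothetical word-level transcript: (word, start_time_in_seconds) pairs,
    # as returned by a speech-to-text service that provides word timings.
    transcript = [("welcome", 0.0), ("to", 0.4), ("the", 0.5),
                  ("quarterly", 0.7), ("review", 1.2), ("meeting", 1.6)]

    def find_moments(transcript, keyword):
        """Return the timestamps at which the keyword is spoken."""
        keyword = keyword.lower()
        return [start for word, start in transcript if word.lower() == keyword]

    print(find_moments(transcript, "review"))  # [1.2]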


The voice of the customer is key to capturing client expectations, preferences, and aversions. Your operations teams can benefit from advanced voice and AI features.

You can add a Sixth Sense to your VoIP services. Learn more: ivoip.io


About Sébastien Leger

Sébastien is a French Tech entrepreneur. After working for 15 years in the telecommunications and computer science industries, he decided to create his own venture in 2015. His vision is that machine learning and big data can create new opportunities for the telecommunications and IoT industries. Sébastien leads teams in France and in Tokyo, Japan, to bring accessible innovation and best-in-class technology to telecommunications service providers and IoT companies located anywhere in the world.