Voice assistants like Siri or Alexa have already learned to recognize our speech fairly reliably. But the technology keeps advancing: the latest research, presented at the International Conference on Acoustics, Speech and Signal Processing (ICASSP) in Shanghai, suggests that lip-reading machines may appear in the not-too-distant future.
Immediately after the study was announced, the expert community split into two camps. Some saw it as the frightening prospect of yet another invasion of privacy. Their opponents saw no malicious intent in the new technology and suggested putting it to use, for example, to improve film dubbing.
In fact, teaching a machine to read human speech from the lips is a very difficult task. During conversation, a person produces only about 14 distinct visible mouth shapes, which together correspond to roughly 50 different sounds. This means that, for example, the sounds "p" and "b" look identical on the lips even though they are pronounced differently.
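The core difficulty can be illustrated with a toy sketch: if we map each sound to the mouth shape it produces and then invert that map, several sounds collapse into the same visual group. The sound labels and groupings below are invented for illustration, not taken from the study.

```python
from collections import defaultdict

# Hypothetical phoneme -> mouth-shape mapping (labels invented for
# illustration). Several sounds share one visible shape, so lip-reading
# alone cannot tell them apart.
VISEME_OF = {
    "p": "lips-closed", "b": "lips-closed", "m": "lips-closed",
    "f": "lip-teeth",   "v": "lip-teeth",
    "t": "tongue-tip",  "d": "tongue-tip", "n": "tongue-tip",
}

def phonemes_per_viseme(mapping):
    """Invert the phoneme->shape map to show which sounds look identical."""
    groups = defaultdict(list)
    for phoneme, viseme in mapping.items():
        groups[viseme].append(phoneme)
    return dict(groups)

# "p" and "b" land in the same "lips-closed" group: visually identical.
print(phonemes_per_viseme(VISEME_OF))
```

With 14 shapes covering some 50 sounds, every group holds several sounds on average, which is exactly why extra information is needed to pick the right one.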
A group of researchers at the University of East Anglia led by Helen Bear has developed a new algorithm to help machines distinguish such sounds. They used video and audio recordings of 12 people uttering 200 sentences, and taught the computer to pick out the several sounds that correspond to each particular mouth shape.
Through further training, the program then learned to distinguish similar words that begin with different sounds, and to pick the intended word from context. The algorithm's accuracy is still far from ideal: it recognizes only 25% of speech without errors. Even so, this is considerably better than existing systems.
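Picking a word from context can be sketched very simply: when two candidate words look identical on the lips ("pat" vs. "bat"), choose the one more likely to follow the preceding word. This is a minimal illustration of the general idea, not the authors' actual method, and the counts below are invented.

```python
# Invented word-pair counts standing in for a real language model.
BIGRAM_COUNTS = {
    ("baseball", "bat"): 30, ("baseball", "pat"): 1,
    ("gentle", "pat"): 25,   ("gentle", "bat"): 2,
}

def pick_word(prev_word, candidates):
    """Among visually identical candidates, pick the one seen most often
    after the preceding word; unseen pairs count as zero."""
    return max(candidates, key=lambda w: BIGRAM_COUNTS.get((prev_word, w), 0))

print(pick_word("baseball", ["pat", "bat"]))  # bat
print(pick_word("gentle", ["pat", "bat"]))    # pat
```

The same lip movements resolve to different words purely because of what came before them, which is how context lets a lip-reader get past the 14-shapes-for-50-sounds bottleneck.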