The latest development from the American IT giant Google looks like a classic dual-use technology. On the one hand, it is a godsend for a spy, who could identify and eavesdrop on a speaker from a distance even if the target is hiding in a crowd. On the other hand, this breakthrough in voice-data analysis will help many hearing-impaired people and make Google's own services more effective. So how does it work?
Recognizing a human voice, even amid interference, is not difficult; the hard problem is identifying whom it belongs to. Google's developers paired the microphone with a video camera and an algorithm that tracks human facial movements. The system watches the speaker's face, effectively "reading lips," while simultaneously analyzing the sound. If the two streams match, the AI isolates that person and can then follow their speech alone against the general cacophony.
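The core matching step can be illustrated with a toy sketch. This is not Google's actual model, just a minimal numpy illustration of the idea: compute a loudness envelope from the audio, then pick the face whose mouth-motion signal correlates best with it. The function names (`audio_envelope`, `match_speaker`) and the synthetic signals are assumptions for the demo.

```python
import numpy as np

def audio_envelope(samples, frame_len):
    """Crude loudness envelope: RMS energy of each audio frame."""
    n_frames = len(samples) // frame_len
    frames = samples[:n_frames * frame_len].reshape(n_frames, frame_len)
    return np.sqrt((frames ** 2).mean(axis=1))

def match_speaker(envelope, mouth_tracks):
    """Return the index of the face whose mouth motion best
    correlates with the audio loudness envelope."""
    scores = []
    a = (envelope - envelope.mean()) / (envelope.std() + 1e-8)
    for mouth in mouth_tracks:
        m = (mouth - mouth.mean()) / (mouth.std() + 1e-8)
        scores.append(float((a * m).mean()))
    return int(np.argmax(scores))

# Toy demo: two faces on camera, but only face 1 moves its mouth
# in sync with the recorded sound.
rng = np.random.default_rng(0)
t = np.linspace(0, 1, 100)
talking = (np.sin(2 * np.pi * 4 * t) > 0).astype(float)  # on/off speech bursts
audio = np.repeat(talking, 160) * rng.standard_normal(100 * 160)
env = audio_envelope(audio, 160)
mouths = [rng.random(100),                    # face 0: unrelated mouth motion
          talking + 0.1 * rng.random(100)]    # face 1: motion synced to audio
best = match_speaker(env, mouths)             # face 1 should win
```

A real system would of course use a learned audio-visual embedding rather than raw correlation, but the matching logic is the same in spirit: the speaker is the face whose movements agree with the sound.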
The neural network was first taught lip reading itself, then trained to distinguish people who are talking from those who are merely laughing, and to recognize speech-related facial movements even when the face is partly hidden by a beard or a microphone. A sorting mechanism was then added: once a speaker has been identified, their data is fed into a separate acoustic profile. Thanks to this, the AI can tell apart the words of different people even if they deliberately try to confuse it by speaking or singing in unison.
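The "separate acoustic profile" step can be pictured as spectrogram masking, a standard source-separation technique (the masks here are hand-made toys; in a real system a network would predict them per speaker). This is a minimal numpy sketch, not the production pipeline:

```python
import numpy as np

def separate_speakers(mixture, soft_masks):
    """Split a mixed magnitude spectrogram into one spectrogram per
    speaker by element-wise masking of time-frequency bins."""
    soft_masks = np.asarray(soft_masks, dtype=float)
    # Normalize so the masks sum to 1 at every time-frequency bin,
    # which conserves the mixture's energy across the outputs.
    soft_masks = soft_masks / soft_masks.sum(axis=0, keepdims=True)
    return soft_masks * mixture

# Toy mixture: speaker 0 dominates the low frequency bins,
# speaker 1 dominates the high ones.
freqs, frames = 6, 4
mixture = np.full((freqs, frames), 3.0)
mask0 = np.vstack([np.full((3, frames), 2.0), np.full((3, frames), 1.0)])
mask1 = np.vstack([np.full((3, frames), 1.0), np.full((3, frames), 2.0)])
est0, est1 = separate_speakers(mixture, [mask0, mask1])
```

Each output spectrogram can then be inverted back to audio, which is how the system "follows" one voice while treating the rest as noise.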
Understanding what a specific person is saying is useful for more than espionage. For example, a hearing aid could relay only the words of the wearer's conversation partner with high accuracy, filtering out other voices as noise. The technology could also expand the functionality of video chats such as Hangouts and Duo. And it opens new possibilities for voice-control systems: it would no longer be possible to crack voice protection with nothing but a faked audio recording.