“Siri, text mom.”
“Alexa, play ‘Flowers’ by Miley Cyrus.”
Voice commands are convenient — unless you’re at a deafening concert, in a quiet library or unable to use your voice. New eyeglass frames that read the wearer’s lips now offer a solution.
Lip-reading involves tracking facial movements to determine what someone is saying. Many lip-reading devices point a camera at the user’s face. Others rely on sensors stuck in or around the speaker’s mouth. Neither approach is suitable for daily use, says Ruidong Zhang. He studies information science at Cornell University in Ithaca, N.Y.
His team built the new lip-reading tech on a pair of eyeglasses. It uses acoustics — sound — to recognize silent speech. Zhang presented this work April 19 at the ACM Conference on Human Factors in Computing Systems in Hamburg, Germany.
Today, voice commands aren’t private, says Pattie Maes. She’s an expert in human-computer interactions and artificial intelligence (AI). She works at the Massachusetts Institute of Technology in Cambridge. Developing “silent, hands-free and eyes-free approaches” could make digital interactions more accessible while keeping them confidential, she says.
Maes wasn’t involved in the new work, but she has developed other types of silent speech interfaces. She’s eager to see how this one compares in areas such as usability, privacy and accuracy. “I am excited to see this novel acoustic approach,” she says.
Hearing silent speech
“Imagine the sonar system that whales or submarines use,” says Zhang. They send a sound into their environment and listen for echoes. From those echoes, they locate objects in their surroundings.
“Our approach is similar, but not exactly the same,” Zhang explains. “We’re not just interested in locating something. Instead, we’re trying to track subtle moving patterns.”
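The basic sonar idea Zhang describes can be captured in a few lines of arithmetic: a sound pulse travels out, bounces off an object and returns, and halving the round-trip time gives the one-way distance. Here is a minimal sketch of that calculation (the numbers are illustrative, not from the EchoSpeech system itself):

```python
SPEED_OF_SOUND = 343.0  # meters per second, in air at about 20 degrees Celsius

def distance_from_echo(echo_delay_seconds: float) -> float:
    """Estimate the distance to an object from an echo's round-trip time.

    The sound covers the distance twice (out and back), so we halve
    the total travel distance.
    """
    return SPEED_OF_SOUND * echo_delay_seconds / 2

# An echo that returns after 0.1 seconds puts the object about 17 meters away.
print(distance_from_echo(0.1))  # 17.15
```

EchoSpeech goes a step further than this: rather than locating one object, it watches how whole echo patterns shift as the lips and mouth move.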
Zhang calls the new tech EchoSpeech. It consists of two small speakers under one lens of a pair of glasses, two small microphones under the other lens, and a circuit board attached to one of the side arms.
When EchoSpeech is switched on, its speakers play high-pitched sounds. People can’t hear these. But the sound waves still reverberate in every direction. Some travel around the user’s lips and mouth. While speaking, the user’s facial movements change the paths of those sound waves. That, in turn, changes the echo patterns picked up by the microphones.
These patterns are sent to the wearer’s smartphone over Bluetooth. Using AI, an EchoSpeech app then unravels the echo patterns. It matches each pattern to a command, which the smartphone then carries out.
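The matching step can be pictured with a toy example. The real EchoSpeech app uses a trained AI model, whose details the article does not give; this sketch stands in a simple nearest-neighbor lookup over made-up echo “fingerprints” just to show the idea that each command has a characteristic pattern, and a new echo is matched to whichever stored pattern it most resembles:

```python
# Hypothetical echo fingerprints (feature vectors) for two commands.
# In a real system these would come from training data, not be hand-written.
KNOWN_PATTERNS = {
    "play":  [0.9, 0.1, 0.3],
    "pause": [0.2, 0.8, 0.5],
}

def match_command(echo_pattern: list[float]) -> str:
    """Return the command whose stored fingerprint is closest to the new echo."""
    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return min(
        KNOWN_PATTERNS,
        key=lambda cmd: squared_distance(echo_pattern, KNOWN_PATTERNS[cmd]),
    )

# A new echo that looks most like the stored "play" pattern:
print(match_command([0.85, 0.15, 0.25]))  # play
```

A deep-learning model replaces this hand-built lookup with patterns learned from many recorded examples, but the core job is the same: map an echo pattern to the most likely command.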
To test this tech, 24 people took turns wearing the glasses. They gave silent commands while sitting or walking. EchoSpeech performed well in both cases, even with loud background noises. Overall, it was about 95 percent accurate.
The prototype cost less than $100 to build, and Zhang says frames could likely be engineered to hide the electronics in future versions. Need prescription lenses? No problem. Just pop them into the EchoSpeech frames.
Enhancing personal communication
EchoSpeech currently recognizes 31 voice commands, from “play” to “hey, Siri.” It also recognizes numbers that are three to six digits long. But those aren’t limits, Zhang says. He thinks future versions could recognize a much larger vocabulary. “If people can learn to read lips efficiently, then so can AI,” he says.
If so, users could write personal text messages via silent speech. In a noisy restaurant, they could use that approach to send messages to friends who are hard of hearing or far away, instead of trying to yell over the noise or type their words. And those who have lost their voices could participate in conversations face-to-face. Their facial movements could be interpreted in real time and their words texted to their friends’ smartphones.
EchoSpeech was designed to interpret silent speech, but it might also help recreate voices. People who’ve had their vocal cords removed have been contacting Zhang’s team. They want to know if this interface could read their lips and then speak out loud for them.
He’s now exploring whether EchoSpeech could do this in a person’s own voice. Echo patterns for the same word are slightly different among speakers. The differences could reflect the specific vocal qualities of the speaker, if they can be untangled.
People without voices often use text-to-voice programs that sound robotic. The message “doesn’t have your emotion, doesn’t have your tone, doesn’t have your speech style,” Zhang notes. Right now, he says, “We’re trying to maintain that information to get an actual living voice.”
This is one in a series presenting news on technology and innovation, made possible with generous support from the Lemelson Foundation.