Talking through a tube can trick AI into mistaking one voice for another
Current voice-identification systems don't protect against this type of hack
Grab a paper towel tube and talk through it. Sounds weird, huh? Your wacky tube voice probably wouldn't fool your family or friends into thinking you were someone else. But it could trick a computer, new research finds.
In an experiment, people used specially designed tubes to make their voices sound like someone else's. Those fake voices could fool an artificial intelligence, or AI, model built to identify voices. Since voice-recognition AI is used to guard many bank accounts and smartphones, criminals could craft tubes to hack those accounts, says Kassem Fawaz. He's an engineer at the University of Wisconsin–Madison.
He was part of a team that presented its findings in August at the USENIX Security Symposium in Anaheim, Calif.
"This was really creative," David Wagner says of the new study. He was not involved in the research but is a security expert at the University of California, Berkeley. "This is another step," he says, "in the cat-and-mouse game of [cybersecurity] defenders trying to recognize people and attackers trying to fool the system."
Mystique
Shimaa Ahmed wondered if a simple device could alter someone's voice. First, she tried putting her hands over her mouth. Then she grabbed a paper-towel tube. And surprise: "It actually worked to fool the [artificial-intelligence] model." Over the next 18 months, her team developed Mystique, the plastic version seen here.
Tricky tubes
Bad guys have already found ways to hack voice-ID systems. Typically they do this to steal from bank accounts. Most often, they use what's known as deepfake software. It uses AI to create new speech that mimics the owner of the targeted bank account.
In response, many voice-ID systems have added protections. They check whether there's any digital trickery behind a voice. But Fawaz's team realized that these systems weren't checking for non-digital tricks, such as tubes.
Tubes can alter someone's voice by tampering with sound waves. The sound of a voice contains waves of many different frequencies. Each frequency is a different pitch. As sound waves travel through a tube, the tube vibrates. The way it vibrates makes some pitches louder and others softer. And how it alters pitches depends on the length and width of the tube.
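The physics above can be sketched with the standard resonance formula for a tube open at both ends, where boosted pitches fall at multiples of the speed of sound divided by twice the tube's length. This is only an illustration of why tube length matters; the function and the example length are made up here, and this is not the research team's actual equation.

```python
# Resonant frequencies of a tube open at both ends: f_n = n * v / (2 * L).
# A simple physics illustration, not the study's voice-matching equation.

SPEED_OF_SOUND = 343.0  # meters per second, in air at about 20 degrees C

def resonant_frequencies(length_m, n_modes=3):
    """Return the first few frequencies (in hertz) a tube of this length boosts."""
    return [n * SPEED_OF_SOUND / (2 * length_m) for n in range(1, n_modes + 1)]

# A paper-towel tube is roughly 0.28 meters long:
print(resonant_frequencies(0.28))  # first resonance near 612.5 Hz
```

A shorter tube pushes these resonances to higher pitches, which is why changing the tube's dimensions changes which parts of a voice get emphasized.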
So Fawaz's team came up with a math equation. It told them what tube dimensions would alter one person's voice to sound like another's, at least according to an AI model. The two starting voices couldn't be too wildly different, though. For example, a person with a male-sounding voice usually couldn't use a tube to impersonate someone with a female-sounding voice, and vice versa.
The team didn't try to hack anyone's account. Instead, they tested their tube trickery on AI models trained to recognize celebrity voices. Fourteen volunteers tried this out using a set of three 3D-printed tubes.
"Each participant who tried our system could impersonate some of the celebrities in the dataset," says Shimaa Ahmed. She's a PhD student at UW–Madison who works with Fawaz.
One person's tube voice mimicked singer Katy Perry. Another volunteer got to impersonate Bollywood star Akshay Kumar. The tube voices fooled the AI models "60 percent of the time on average," Ahmed says.
Listen: Same or different?
Which of these pairs of voices is the same speaker? Which pair is two different people? A tube changes someone's voice, but the changes don't easily trick people.
A different way of learning
People, however, weren't so easily duped.
The researchers presented a separate group of volunteers with tube-altered voices paired with the celebrity voices they were meant to mimic. The volunteers thought the two voices were the same person a mere 16 percent of the time.
The reason AI was easier to fool is that it doesn't learn to recognize voices the same way we do. To learn to recognize a voice, AI models must study training data, such as a set of celebrity voice recordings. And they can only learn what's in their training data. People don't often talk through tubes. So tube-altered speech is missing from that data.
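A minimal sketch of the idea: many voice-ID models turn a recording into a list of numbers (an "embedding") and pick whichever known speaker's embedding is closest. The speaker names and number lists below are made-up toy values, not real voice features or any model's actual output, but they show why a voice only needs to land *near* a target in the model's number space to fool it.

```python
import math

def cosine_similarity(a, b):
    """Score how closely two embeddings point in the same direction (1.0 = identical)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Toy "voiceprints" for two hypothetical celebrities the model was trained on.
known_speakers = {
    "celebrity_a": [0.9, 0.1, 0.3],
    "celebrity_b": [0.2, 0.8, 0.5],
}

def identify(embedding):
    """Return the known speaker whose voiceprint best matches this one."""
    return max(known_speakers,
               key=lambda name: cosine_similarity(embedding, known_speakers[name]))

# A tube-altered voice only needs an embedding close to a celebrity's to
# fool the model, even if it sounds nothing like them to a human listener.
print(identify([0.85, 0.15, 0.25]))  # closest to celebrity_a
```

Because the model never saw tube-altered speech during training, nothing stops a strange-sounding voice from landing near a celebrity's voiceprint in this number space.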
Now that this clever tube hack has been revealed, other engineers can get started testing the hack on the voice-ID systems that many people use with their banks and personal devices. Their goal is to learn ways to prevent such tube attacks, explains Wagner.
Meanwhile, Fawaz and his team are designing more elaborate devices involving multiple tubes or twisted shapes. These could make it possible to transform voices in more extreme ways. Could it be possible for anyone to mimic anyone else using one of these? Stay tubed, er, tuned.