A new shield could guard AI agents against cyberattacks

A teen programmer built the system to help detect and ward off ‘prompt injection attacks’

Prompt injection attacks disguise harmful instructions inside seemingly innocent inputs, such as bits of text. AI agents exposed to these can be manipulated into sharing sensitive data, spreading misinformation and more. A teen’s new software would shield AI agents from such cyberattacks.

sorbetto/Getty Images

By Maria Temming

1 hour ago

Kevin Lu, 17, is working on ways to protect AI from sneak attacks meant to steal sensitive data or do other harm.

Today, people are using AI agents to perform a growing mix of tasks — from drafting emails to handling files or searching the web. But these agents can be vulnerable to something known as prompt injection attacks. That’s when a hacker hides instructions inside a seemingly innocent input, such as a piece of text. When an AI model encounters that input, it can be coaxed to spill private data, spread fake news and more.

Kevin Lu, 17, devised a way to help protect AI assistants from stealthy “prompt injection attacks.” Society for Science

There’s no foolproof way to ward off prompt injection attacks. But Kevin forged a new shield. His software can help guard AI agents against these types of hacks.

His program traps suspicious prompts before they can reach an AI model. And it monitors the AI for evidence that it is being manipulated by a prompt injection attack.

In tests, no simulated cyberattacks got through Kevin’s shield. He hopes this system could help make AI agents more secure. He’s especially concerned about those that people entrust with online bank accounts and other private data.

Kevin is currently a senior at Bellarmine College Preparatory School in San Jose, Calif. His research earned him a finalist spot at the 2026 Regeneron Science Talent Search. (That competition is run by Society for Science, which also publishes Science News Explores.) In this interview, Kevin shares his research experiences.

What was your reaction to seeing how well your system performed?

“I worked on this for over a year,” Kevin says. “I began with a completely different solution.” Gradually, he revised and expanded his AI protection program. “I wouldn’t say I had a really big ‘aha’ moment” in seeing how well the system performed, Kevin says. “But it was just really rewarding to work on it continually.”

What was the biggest challenge?

“Since I worked on it by myself, it was kind of hard to know if I was going in the right direction,” Kevin says. “I had a lot of inspiration from this one weblog.” The blogger, Simon Willison, had written about how prompt injection attacks work and how they might be stopped. Google DeepMind researcher Neel Nanda was another big inspiration, Kevin says. Watching Nanda’s livestreams helped Kevin learn how to code some parts of his project.

What was your favorite part?

“I had a lot of fun coding the project,” Kevin says. “I also really liked making the poster, because I was able to draw a bunch of these flow charts … that I can point to and showcase [the work] in a less technical way.” That made it easier to talk about his research with family and friends. “I really felt like that elevated my ability to communicate my work.”