Spark

AI can be easily fooled. This could have serious implications

New research demonstrates that attackers can feed AI systems maliciously crafted images or other inputs, with potentially dangerous results.

Bad actors can attack automated systems by 'tricking' AI

Even slightly defaced stop signs might confuse the AI controlling a self-driving car. (Adam Killick/CBC)

Imagine you're driving up to an intersection, and the stop sign has some stickers on it. Maybe some graffiti, too.

You'd still recognize it as a stop sign, right? And more important, you'd stop?

A driverless car, however, might interpret the defaced sign as something else. Like, say, a speed limit sign.

That, as you can imagine, would not be good.

It's a pitfall in machine learning that researchers like Dawn Song are trying to understand—and hopefully prevent.
Dawn Song (Berkeley.edu)

Song spoke to Spark host Nora Young about adversarial machine learning, where an AI is 'tricked' into seeing something that isn't really there. The trick relies on so-called "perturbations": small, often barely noticeable changes to an input that could have a dramatic impact on the reliability of AI systems.

So is the risk that an AI could be trained on malicious data, or that, after it's trained and being used in the real world, it could be fed examples that lead it to make the wrong decision?

These types of attacks on the integrity of a machine-learning system can happen at different stages. The example I gave happens at the "inference" stage, when the machine-learning system needs to make a prediction. But the attacker can also launch the attack at the training stage, by supplying what we call 'poisoned' training data. The machine-learning system is then trained, essentially, on the wrong data, and the attacker can fool it into learning the wrong model.
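To make the inference-stage attack concrete, here is a minimal sketch, not from the interview, of the fast gradient sign method, one widely studied way of crafting adversarial perturbations. The model, image and label names below are hypothetical placeholders, and the technique shown is an illustrative example rather than the specific method used in Song's research.

```python
import torch
import torch.nn.functional as F

def fgsm_perturb(model, image, label, epsilon=0.01):
    """Return a copy of `image` nudged slightly to raise the model's loss."""
    # Track gradients with respect to the input pixels themselves.
    image = image.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(image), label)
    loss.backward()
    # Step each pixel a tiny amount in the direction that increases the loss.
    perturbed = image + epsilon * image.grad.sign()
    # Keep pixel values in the valid range; the change is capped at epsilon,
    # so the altered image still looks ordinary to a human viewer.
    return perturbed.clamp(0, 1).detach()
```

Because each pixel moves by at most epsilon, the perturbed image is nearly indistinguishable from the original to a person, yet it can be enough to flip the model's prediction.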

So are these hypothetical risks? Or are these things that you've actually shown to be things that can happen?

These things really can happen, and they can happen in the real world. Our recent work on the traffic sign example shows that these adversarial examples are effective not only as perturbations to a digital image, but also as perturbations to the actual, physical, real-world object [such as a sticker on a stop sign].

This interview has been edited for length and clarity. Click the listen button above to hear the full conversation.