Optical illusions for computers
This segment originally aired in November 2017.
If you have OK vision, you can probably recognize a cat when it crosses the sidewalk in front of you. If it's dark or the cat is a little farther away, you might think it's a raccoon or maybe a small dog. Our brains do a pretty good job of figuring out what's going on around us. We might mix up a cat and a raccoon now and then, but generally we're not too far off.
Now, some of the artificial intelligence programs we are creating, particularly the ones based on neural networks, do a pretty good job of learning in a similar way to us. But, in the end, they just think differently than we do.
Anish Athalye is a PhD student at MIT and a member of a student-run AI research group called Lab Six.
These neural networks have a certain vulnerability. "This AI technique that nowadays is being used for all sorts of stuff ... from image recognition to text processing to self-driving cars, it turns out that neural networks are susceptible to what are called 'adversarial examples,'" he said.
Adversarial examples, or adversarial images, are a kind of attack or hack on an AI system. It's a way of subtly changing an image so that a person can't tell the difference, but the computer now thinks it's something else entirely.
"For example ... in the context of image classifiers — things that take in a single image and tell you, oh this is a cat or this is a dog, [or] this is a truck — [it] turns out you can take these images and, kind of, tweak them very slightly in a carefully controlled way such that the neural network is confused and predicts the wrong thing," said Athalye.
"Like, for example, you can take a picture of a cat, change a couple pixels very slightly and all of a sudden the neural network thinks, with 99.99 per cent probability, it's looking at guacamole."
Athalye's example isn't hypothetical. It's something his lab has actually done.
When we see a furry, four-legged object moving across the street in front of us, we might figure it's probably a cat, but there's a chance it's a dog, and a smaller chance it's something else. These AI systems are similar. But while we simply make an assessment and go with it, a neural network actually assigns a probability to each of the possibilities.
While there might be an 80 per cent chance it's a cat, there might also be a 0.1 per cent chance it's a bowl of guacamole. An adversarial attack will exploit that little bit of uncertainty.
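To make that concrete, here is a minimal sketch, not Athalye's code, of how a classifier's final layer spreads probability across its classes using the standard softmax function. The class names and raw scores below are made up for illustration.

```python
# Minimal sketch: how a classifier turns raw scores ("logits") into a
# probability for every class via softmax. The numbers are invented.
import numpy as np

def softmax(scores):
    exp = np.exp(scores - scores.max())  # subtract the max for numerical stability
    return exp / exp.sum()

logits = np.array([4.2, 2.8, -2.5])      # hypothetical scores for cat, dog, guacamole
probs = softmax(logits)

for label, p in zip(["cat", "dog", "guacamole"], probs):
    print(f"{label}: {p:.4f}")
# Even the unlikely class gets a non-zero probability -- that residual
# uncertainty is what an adversarial attack pushes on.
```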
Athalye and his team examine how the AI assesses an image, and then test whether they can "wiggle the pixels" in some small way to change the probabilities the AI uses.
"I take very small steps and kind of tweak the pixels, making each pixel lighter or darker to slowly move the classification probability toward the desired target," he explained.
The theory behind adversarial attacks began with researchers at Google in 2013. But those early examples were images fed directly into the computer, rather than images in the real world captured by a camera. Once the researchers tilted or rotated the images, the program stopped being confused and saw the original image.
"If you have self-driving cars that see objects," said Athalye, "well, an adversary can't control what the actual input is to the classifier, but they can control objects in the real world, perhaps. And if those objects are adversarial and robust enough that they survive whatever transformation happens when cars look at it from different angles or rotations or things like that, then that could be a real problem."
Athalye and his Lab Six colleagues are trying to find out whether they can use certain kinds of adversarial images — what they call "robust adversarial examples" — to trick an AI operating in the real world.
"So we want to see if it's possible to produce these things that look like one thing to a human, but look like something completely to a neural net. We wanted to see if we could make 3D objects that could consistently fool a classifier," he explained.
"One example we showed was that we could take a turtle. And when you print that out on a 3D colour printer, but now using our algorithm we can slightly alter the texture so that that's consistently classified as something else. And so we randomly chose to make it into a rifle."
If these attacks became common, they could have major effects on current and near-future technologies that employ neural networks to operate. If someone were able to alter a stop sign, for example, so that a self-driving car saw it as a speed limit sign, it could lead to major accidents.
Despite the risk, Athalye says that the attacks haven't reached that level yet. "There's no evidence yet that these systems have been exploited in the real world ... and there's certain challenges that must be overcome in order to make adversarial examples in the real world."
One of those limits is how the current adversarial examples have been created. "Our example shows how to make these in what is called the 'white box case,'" Athalye said. "That requires access to the source code or the details of the particular neural network you're attacking."
But that limit might soon be crossed. "One thing we're working on is actually attacking real world systems. So we're trying to figure out how to develop 'black box attacks' on these systems, to come up with robust adversarial examples."
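The distinction matters because a black-box attacker can only query the model and observe its output, with no access to its internals. The sketch below is an illustrative stand-in for that setting, using simple random search guided only by the returned probabilities; it is not LabSix's method.

```python
# Toy sketch of the white-box / black-box distinction. The attacker never
# sees the model's weights or gradients -- only the probabilities returned
# by query_model() -- so it searches for a perturbation by trial and error.
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(10, 3072))            # the model's weights, hidden from the attacker

def query_model(x):
    """The only thing a black-box attacker gets: class probabilities for an input."""
    scores = W @ x
    e = np.exp(scores - scores.max())
    return e / e.sum()

image = rng.random(3072)                   # stand-in for the original image
target = 7                                 # class the attacker wants predicted
epsilon = 0.05                             # keep the change visually subtle

adv = image.copy()
for _ in range(2000):
    # Propose a small random nudge; keep it only if the target class
    # becomes more likely according to the queried probabilities.
    candidate = adv + rng.normal(scale=0.01, size=adv.shape)
    candidate = np.clip(candidate, image - epsilon, image + epsilon).clip(0, 1)
    if query_model(candidate)[target] > query_model(adv)[target]:
        adv = candidate

print("target probability:", query_model(adv)[target])
```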