When Deep Neural Nets Fail
Oct 2022
A panda predicted as a gibbon with 99.3% confidence
A yellow/black stripe pattern predicted as ‘school bus’ with ≈99.12% confidence 2
Adversarial Examples are imperceptible perturbations of natural inputs that induce erroneous predictions. Their counterpart: Fooling Images 2, unnatural inputs that trigger high-confidence predictions yet remain unrecognizable to humans.
These two related but systematically different phenomena provide unique insights into how Deep Neural Networks (DNNs) learn differently from humans.
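To make the first phenomenon concrete, here is a minimal sketch of the classic Fast Gradient Sign Method, one standard way adversarial examples are constructed (not the specific method of the cited papers). The pretrained torchvision ResNet-18, the random stand-in image, and the ε = 2/255 budget are illustrative assumptions; a real demonstration would use a properly normalized photo.

```python
import torch
import torchvision

# Any differentiable classifier works; a pretrained ResNet-18 is used here
# purely for illustration (assumes torchvision >= 0.13).
model = torchvision.models.resnet18(weights="IMAGENET1K_V1").eval()

def fgsm(image, label, epsilon=2 / 255):
    """Fast Gradient Sign Method: take one step that increases the loss,
    bounded by an imperceptible per-pixel budget epsilon."""
    image = image.clone().requires_grad_(True)
    loss = torch.nn.functional.cross_entropy(model(image), label)
    loss.backward()
    # To a human the perturbation is pure noise: +/- epsilon per pixel.
    return (image + epsilon * image.grad.sign()).clamp(0, 1).detach()

x = torch.rand(1, 3, 224, 224)   # stand-in for a preprocessed panda photo
y = torch.tensor([388])          # ImageNet class 388: giant panda
x_adv = fgsm(x, y)

print("clean:      ", model(x).softmax(-1)[0].max().item())
print("adversarial:", model(x_adv).softmax(-1)[0].max().item())
```

On a real panda image, a single step of this kind is typically enough to flip the top-1 prediction while the two images look identical to a human.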
“Adversarial Examples” show how adding what humans perceive as mere noise drastically alters a model’s predictions. This highlights three failures in simple DNNs: over-confidence, discriminatory narrow-sightedness, and a lack of intentionality.
- Over-confidence - Our intuitive understanding of confidence differs from the class probabilities a DNN outputs. A human would give an out-of-sample input low confidence, aware that it differs from anything seen in training. The model cannot detect that its frame of reference is incoherent, so it never flags the answer as uncertain.
- Notably, contrastive-loss approaches and “robust” training both combat adversarial examples at this level. They do not attempt to fix the fundamental problem, but they can make the model less likely to be overconfident on “strange” samples.
- Discriminatory Narrow-sightedness - Humans make high-level inferences, partly because we discard much of our visual input. Three times per second we go blind during saccades and don’t notice; we fill in the blanks from memory.
- Deep Neural Nets use no such tricks. They dredge through the data for any pattern with discriminatory power, even features that are utterly inhuman and unintended. These non-robust features predict well but prove brittle for broader object recognition (as opposed to dataset-localized tasks).
- Consider labeling every panda image with a red square and every gibbon image with a green square. The model will learn to look for the square and will be highly confident in its predictions (see the sketch after this list). However, the square is not a feature representative of telling a panda from a gibbon outside our “augmented” dataset.
- Intentionality Failure - These examples aren’t model failures; they’re challenges of intention. We trained the model to learn the tricks we use (“pandas have white and black fur”) without any prior to enforce that restriction. Enforcing such a restriction isn’t the answer either: we would get more human-interpretable models but lose valuable, perhaps inhuman, features.
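The red-square/green-square thought experiment above is easy to reproduce on synthetic data. In this minimal sketch (the 32×32 noise images, the 4×4 corner patch, and the logistic-regression classifier are all arbitrary illustrative choices), the model looks essentially perfect as long as the planted shortcut is present, and collapses the moment the square colors are swapped:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def make_images(n, swap_colors=False):
    """Random 3x32x32 'images' whose only informative content is a 4x4
    corner patch: red for class 0 ('panda'), green for class 1 ('gibbon')."""
    labels = rng.integers(0, 2, size=n)
    images = rng.random((n, 3, 32, 32)).astype(np.float32)
    for img, y in zip(images, labels):
        channel = (1 - y) if swap_colors else y  # red = channel 0, green = channel 1
        img[:, :4, :4] = 0.0
        img[channel, :4, :4] = 1.0
    return images.reshape(n, -1), labels

X_train, y_train = make_images(2000)
X_test,  y_test  = make_images(500)
X_swap,  y_swap  = make_images(500, swap_colors=True)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("accuracy with the shortcut:", clf.score(X_test, y_test))  # ~1.0
print("accuracy, colors swapped:  ", clf.score(X_swap, y_swap))  # ~0.0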
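```

Nothing here is specific to logistic regression; a deep net trained on the same data would almost certainly latch onto the same square, just less transparently.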
Neural Networks pay far more attention to texture and color than humans do when classifying. For good reason: these features predict well, yet humans struggle to detect them.
Photo by Richard Brutyo, first result on Unsplash ‘dog’
Photo by Amber Kipp, first result on Unsplash ‘cat’
Another example: zoom lens artifacts. These have discriminatory power for dogs. We photograph dogs outside, at a distance; cats stay indoors (and wear bowties!). An overly restrictive model would miss these rich features.
The “Fooling Image” offers more human interpretability than raw noise. Black and yellow stripes predict school buses. But it reveals a more fundamental issue:
- A Lack of Sufficiency - DNNs assume all discriminatory features are sufficient. If the only images with black and yellow stripes are school buses, that becomes a sufficient feature. A human would treat it as a mere indicator, not a conclusion. The necessary conditions (a bus taking children to school, of any color) often aren’t the most discriminatory. Current DNNs don’t seek multiple conditions if one predicts well enough.
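For intuition, high-confidence fooling images of this kind can be produced with plain gradient ascent on a class logit, starting from noise. (Nguyen et al. also rely on evolutionary search; this sketch is only in the spirit of their work, not a reproduction of it.) The pretrained torchvision ResNet-18, the step count, and the learning rate are illustrative assumptions:

```python
import torch
import torchvision

weights = torchvision.models.ResNet18_Weights.IMAGENET1K_V1
model = torchvision.models.resnet18(weights=weights).eval()
school_bus = weights.meta["categories"].index("school bus")

# Start from pure noise and repeatedly nudge it to raise the "school bus" logit.
image = torch.rand(1, 3, 224, 224, requires_grad=True)
optimizer = torch.optim.Adam([image], lr=0.05)

for _ in range(200):
    optimizer.zero_grad()
    loss = -model(image)[0, school_bus]  # maximize the target logit
    loss.backward()
    optimizer.step()
    with torch.no_grad():
        image.clamp_(0, 1)               # keep pixel values valid

confidence = model(image).softmax(-1)[0, school_bus].item()
print(f"'school bus' confidence on an unrecognizable image: {confidence:.3f}")
```

The result typically looks like structured noise or stripes to a human while the model assigns it very high confidence; the exact number depends on the model and hyperparameters.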
Footnotes
1. Ilyas, Andrew, Shibani Santurkar, Dimitris Tsipras, Logan Engstrom, Brandon Tran, and Aleksander Madry. “Adversarial Examples Are Not Bugs, They Are Features.” Advances in Neural Information Processing Systems 32 (2019). ↩
2. Nguyen, Anh, Jason Yosinski, and Jeff Clune. “Deep Neural Networks Are Easily Fooled: High Confidence Predictions for Unrecognizable Images.” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015. ↩ ↩2