Move Appears to Be Aimed at Big Data-Scraping Firms Such as Clearview AI
Radioactive data research (by row from top): Unaltered images; watermarked images; watermarks shown at 5x normal size; images shown with watermarks exaggerated (Source: Facebook)
Facebook scientists have proposed using watermarks to identify when online images get used to train neural networks.
The proposal appears to be aimed, at least in part, at the rise of big data startups such as Clearview AI, which scrape publicly available photographs from social networks and other sites and use them for facial recognition purposes, prompting privacy concerns (see: Facial Recognition: Big Trouble With Big Data Biometrics).
Neural networks are a machine learning technique that uses large sets of training data to derive rules for identifying patterns in new data (see: What’s Artificial Intelligence? Here’s a Solid Definition).
A team of the company’s researchers has proposed a system for detecting whether Facebook images have been used in training sets. “We have developed a new technique to mark the images in a data set so that researchers can determine whether a particular machine learning model has been trained using those images,” say Facebook researchers Alexandre Sablayrolles, Matthijs Douze and Hervé Jégou in a blog post. “We introduce unique marks that are harmless and have no impact on the classification accuracy of models, but remain present through the learning process and are detectable with high confidence in a neural network.”
The researchers call their model “radioactive data,” since it’s analogous to using radioactive materials for medical purposes, such as swallowing barium to make certain areas of the body show up more clearly on an X-ray.
The model is focused on subtly altering pixels, so researchers can trace when specific images have been used to train a neural network, rather than attempting to interfere with such training processes. “Radioactive data differs from previous approaches that aim at ‘poisoning’ training sets in an imperceptible way such that trained models will generalize poorly,” they write.
Based on tests conducted with ImageNet – a large, visual database designed for use in visual object recognition software research – the researchers say that even when their radioactive data only comprised 1 percent of the data used to train a specific neural net, they could still verify that it had been used, thanks to the neural network itself devoting some of its capacity to keep track of their “radioactive tracers.”
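The core idea can be illustrated with a toy experiment. The sketch below is not Facebook’s actual method (which operates in a network’s feature space on real images); it is a simplified stand-in using synthetic data and a plain logistic-regression classifier. The class data, the carrier direction `u`, the mark strength `eps` and the `mark`/`train_linear` helpers are all hypothetical constructs for illustration. The point it demonstrates is the same statistical principle: images subtly shifted along a secret direction leave a detectable trace in the weights of a model trained on them.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 256    # flattened "image" dimension
n = 2000   # images per class

# Two synthetic "image" classes, separated along a random direction.
sep = rng.normal(size=d)
sep /= np.linalg.norm(sep)
class0 = rng.normal(size=(n, d))
class1 = rng.normal(size=(n, d)) + 2.0 * sep

# Secret "radioactive" carrier: a random unit direction in pixel space,
# made orthogonal to the class separation so it carries no label signal.
u = rng.normal(size=d)
u -= (u @ sep) * sep
u /= np.linalg.norm(u)

def mark(images, eps=0.3):
    """Shift each image slightly along the secret carrier direction."""
    return images + eps * u

def train_linear(x0, x1, steps=200, lr=0.1):
    """Plain logistic regression via gradient descent; returns weights."""
    x = np.vstack([x0, x1])
    y = np.concatenate([np.zeros(len(x0)), np.ones(len(x1))])
    w = np.zeros(d)
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(x @ w)))
        w -= lr * (x.T @ (p - y)) / len(y)
    return w

def cosine(a, b):
    return (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

w_clean = train_linear(class0, class1)          # trained on clean data
w_marked = train_linear(class0, mark(class1))   # class 1 was marked

# A model trained on the marked images picks up a measurable alignment
# with the secret carrier; a model trained on clean data does not.
print("clean: ", cosine(w_clean, u))
print("marked:", cosine(w_marked, u))
```

In this toy setting, the mark is a small additive shift that a human would not notice in a real image, yet the trained weights correlate with the secret direction far more strongly than chance. That asymmetry is what lets the data owner, who alone knows the carrier, test a third party’s model for evidence that marked images were in its training set.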
“Although it is not the core topic of our paper, our method incidentally offers a way to watermark images in the classical sense,” the researchers add in a fuller paper – “Radioactive data: tracing through training” – on the topic, published Monday.
But they caution that their model could face unknown “adversarial scenarios” that might seek to identify and suppress such watermarks, to defeat this type of system.
Likely Target: ‘Big Data’ Facial Recognition Tools
Facebook’s proposed move appears to be squarely aimed at the likes of Clearview AI, a startup that was little known outside of law enforcement circles until Jan. 18. That’s when The New York Times privacy journalist Kashmir Hill detailed how the small company, launched by Australian Hoan Ton-That, developed a facial recognition app based on a catalog of 3 billion images the company scraped from Facebook, Twitter, YouTube, Venmo and numerous other sites.
A chart from marketing materials that Clearview provided to law enforcement. (Source: The New York Times)
“Federal and state law enforcement officers said that while they had only limited knowledge of how Clearview works and who is behind it, they had used its app to help solve shoplifting, identity theft, credit card fraud, murder and child sexual exploitation cases,” Hill reported.
Silicon Valley and privacy advocates quickly struck back, with Twitter telling Clearview that it was violating its terms of service. Meanwhile, New Jersey officials banned police in the state from using Clearview.
Clearview was also quickly hit by a class-action lawsuit filed in Illinois, alleging that the company broke the state’s privacy laws. That arrived on the heels of Facebook agreeing to pay $550 million to settle a separate class-action lawsuit alleging that it violated Illinois law by collecting data for a facial recognition tool without users’ consent.
‘No Monopoly on Math’
Sen. Edward Markey, D-Mass., has written to Clearview, posing 12 questions he wants to see answered, including a request for a list of all law enforcement agencies that are currently using the technology. “Clearview’s product appears to pose particularly chilling privacy risks, and I am deeply concerned that it is capable of fundamentally dismantling Americans’ expectation that they can move, assemble, or simply appear in public without being identified,” Markey writes.
Legally speaking, however, experts say the company is likely in the clear, not least because it appears to only have been using publicly available images. “It’s creepy what they’re doing, but there will be many more of these companies. There is no monopoly on math,” Al Gidari, a privacy professor at Stanford Law School, told The New York Times. “Absent a very strong federal privacy law, we’re all screwed.”