In March 2020, two months after The New York Times exposed that Clearview AI had scraped billions of images from the internet to create a facial recognition database, Thomas Smith received a file containing most of his digital life.
Using the recently passed California Consumer Privacy Act, Smith asked Clearview what it had on him. The company sent him photos spanning moments across his adult life: a photo from when he got married and started a blog with his wife, a photo from when he was profiled by his university’s alumni magazine, even a profile photo from a Python coding meetup he had attended several years earlier.
“That’s what really hit me: all the stuff I posted on Facebook and thought, ‘Nobody will ever look for that,’ and here it’s all logged in a database,” Smith told The Verge.
Clearview’s massive surveillance tool claims to hold 3 billion photos, accessible to any law enforcement agency with a subscription, and it’s likely that you or people you know have been swept up in the company’s dragnet. It has been known to scrape sites like Facebook, LinkedIn, YouTube, and Instagram, using profile names and associated images to build a wealth of identified and searchable facial images.
Little is known about the accuracy of Clearview’s software, but it appears to be powered by a vast trove of scraped and identified images, taken from social media profiles and other personal photos on the public internet. That scraping is only possible because social media platforms like Facebook consolidated massive amounts of personal data on their platforms and then largely ignored the risks of large-scale data analytics projects like Clearview. It took Facebook until 2018 and the Cambridge Analytica scandal to lock down the developer tools that could be used to exploit its users’ data. Even after the extent of Clearview’s scraping came to light, the response from Facebook and other tech platforms came largely in the form of strongly worded letters asking Clearview to stop scraping their sites.
But with major platforms unable or unwilling to go any further, the average person on the web faces a tough choice. Any new photo featuring you, whether it’s a simple Instagram shot or a tag on a friend’s Facebook page, is potentially grist for the mill of a global facial recognition system. Yet for many people, hiding our faces from the internet doesn’t feel like an option. These platforms are too deeply entrenched in public life, and our faces are too central to who we are. The challenge is to find a way to share photos without subjecting them to these wider scanning systems, and it’s a challenge without clear answers.
In some ways, this issue is much older than Clearview AI. The internet was built to facilitate the posting of public information, and social media platforms have entrenched this idea; Facebook recruited a billion users between 2009 and 2014, when public posting was the platform’s default setting. Others, such as YouTube, Twitter, and LinkedIn, encourage public posting as a way for users to gain influence, contribute to global conversations, and find work.
Historically, any one person’s contribution to this unfathomable flood of graduation photos, vacation group shots, and selfies would have enjoyed safety in numbers. You may notice a security camera in a supermarket, but it’s unlikely anyone will ever actually view the footage. Clearview thrives on exactly this kind of thinking, because automated facial recognition can now cut through the digital glut at the scale of the entire public internet.
“Even when there were a lot of surveillance cameras in the world, there wasn’t a good way to analyze the data,” said Catherine Crump, a professor at UC Berkeley’s School of Law. “Facial recognition technology, and analytics in general, have been so revolutionary because they’ve put an end to privacy through obscurity, or it looks like they soon will.”
This means you can’t rely on blending in with the crowd. The only way to prevent Clearview from collecting your data is to keep it off the public internet in the first place. Facebook makes certain information public, without the option to make it private, like your profile picture and cover photo. Private accounts on Instagram can’t hide profile pictures either. If you’re concerned about information being scraped from your Facebook or Instagram account, these are the first images to change. LinkedIn, on the other hand, lets you limit the visibility of your profile picture to only people you’ve interacted with.
Beyond Clearview, facial recognition search engines like PimEyes have become popular tools accessible to anyone on the web, and other commercial facial recognition apps, such as FindFace, work with oppressive governments around the world.
Another important part of protecting the privacy of those around you is making sure you don’t post photos of others without permission. Smith, who requested his data from Clearview, was amazed at how many other people had been swept into the database just by appearing in photos with him, including his friends and his academic advisor.
But since some images on the web, such as those on Facebook and Instagram, simply cannot be hidden, some AI researchers are exploring ways to “disguise” images to evade Clearview’s technology, as well as other facial recognition technology scouring the open web.
In August 2020, a project called Fawkes, released by the University of Chicago’s SAND Lab, presented itself as a potential antidote to Clearview’s ubiquitous scraping. The software works by subtly changing the parts of an image that facial recognition systems use to distinguish one person from another, while trying to preserve how the image looks to humans. This kind of exploit of an AI system is called an “adversarial attack.”
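The real Fawkes algorithm optimizes perturbations against deep face-recognition feature extractors; a self-contained toy sketch of the underlying idea looks like this. Everything here is invented for illustration (the random linear "embedding" stands in for a real recognizer), but the mechanics are the same: nudge pixels so the image's feature representation drifts away from the original, while capping each pixel change at a small budget so the photo still looks unchanged to a person.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_embedder(img_size, dim=8):
    # Stand-in for a face recognizer's feature extractor: a fixed random
    # linear projection of the flattened image. (Fawkes targets real deep
    # networks; a linear map keeps this sketch short and runnable.)
    W = rng.standard_normal((dim, img_size))
    def embed(img):
        return W @ img.ravel()
    return embed, W

def cloak(img, embed, W, eps=0.03, steps=50, lr=0.005):
    # Push the image's embedding away from the original while keeping
    # every pixel within +/- eps of its starting value (the "cloak").
    original = embed(img)
    # Start from a tiny random nudge so the first gradient is nonzero.
    x = np.clip(img + 1e-3 * rng.standard_normal(img.shape), 0.0, 1.0)
    for _ in range(steps):
        diff = embed(x) - original
        grad = (W.T @ diff).reshape(img.shape)  # gradient of the embedding distance
        x = x + lr * np.sign(grad)              # ascend: grow that distance
        x = np.clip(x, img - eps, img + eps)    # respect the per-pixel budget
        x = np.clip(x, 0.0, 1.0)                # stay a valid image
    return x

face = rng.random((16, 16))                     # pretend 16x16 grayscale photo
embed, W = make_embedder(face.size)
cloaked = cloak(face, embed, W)
print(np.abs(cloaked - face).max())             # tiny per-pixel change (<= eps)
print(np.linalg.norm(embed(cloaked) - embed(face)))  # but a clearly shifted embedding
```

The sign-of-gradient step with a clipped budget is the classic shape of such attacks; the hard part Fawkes solves is doing this against feature extractors similar enough to the ones scrapers actually use.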
Fawkes underscores the difficulty of designing technology that tries to hide images or limit the accuracy of facial recognition. Clearview’s database spans hundreds of millions of identities, so while individual users may benefit from running the Windows and Mac app the Fawkes team developed, the database as a whole won’t really suffer from a few hundred thousand fewer usable profiles.
Ben Zhao, a University of Chicago professor who oversees the Fawkes project, says Fawkes only works if people are diligent about cloaking all of their images. That’s a big ask, since users would have to juggle multiple versions of every photo they share.
On the other hand, a social media platform like Facebook could match Clearview’s scale by integrating a feature like Fawkes into the photo upload process, but that would simply change which company has access to your unaltered images. Users would then have to trust that Facebook isn’t exploiting its access to those original images for its own ad targeting or other tracking.
Zhao and other privacy experts agree that adversarial tricks like Fawkes are no panacea that will single-handedly defeat coordinated scraping campaigns, even those feeding facial recognition databases. Avoiding Clearview takes more than a technical fix or a privacy checkup on Facebook. Instead, platforms will have to rethink how data is uploaded and maintained online, and what data is publicly accessible at all. That means fewer public photos and fewer opportunities for Clearview to add new identities to its database.
Jennifer King, a privacy and data policy fellow at Stanford’s Institute for Human-Centered Artificial Intelligence, says one approach is to automatically delete data after a set amount of time. Part of what makes a service like Snapchat more private (when set up properly) than Facebook or Instagram is its commitment to ephemeral media posted primarily to small, trusted groups of people.
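The auto-deletion idea King describes can be sketched as a store where every entry carries a time-to-live and simply stops existing once it expires. This toy class and its API are invented for illustration (real platforms would enforce expiry server-side, at far longer timescales), but it captures the ephemerality contract:

```python
import time

class EphemeralStore:
    """Toy key-value store where every entry expires after `ttl` seconds,
    sketching the 'auto-delete after a set time' idea (hypothetical API)."""

    def __init__(self, ttl):
        self.ttl = ttl
        self._data = {}  # key -> (value, expiry timestamp)

    def put(self, key, value):
        self._data[key] = (value, time.monotonic() + self.ttl)

    def get(self, key, default=None):
        entry = self._data.get(key)
        if entry is None:
            return default
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            del self._data[key]  # lazily purge expired entries on access
            return default
        return value

store = EphemeralStore(ttl=0.05)
store.put("photo:123", "selfie.jpg")
print(store.get("photo:123"))  # still there before the TTL elapses
time.sleep(0.06)
print(store.get("photo:123"))  # gone: expired data is no longer served
```

Lazy purge-on-read keeps the sketch short; a production system would also sweep expired entries in the background so they don't linger on disk.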
Laws in some states and countries are also starting to catch up with online privacy threats. These laws bypass platforms like Facebook and instead demand accountability from the companies that actually scrape the data. The California Consumer Privacy Act allows residents to request a copy of the data that companies like Clearview have about them, and similar provisions exist in the European Union. Some laws require the data to be deleted at the user’s request.
But King notes that even once the data is deleted, nothing stops the company from simply grabbing it again.
“It’s not a permanent opt-out,” she said. “I’m concerned that you’ll make that ‘delete my data’ request on May 31, and they may go back to collecting your data on June 1.”
So if you want to lock down your online presence, make sure you change your privacy settings and delete as many images as possible before asking companies to delete your data.
But ultimately, to prevent malicious parties like Clearview from getting any data at all, users are at the mercy of social media platform policies. After all, it’s the current state of privacy settings that allows a company like Clearview to exist at all.
“There’s a lot you can do to protect or reclaim your data, but to change this, it ultimately has to be done collectively, through legislation, through lawsuits, and through people coming together and deciding what privacy should look like,” said Smith. “Even people getting together and saying to Facebook, ‘I need you to better protect my data.’”