Over 4 Terabytes of Data Exposed, Including Social Media Profiles, Personal Information
Some 4 terabytes of data on over 1.2 billion individuals – including LinkedIn and Facebook profiles – was exposed to the internet on an unsecured Elasticsearch server, according to an analysis by a pair of independent researchers.
It’s not clear who owns the database or if any of the personally identifiable information it contained has been accessed by hackers or cybercriminals, according to analysis posted Friday by Bob Diachenko and Vinny Troia, who discovered the server in October.
The server stored 622 million email addresses and almost 50 million phone numbers, plus names and profile information from LinkedIn and Facebook, the two researchers told Wired.
An examination of the exposed server found that the personal information came from two data enrichment companies, although both said they did not own the cloud-based server, according to the researchers’ report.
“We regularly look for open Elasticsearch databases and we were just scouring IP address and we found this one and right away we could see that it had 4 terabytes and that it was a pretty large database,” Troia told Information Security Media Group. “When we started to dig into it, we found a ton of user profile information and it was just a ‘holy wow’ moment … At a glance we saw almost 4 billion user records and once we went through it and did deduplication, we found 1.2 billion unique records and that’s pretty momentous.”
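The gap between the roughly 4 billion raw records and 1.2 billion unique ones comes from deduplication. The researchers do not describe how they deduplicated the data, so the following is only a minimal sketch of the general idea, assuming records are dictionaries and that a normalized email address serves as the identity key. The field names and sample records are hypothetical.

```python
from typing import Dict, Iterable, Tuple


def normalize_email(email: str) -> str:
    """Trim whitespace and lowercase so trivially different copies match."""
    return email.strip().lower()


def count_unique(records: Iterable[Dict[str, str]], key: str = "email") -> Tuple[int, int]:
    """Return (total records seen, number of distinct normalized keys)."""
    seen = set()
    total = 0
    for record in records:
        total += 1
        value = record.get(key)
        if value:  # records without the key count toward the total only
            seen.add(normalize_email(value))
    return total, len(seen)


# Hypothetical sample: the same person appears in two differently labeled indexes.
sample = [
    {"email": "Ann@example.com", "source": "PDL"},
    {"email": "ann@example.com ", "source": "OXY"},
    {"email": "bob@example.com", "source": "PDL"},
]

total, unique = count_unique(sample)
print(total, unique)
```

At the scale described in the report, an in-memory set would likely be too large, and a real pipeline would probably hash keys or deduplicate in a database instead.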
No password or authentication was needed to access the database, the researchers say. Troia told ISMG that he notified the FBI about the database, and within a few hours, someone pulled the server and the exposed data offline. Troia added that when he examined the IP address further, it appeared that the server itself dated from November 2018.
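For administrators who want to confirm that their own Elasticsearch endpoint does not answer anonymous requests, a check can be sketched as follows. This is an illustration rather than the researchers' tooling: it assumes the cluster is reachable over HTTP, that a 200 response to an unauthenticated request means the endpoint is open, and that 401 or 403 means credentials are required. The function names are hypothetical, and the probe should only be run against servers you operate.

```python
import urllib.error
import urllib.request


def classify_status(status: int) -> str:
    """Map an HTTP status from an unauthenticated request to a verdict."""
    if status == 200:
        return "open"
    if status in (401, 403):
        return "auth required"
    return "unknown"


def probe(base_url: str, timeout: float = 5.0) -> str:
    """Send one request without credentials and classify the response."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as response:
            return classify_status(response.status)
    except urllib.error.HTTPError as err:
        # urllib raises for 4xx/5xx responses; the status code is on the exception.
        return classify_status(err.code)
    except (urllib.error.URLError, OSError):
        return "unreachable"


# Example (assumed address of a cluster you own):
#   print(probe("http://localhost:9200"))
```

A result of "open" would mean anyone who can reach the address can read the cluster, which is the condition the researchers describe.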
“Due to the sheer amount of personal information included, combined with the complexities identifying the data owner, this has the potential to raise questions on the effectiveness of our current privacy and breach notification laws,” Diachenko and Troia write in their report.
Tracking the Data
The two researchers found the exposed database on Oct. 16 as part of an ongoing research project using Shodan, a search engine for internet-connected devices. And while the IP address for the Elasticsearch server was traced back to the Google Cloud Platform, it’s not clear who owns the database or who has responsibility for securing it, Troia, who runs the threat intelligence firm Data Viper, told Wired.
Screenshot of different indexes available on the exposed server (Source: Data Viper)
Most of the data was contained in four separate indexes on the exposed server, which were labeled either “PDL” or “OXY.” These appear to be references to two data enrichment companies, according to the researchers.
PDL appears to be a reference to a San Francisco company called People Data Labs, while OXY seems to point to a firm called OxyData, based in Cheyenne, Wyoming, the researchers report.
People Data Labs claims to have information on 1.5 billion people available for sale, the two researchers say.
While it appears that most of the data found on the Elasticsearch server came from these two companies, both firms denied that they owned the server, the researchers report.
Representatives of People Data Labs and OxyData told Wired that they have not experienced a recent data breach or loss of data they collect.
In their report, the researchers write: “The lion’s share of the data is marked as ‘PDL,’ indicating that it originated from People Data Labs. However, as far as we can tell, the server that leaked the data is not associated with PDL. This raises a number of other questions. First, how did this mystery organization get the data? Are they a current or former customer? If so, the data discovered on the server indicates that this company is a customer of both People Data Labs and OxyData.”
Sean Thorne, co-founder of People Data Labs, told Wired: “The owner of this server likely used one of our enrichment products, along with a number of other data enrichment or licensing services. Once a customer receives data from us, or any other data providers, the data is on their servers and the security is their responsibility. We perform free security audits, consultations, and workshops with the majority of our customers.”
In a blog post published Friday, Troy Hunt, the Australian security researcher who runs the free “Have I Been Pwned?” service that identifies individuals whose data has been exposed, notes that his own data was in the database discovered by Diachenko and Troia.
As a result of the discovery of the database, Hunt added more than 622 million unique email addresses and other data to his repository and is notifying Have I Been Pwned? users, Wired reports.
“It’s entirely possible that this data came from a PDL subscriber and not PDL themselves. Someone left an Elasticsearch instance wide open, and by definition, that’s a breach on their behalf and not PDL’s,” Hunt writes. “Yet it doesn’t change the fact that PDL is indicated as the source in the data itself and it definitely doesn’t change the fact that my data (and probably your data too), is available freely to anyone who wishes to query their API.”
New data exposure: A customer of People Data Labs exposed with 1.2B data enrichment records. The data contained 622M unique email addresses as well as phone numbers, social media profiles and job histories. 84% were already in @haveibeenpwned. More: https://t.co/nfyAHqCQ5Y
— Have I Been Pwned (@haveibeenpwned) November 22, 2019
Troia told ISMG that he doesn’t believe People Data Labs was responsible. It appears that one of the company’s customers took the data – maybe more than it should have – added it to the server and then failed to secure the database. Still, this incident raises privacy concerns, Troia said. “It really boils down to a privacy issue and who regulates what is done with this data and who has access to it.”
The discovery of this database is the latest example of organizations deploying cloud-based technologies – including Elasticsearch servers, MongoDB databases and Amazon Simple Storage Service buckets – but misconfiguring them or neglecting to enable strong passwords, leaving the data that they store exposed to the internet.
In a similar incident in 2018, an unsecured cloud database that market research firm Exactis was using for a data enrichment program left 340 million personal records exposed (see: Marketing Firm Exposes 340 Million Records on US Consumers).
Earlier this week, Noam Rotem and Ran Locar, self-described security researchers and hacktivists who have been conducting an internet mapping project, discovered a much smaller unsecured database that left payment card and other customer data exposed (see: PayMyTab Exposes Restaurant Customer Data: Report).
Rotem and Locar have also found larger online databases that left personal information exposed, including one that may affect every citizen of Ecuador (see: Investigation Launched After Ecuadorian Records Exposed).
In many database exposure incidents, companies that collect information fail to protect the data that they upload to the cloud and share with others, and they neglect to implement the right access controls, says Chris Pierson, CEO of the cybersecurity company BlackCloak.
“All companies that allow other firms to access data that they host should have a robust security program in place with contractual guarantees in place to ensure that there are safeguards for that data,” Pierson tells Information Security Media Group. “In this instance, it seems that the companies failed to get those guarantees and audit against them.”