Researcher Finds Unsecure Elasticsearch Database Cluster Accessible via the Internet
A security researcher recently discovered an unsecure Elasticsearch database cluster exposed on the internet that contained transcripts of sensitive voicemail messages, including some for medical clinics and financial service companies.
Los Angeles-based Broadvoice, which offers voice over IP cloud-based telecommunications services to U.S. businesses, says its database has since been secured.
Security expert Bob Diachenko, working on behalf of Comparitech, a U.K.-based product testing website, uncovered the unsecure Broadvoice database on Oct. 1, Comparitech says in a blog post.
The cluster was accessible online with no password or other authentication required, the blog notes.
Diachenko “discovered the unprotected Elasticsearch cluster, which contained several data collections comprising a total of more than 350 million records, including caller names, phone numbers, and locations, among other data,” the blog notes.
The search engine Shodan.io first indexed the database on Oct. 1, the blog says. “The same day, Diachenko sent a responsible disclosure to Broadvoice. He received an automated reply but no further correspondence,” according to the blog.
The database included transcriptions of hundreds of thousands of voicemails, many involving sensitive information, the blog says. Among the transcripts were patient messages for medical clinics, which included names of prescriptions or details about medical procedures. “In one transcript, the caller identified themselves by their full name and discussed a positive COVID-19 diagnosis,” the blog notes.
“Other voicemails left for financial service companies included details about mortgages and other loans, while there was at least one instance of an insurance policy number being disclosed.”
Diachenko contends the unsecured database cluster contained hundreds of millions of customer records. But a Broadvoice spokeswoman tells Information Security Media Group that the data “subset” the communications company believes was accessed by the researcher potentially included the records for fewer than 10,000 business clients.
Broadvoice says in an Oct. 15 statement that the security researcher on Oct. 1 was able to access “a subset of b-hive data … [that] had been stored in an inadvertently unsecured storage service September 28.” The company says it secured the data on Oct. 2. The Comparitech blog, however, notes the database appeared to have been secured on Oct. 4.
“Our investigation is ongoing, and we are not otherwise commenting,” the Broadvoice spokesperson says.
Broadvoice says it is taking steps to address the security situation, including:
- Ensuring the data has been secured;
- Launching an investigation;
- Alerting federal law enforcement;
- Working with the security researcher to confirm all the data accessed is destroyed.
“At this point, we have no reason to believe that there has been any misuse of the data,” Broadvoice says in its statement. “We are currently engaging a third-party forensics firm to analyze this data. We will provide more information and updates to our customers and partners upon completion of the investigation. We cannot speculate further about this issue at this time.”
Diachenko did not immediately respond to ISMG’s request for comment on the incident.
Diachenko has made several other discoveries of unsecured databases.
For example, on July 13, he discovered an unprotected database with information on 3.1 million patients that appeared to be owned by Adit, a Houston-based online medical appointment and patient management software company (see Unsecured Database Exposed on Web, Then Deleted).
But on July 22, the database apparently was deleted by a so-called “meow bot,” Diachenko wrote in a blog post.
Last December, Diachenko also discovered that Microsoft had accidentally left 250 million customer support records exposed on the internet for three weeks. The records were stored in five misconfigured Elasticsearch databases. The company rapidly locked down the data after being alerted (see Microsoft Error Exposed 250 Million Elasticsearch Records).
So, why do database exposure incidents keep happening?
“It is often human error,” says regulatory attorney Marti Arvin of the privacy and security consultancy CynergisTek.
In the Broadvoice incident, not enough information is available to say whether the company had appropriate administrative, technical or physical controls in place, “but there is no control that will ever completely eliminate human error,” she notes.
Warren Poschman, senior solutions architect at German security vendor comforte AG, says several issues can come into play in these types of incidents.
“The root is often a lack of knowledge about the implications when the focus is on getting a project done quickly with minimal expense,” he says. “The focus is on sharing data and collaboration, and security and access controls seem to be at odds with that. There is also an element of ‘shadow IT’ at work here in an attempt to solve problems.”
The exposure of voicemail message transcripts and other personal data in the Broadvoice incident “could be extremely risky,” Arvin says.
“Bad actors gaining access to such information could use it in multiple ways,” she says. “They might contact the individual and use the information that was exposed to get more information from the individual. Depending on the sensitivity of the information, it could lead to identity theft.”
A key factor in protecting data is knowing where the data is stored, Arvin says. “Creating a detailed data inventory along with a data classification system and being disciplined about updating the inventory can help avoid incidents like this,” she says.
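As a rough illustration of the kind of inventory Arvin describes, the Python sketch below records each data store with a classification label and flags any store that holds sensitive data, is reachable from the internet and requires no authentication. The `DataStore` fields, the classification labels and the sample entries are all hypothetical; nothing here reflects Broadvoice's actual systems.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataStore:
    """One entry in a hypothetical data inventory."""
    name: str
    classification: str   # assumed labels: "public", "internal", "sensitive"
    internet_facing: bool
    requires_auth: bool


def find_exposed(inventory):
    """Return stores holding sensitive data that anyone online could read."""
    return [
        store for store in inventory
        if store.classification == "sensitive"
        and store.internet_facing
        and not store.requires_auth
    ]


# Illustrative entries only; names are made up for this example.
inventory = [
    DataStore("marketing-site-cache", "public", True, False),
    DataStore("voicemail-transcripts", "sensitive", True, False),
    DataStore("billing-db", "sensitive", False, True),
]

for store in find_exposed(inventory):
    print(f"Review immediately: {store.name}")
```

An inventory like this only helps if it is kept current, which is the discipline Arvin points to; a store that never makes it onto the list can never be flagged.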
To help prevent inadvertent database exposures, Poschman says organizations should avoid using complete data sets for analytics.
“It’s worth noting that access controls, database encryption, firewalls, data loss prevention and other similar traditional controls all fall down unfortunately in these cases – as evidenced by how data ends up getting into these exposed databases in the first place,” he says.
“Instead, they should ensure that the data is de-identified and anonymized by using technologies like tokenization, which substitutes secure tokens for data elements like names, addresses, account numbers, Social Security numbers, etc. while maintaining referential integrity,” he says.
“This allows data to remain secure regardless of where it ends up but also preserves the analytic value of the data without loss of fidelity.”
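To make the idea concrete, here is a minimal Python sketch of vault-style tokenization, written under stated assumptions rather than as a description of any vendor's product. The `TokenVault` class, the `tok_` prefix, the field names and the sample call records are all hypothetical. Each sensitive value is swapped for a random token, and the same value always receives the same token, which is what keeps referential integrity for analytics.

```python
import secrets


class TokenVault:
    """Toy in-memory token vault (illustrative only, not production-grade)."""

    def __init__(self):
        self._token_by_value = {}
        self._value_by_token = {}

    def tokenize(self, value: str) -> str:
        # Reuse the existing token so equal inputs always map to equal tokens.
        if value in self._token_by_value:
            return self._token_by_value[value]
        # Tokens are random, so they reveal nothing about the original value.
        token = "tok_" + secrets.token_hex(8)
        while token in self._value_by_token:
            token = "tok_" + secrets.token_hex(8)
        self._token_by_value[value] = token
        self._value_by_token[token] = value
        return token

    def detokenize(self, token: str) -> str:
        # Only a party with access to the vault can recover the original.
        return self._value_by_token[token]


def tokenize_record(record, sensitive_fields, vault):
    """Return a copy of the record with the sensitive fields tokenized."""
    return {
        key: vault.tokenize(value) if key in sensitive_fields else value
        for key, value in record.items()
    }


vault = TokenVault()
sensitive = {"caller_name", "phone"}
calls = [
    {"caller_name": "Jane Doe", "phone": "555-0100", "duration_s": 42},
    {"caller_name": "Jane Doe", "phone": "555-0100", "duration_s": 17},
]
safe_calls = [tokenize_record(call, sensitive, vault) for call in calls]

# Both calls still share one caller token, so per-caller analytics still work.
print(safe_calls[0]["caller_name"] == safe_calls[1]["caller_name"])
```

In this sketch, an analytics database that ends up exposed would hold only the tokens, while the vault mapping tokens back to real values would be kept in a separate, tightly controlled system.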