August 14th, 2024

Inside the "3 billion people" national public data breach

National Public Data experienced a major data breach involving around 2.9 billion records, prompting a class action lawsuit due to exposed sensitive information, with many records being duplicates.


The recent data breach involving National Public Data (NPD) has raised significant concerns due to the scale and nature of the exposed information. Allegedly, a threat actor known as "USDoD" has put up for sale approximately 2.9 billion records, which include sensitive personal details such as full names, addresses, and Social Security Numbers (SSNs). However, the actual number of unique individuals affected may be much lower, as many records appear to be duplicates. The breach has been linked to a class action lawsuit initiated by a California resident who was notified of the exposure by an identity-theft protection service. The data, which has been circulating since early June, was publicly leaked in August, prompting further scrutiny. While the breach is significant, the exact amount of data remains unclear, with discrepancies in reported sizes and contents. Some datasets contain email addresses, while others do not, complicating the verification process. The incident highlights the challenges in attributing data to specific sources, especially when dealing with data aggregators like NPD, which compile information from various public records. As investigations continue, the implications for data privacy and security are profound, emphasizing the need for robust protections against such breaches.

- National Public Data suffered a massive data breach involving approximately 2.9 billion records.

- The breach has led to a class action lawsuit due to the exposure of sensitive personal information.

- Many records in the leaked data are duplicates, suggesting fewer unique individuals are affected.

- The breach raises concerns about data privacy and the challenges of verifying data from aggregators.

- The incident underscores the importance of identity protection services in alerting individuals to data exposure.

AI: What people are saying
The comments on the data breach highlight several key concerns and themes regarding data privacy and security.
  • Many users express skepticism about the effectiveness of current data opt-out services and the legality of data aggregators.
  • There is a strong call for regulatory changes to enhance consumer privacy rights and impose significant penalties on data brokers for breaches.
  • Several commenters suggest that the public's perception of Social Security Numbers (SSNs) needs to shift, advocating for new identification methods.
  • Concerns are raised about the potential for widespread identity theft and the inadequacy of existing protections against data breaches.
  • Some users propose radical solutions, such as making all personal data public or issuing new SSNs to mitigate the impact of breaches.
40 comments
By @throwup238 - 2 months
> While the specifics of the data breach remain unclear, the trove of data was put up for sale on the dark web for $3.5 million in April, the complaint reads.

I guess they failed to sell it because links to the leaked data on usdod.io have been available on Breachforum/Leakbase for over a week now. Someone created a magnet link yesterday and it's fully seeded so speeds are fast.

The data in the breach is irreversibly public now.

By @d_burfoot - 2 months
It's worth remembering that the main reason this kind of data breach is a real problem is the incompetence of the IRS. For any serious financial organization, knowing a person's SSN, name, address, etc. doesn't allow you to access or withdraw that person's finances.

But the stupidity of the IRS means that people are easily targeted by false tax return attacks. File a fake tax return for someone, using their SSN/name/address, but tell the IRS you changed address. Then the IRS sends your tax refund to the new address, and boom, you just collected some poor sod's refund. To add insult to injury, the IRS is probably going to audit the person whose refund you stole.

By @CrispyKerosene - 2 months
Troy mentions "data opt-out services. Every person who used some sort of data opt-out service was not present."

Anyone have experience with these sorts of services? A search brings up a lot of scammy-looking results. But if services exist to reduce my profile, I'd be interested.

By @johnnyballgame - 2 months
Extreme Privacy by Michael Bazzell is a great resource to learn how to limit exposure to these aggregator services.

https://inteltechniques.com/book7.html

By @blackeyeblitzar - 2 months
It is crazy to me that data brokers are even a legal form of business. All of these services should be opt in at minimum. If they are obtaining publicly available information and making it easier to access, they should have to maintain insurance or a deposit with the government to compensate victims of cybersecurity incidents. Telling people to get credit monitoring is in NO WAY an acceptable way to make us whole. They need to pay for a lifetime of monitoring and INSURANCE up to the net worth of affected individuals. This needs to become law ASAP.
By @datadrivenangel - 2 months
"there were no email addresses in the social security number files. If you find yourself in this data breach via HIBP, there's no evidence your SSN was leaked, and if you're in the same boat as me, the data next to your record may not even be correct. "

Seems like Troy is skeptical about this being a real full breach?

By @EvanAnderson - 2 months
For years I've said the entire SSN database just needs to be published alongside legislation strictly assigning liability to any company that is defrauded as a result of using the SSN as a "secret". That would fix the problem with SSNs and "identity theft" quickly.

Part 1 has been accomplished. Let's get part 2 going!

Aside: It amazes me how the American public has allowed defrauded companies to assign the company's loss as a liability to innocent individuals (in the form of "identity theft"). It would be great if we could get that changed in the minds of the public. A well-informed public could collectively turn "identity theft" into the "bank's problem" (from the old adage "If you owe the bank a billion dollars they have a problem..."). The insurance industry would swoop in as the defrauded parties start making claims and shoddy security practices would get tightened-up.

(Edit: I fear insurance companies coming in to "fix this" to some extent-- citing my experiences with PCI DSS compliance auditing and Customers who have had 'cyber insurance' policies coming with ridiculous security theatre requirements. Maybe we can end up with something like a 'cyber' Underwriters Labs in the end.)

(Also: Yikes! I hate that I just typed 'cyber' un-ironically.)

By @quantumfissure - 2 months
For non-Americans (and Americans) that don't quite understand what SSN is and why it's a problem, CGP Grey [1] has a great (and short) video about the history and why it's not technically an identifier, but has become one.

[1] https://www.youtube.com/watch?v=Erp8IAUouus

By @left-struck - 2 months
> The problem with verifying breaches sourced from data aggregators is that nobody willingly - knowingly - provides their data to them

This is a bit of a tangent but I feel like if we can prove this statement then these data aggregators should be made illegal. How can you consent to something that you don’t know you’re consenting to? Likewise why do these entities have the right to collect detailed personal information like SSN without your explicit, beyond reasonable doubt, consent? To me this is the most obvious failure of the legal system, it clearly goes against well established legal principles that a basic requirement of an agreement is that all parties know what they are agreeing to.

Obviously there is some leeway with agreements where it’s not possible to clarify every eventuality, but let’s say you’re applying to rent a place through an online form and that form shares your SSN with a data aggregator: it should be extremely clear about that, and it should be possible to opt out while still completing the rental application without discrimination.

It’s like, it should be possible to show that no one, within reason, consented to sharing their data with this aggregator because no one is able to confirm that they did. Sure, one person could forget, or lie, but hundreds of millions of people? No. Clearly almost zero people knowingly consent.

By @araes - 2 months
I was wondering why Google suddenly turned on "prompt authentication" on zero-security feature accounts yesterday. Now I "must" have a phone nearby to use Gmail... Tap to authenticate every time you want to look at ... ad spam.

With this, Ticketmaster, and the CDK Global car theft, is there anybody on Earth who doesn't need data protection? Poor people in Somalia need data breach notices. People who are not even on the WWW need data breach notices...

By @hn72774 - 2 months
Anything the average SSN holder should be doing proactively?
By @blindriver - 2 months
Why are data aggregators legal? In California can we create a proposition to shut them down in the state?
By @idontknowtech - 2 months
This sort of stuff will continue happening until the regulatory framework acknowledges a fundamental consumer right to privacy.

If a data broker collects data without the consent of the consumer, then their only real risk is a class action lawsuit which drags on for six years, gets settled for a few days profit, and the consumer gets $13.50 after the legal fees. This massive skew in the risk reward calculus of data brokers is why we have the problem. Because there's little to no real downside, the trend is automatically collect as much data on as many people as possible.

Fixing this means big, mandatory, cash penalties in the law code - say $5k per consumer data leak, directly to the affected consumer, with added penalties if the company lies about the leak or delays payment. The fine must be big, mandatory, and paid directly to the consumer. Only that changes the risk reward ratio.

In that new world, companies would have to re assess their risks. They'd either build invulnerable systems and hire a lot more people reading HN to protect their golden goose, or better still they'd decide to exit the business entirely. That sounds bad, but the only reason the industry exists is because regulators failed to foresee massive leaks like this happening every three months.

We need a consumer data privacy law, with massive fines, to force companies to change their behavior. What we're doing now clearly does not work.

By @dimgl - about 2 months
I used Robokiller to remove myself from data broker lists. I'm extremely impressed with it. I pay yearly. My only annoyances with Robokiller are that:

A) It's necessary. When is the government going to start creating laws to help us and prosecute this?

B) It's expensive. Most people cannot afford this. I can barely afford it but my information has been leaked online.

C) It's inconvenient. A majority of calls are spam, but I'll often miss important calls from unknown numbers because Robokiller acts as a proxy and for some reason the call is routed through the Internet.

Anyhow, my wife and I are not on this list. I'm wondering if using Robokiller saved us from a lot of pain here.

By @velcrovan - 2 months
Even before this, anyone operating a service who isn't treating SSNs as public knowledge in 2024 needs to be, well, shamed or penalized or something.
By @uticus - 2 months
I’ve finally figured out the play: war of attrition.

Eventually enough data will be leaked to make moot the benefits of securing any personal data. At that point everyone stops trying and moves on to more financially rewarding activities.

I mean even if I’m an elephant, and data breaches are blind men, eventually enough blind men will draw a true comprehensive picture.

By @puzzledobserver - 2 months
Several other commenters have brought up the sneaky wordplay involved in saying "identity theft" instead of simply calling it "fraud on the bank", which somehow turns the person into the victim rather than the bank that has been defrauded.

Has anyone tried to argue this point in court? Has this survived / how did this terminology shift survive judicial scrutiny?

By @fnord77 - 2 months
From the NPD website:

> Please be advised that we will not collect, use, disclose, sell, or share the sensitive personal information or sensitive data of California, Virginia, Colorado, or Connecticut residents as those terms are defined by the CCPA/CPRA, VCDPA, CPA, or CTDPA, respectively.

By @hypeatei - 2 months
Does anyone else just not give a fuck at this point about their SSN? I feel like maybe early 00s this would be scary but it's clear that everyone's SSN is out there already or waiting to get breached from a shady private data broker.

The problem lies in how institutions treat the SSN, not the number itself.

By @janalsncm - 2 months
Are there any ways to check the breach to see if my information is there, other than downloading it myself? I’m not sure of the legality of doing so.
By @29athrowaway - 2 months
Time for services everywhere to stop using SSNs for identification and for the US to move on to a more advanced form of identification.

And lock your credit.

By @itamblyn - about 2 months
Is there a straightforward way to download this file for research purposes?
By @sergiotapia - 2 months
Downloaded the torrent, and it's a 164GB text file.

What's a quick way to search if my SSN is in the file? I ask before diving in, it's currently extracting and ETA is 40 minutes.
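For a one-off lookup in a text file this large, streaming it line by line avoids loading it into memory. The sketch below is a minimal illustration only: the function name `find_lines` and the `limit` parameter are made up for this example, and the search string has to match however the value is written in the file (for instance with or without dashes), which the thread does not specify.

```python
def find_lines(path, needle, limit=20):
    """Stream a large text file and return up to `limit` lines containing `needle`.

    Hypothetical helper for illustration; not a tool mentioned in the thread.
    """
    needle_bytes = needle.encode("utf-8")
    hits = []
    # Binary mode sidesteps decoding errors on unexpected bytes.
    with open(path, "rb") as handle:
        for raw in handle:  # iterates one line at a time, so memory use stays small
            if needle_bytes in raw:
                hits.append(raw.decode("utf-8", errors="replace").rstrip("\r\n"))
                if len(hits) >= limit:
                    break  # stop early once enough matches are collected
    return hits
```

A single pass over 165GB is bound by disk read speed, so a fixed-string search with a command-line tool such as `grep -F` would be a comparable option.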

By @NoMoreNicksLeft - 2 months
Can't the SSA just issue 330 million new social security numbers, and tell people to be more careful with them from this point forward?
By @seydor - 2 months
What if we just made all this data free? Some AI is going to compile them anyway (and probably already has). Deterrence is the best defense, right?
By @JumpCrisscross - 2 months
“The database DOES NOT contain information from individuals who use data opt-out services. Every person who used some sort of data opt-out service was not present.”

Like what?

By @peterbecich - 2 months
i.m.o. "National Public Data" in title should be capitalized; it is a proper noun https://en.wikipedia.org/wiki/National_Public_Data
By @jpcookie - 2 months
And where is this information that this random group supposedly has? I have yet to see proof of that being real
By @luxuryballs - 2 months
the government should have put out honey pots or something, or maybe it’s time to get new numbers and just invalidate all the stolen data, there is clearly money for fixing this kind of thing but they’re using it to spy on us and do who knows what else instead
By @USDoD - about 2 months
Does anyone know the correct password?
By @farceSpherule - 2 months
I worked incident response for years, logging thousands of hours of actual on site work with impacted clients.

No one cares.

Clients see this as the cost of doing business and have no incentive to do better. Even after Equifax and OPM.

Until we have a GDPR style law in the U.S. it will continue to be status quo.

By @tmaly - 2 months
I sure wish the US had a version of GDPR.

I get a data breach notice at least a few times a year. I got one for my kids two months ago for their medical data. I thought HIPAA had huge penalties but I guess not.

By @robustcollector - 2 months
Perhaps HN readers would appreciate a detailed account of what the NPD torrents contain.

The torrent delivers two files like so:

  NPD202401.7z  33,456,912,010 bytes (32GB)
  NPD202402.7z  20,548,499,322 bytes (20GB)

Uncompressing NPD202401.7z results in:

  ssn.txt 176,806,109,779 bytes (165GB)
  wc -l ssn.txt ==>> 1,698,302,005 lines

Uncompressing NPD202402.7z results in:

  ssn2.txt 120,722,361,611 bytes (113GB)
  wc -l ssn2.txt ==>> 997,379,508 lines

This is a total of 1,698,302,005 + 997,379,508 = 2,695,681,513 lines.

Each line is a comma separated record with these fields:

ID,firstname,lastname,middlename,name_suff,dob,address,city,county_name,st,zip,phone1,aka1fullname,aka2fullname,aka3fullname,StartDat,alt1DOB,alt2DOB,alt3DOB,ssn

Generally records have ID, firstname, lastname, middlename, address, city, county_name, st, zip, and ssn. Most records do not have the fields for name_suff (name suffix), phone1, aka1fullname, aka2fullname, aka3fullname, StartDat, alt1DOB, alt2DOB, and alt3DOB.
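To make the record layout concrete, here is a rough sketch that maps one line onto the 20 field names listed above. It assumes ordinary CSV quoting and no header row, neither of which is confirmed here; `parse_records` is a made-up helper and the sample row is invented placeholder data, not a real record.

```python
import csv

# Field names copied from the layout described above.
FIELDS = [
    "ID", "firstname", "lastname", "middlename", "name_suff", "dob",
    "address", "city", "county_name", "st", "zip", "phone1",
    "aka1fullname", "aka2fullname", "aka3fullname", "StartDat",
    "alt1DOB", "alt2DOB", "alt3DOB", "ssn",
]

def parse_records(lines):
    """Yield one dict per well-formed line; skip lines with the wrong field count."""
    for row in csv.reader(lines):
        if len(row) != len(FIELDS):
            continue  # assumption: malformed rows are simply ignored
        yield dict(zip(FIELDS, row))

# Invented placeholder row: 11 populated fields, 8 empty ones, then the SSN.
sample_row = [
    "1", "JANE", "DOE", "Q", "", "19800700",
    "1 MAIN ST", "SPRINGFIELD", "SANGAMON", "IL", "62701",
] + [""] * 8 + ["000-00-0000"]
sample_line = ",".join(sample_row)

records = list(parse_records([sample_line]))
print(records[0]["lastname"], records[0]["ssn"])
```

Because `csv.reader` accepts any iterable of strings, the same function could be fed an open file handle instead of a list.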

There are no emails at all. There is no "@" in the files anywhere. Phone numbers are very rare.

I don't know what the ID number at the head of each line represents. I presume it is an internal index used by the organization that compiled the data. The SSN is at the end of each line.

The files have U.S. addresses only as far as I can tell. Nothing from Mexico, Canada, or other foreign countries.

Many of the lines (records) concern the same person at various addresses. Of 7 random people who I personally know that I checked on, all had entries. There were between 3 and 20 lines (records) for these 7 persons, averaging about 10. They usually differed only in the address field. Going by an estimate of 10 records per person, the roughly 2.7 billion lines represent about 2695681513/10 = 269,568,151 distinct persons in the U.S.

The U.S. population is about 337M where 78% is over 18 years of age. In other words, 337000000*0.78 = 262,860,000 Americans are adults. This is pretty close to my estimate of 269,568,151 distinct individuals in the NPD data files.
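The back-of-the-envelope estimate can be reproduced in a few lines. The inputs are the figures quoted in this comment; the 10-records-per-person average comes from a sample of only 7 people, so the result is a rough guess rather than a measurement.

```python
# Line counts reported by `wc -l` for the two extracted files.
lines_file1 = 1_698_302_005
lines_file2 = 997_379_508
total_lines = lines_file1 + lines_file2

# Rough average taken from the 7 people checked above.
records_per_person = 10
estimated_people = total_lines // records_per_person

# Population figures as quoted in the comment.
us_population = 337_000_000
adult_share = 0.78
us_adults = round(us_population * adult_share)

ratio = estimated_people / us_adults

print(f"total lines:      {total_lines:,}")
print(f"estimated people: {estimated_people:,}")
print(f"U.S. adults:      {us_adults:,}")
print(f"estimate / adults = {ratio:.3f}")
```

The estimate lands within about 3% of the adult population figure, which is closer than a 7-person sample can really justify.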

Of the 7 persons I checked on, the names were spelled correctly, although the middle name was sometimes just an initial. I searched each person by multiple methods (address, last name, birth date) so I believe I would have detected names that were spelled slightly wrong.

The addresses appeared correct but there was no way to tell which was the current address and the order in which they lived at each address. There is a StartDat field but it was almost never filled in. The latest entry was not always the most current address. In a couple cases, the current address, where the person has been living for several years, was absent.

The birth dates were correct in a couple cases, were abbreviated in three cases (that is, instead of showing 19800704, meaning July 4 1980, it showed 19800700, meaning July 1980 without an exact day), and were wrong by a wide margin for one person.
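Anyone parsing the dob field has to handle the abbreviated form, since a day of 00 is not a valid calendar date. A minimal sketch, assuming every value is an 8-digit YYYYMMDD string and that only the day can be zeroed out (the comment shows no other variant); `parse_dob` is a made-up name.

```python
from datetime import date

def parse_dob(raw):
    """Return (year, month, day), with day=None when the day is given as 00.

    Returns None for anything that is not a plausible YYYYMMDD value.
    """
    if len(raw) != 8 or not (raw.isascii() and raw.isdigit()):
        return None
    year, month, day = int(raw[:4]), int(raw[4:6]), int(raw[6:])
    if day == 0:
        # Abbreviated form: month known, exact day unknown.
        if year >= 1 and 1 <= month <= 12:
            return (year, month, None)
        return None
    try:
        date(year, month, day)  # raises ValueError for impossible dates
    except ValueError:
        return None
    return (year, month, day)

print(parse_dob("19800704"))
print(parse_dob("19800700"))
```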

All 7 persons I checked had SSN numbers. It was correct for 1 person but I don't know for the other 6. The SSN numbers were consistent for each of the 7 persons I checked on. By this I mean that a person did not have more than 1 SSN number, at least among the 7 persons I checked on.

By @albert_e - 2 months
off topic

does HIBP automatically cover plus addressing variants of an email

example I submit johndoe@example.com

but a breach had johndoe+verizon@example.com

will it match
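Whatever HIBP does on its side (not answered here), one could normalize plus-addressed variants before comparing them. The sketch below only shows what that normalization might look like; `strip_plus_tag` is a made-up helper and says nothing about how HIBP actually matches addresses. Plus addressing is a provider convention, so stripping the tag is not safe for every mail domain.

```python
def strip_plus_tag(email):
    """Drop a '+tag' suffix from the local part and lowercase the domain.

    Hypothetical helper; assumes the provider treats '+tag' as an alias.
    """
    local, sep, domain = email.rpartition("@")
    if not sep or not local or not domain:
        raise ValueError(f"not an email address: {email!r}")
    base = local.split("+", 1)[0]
    if not base:
        raise ValueError(f"empty local part after stripping tag: {email!r}")
    return f"{base}@{domain.lower()}"

print(strip_plus_tag("johndoe+verizon@example.com"))
```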

By @toomuchtodo - 2 months
Ahh, cool, pour the corpus through GPTs and start tweeting Congressional rep personal info at them until they pass a law to outlaw data brokers (in keeping with historical precedent [1] [2]).

[1] https://en.wikipedia.org/wiki/Video_Privacy_Protection_Act

[2] https://jolt.law.harvard.edu/digest/dodging-the-thought-poli...

By @ghm2180 - 2 months
I am just dreading the day when near-simultaneous cyberattacks on a large number of people (the more vulnerable, like middle- and lower-income individuals) start in a DDoS fashion:

1. Credit histories will be unlocked and used to file multiple credit applications, and tax credits will be applied for.

2. Multiple cell phones will be hijacked through SIM hijacking or other zero-day attacks to make it very difficult to get back in.

3. A person's profile will be used to attack the most vulnerable things: - Their families will get fake calls to create confusion. - Their financial services will be frozen or, worse, those with weak 2FA will be compromised.

4. Deepfake images and videos will be created from compromised accounts to sow further mayhem.

This already happens in a targeted, one-strategy-or-the-other fashion. Imagine what one could do with a bit more compute and completed profiles to orchestrate this kind of terrible vengeance.

By @layer8 - 2 months
TL;DR:

> an intriguing story that doesn't require any further action.