April 2nd, 2025

Digital Archivists: Protecting Public Data from Erasure

Digital archivists preserve essential public datasets for research, with Harvard's Library Innovation Lab archiving Data.gov. The Internet Archive aids access, countering risks from historical data removals during political transitions.

Digital archivists play a crucial role in preserving public data, which is essential for scientific research and the integrity of the scholarly record. The Library Innovation Lab at Harvard Law School has developed an extensive archive of Data.gov, containing over 311,000 public datasets, to safeguard against the potential erasure of vital information. The Internet Archive's Wayback Machine has also been instrumental in maintaining access to government websites and datasets, which can be affected by political transitions. Historical instances of data removal, such as during the George W. Bush administration and more recently under the Trump administration, highlight the risks associated with the loss of public data. These deletions can undermine years of research and invalidate scientific models. The efforts of digital archivists ensure that knowledge remains accessible, allowing future innovations to build on past discoveries. Their work is vital in maintaining a continuous record of human knowledge, particularly in fields like science, engineering, and medicine, where data integrity is paramount.

- Digital archivists are essential for preserving public datasets critical for research.

- The Library Innovation Lab at Harvard Law School has created a significant archive of Data.gov.

- Historical data removals during political transitions pose risks to scientific integrity.

- The Internet Archive's Wayback Machine helps maintain access to important government data.

- Continuous archiving efforts ensure that knowledge remains accessible for future innovations.

7 comments
By @dmillar - 2 days
Many criminal records, petty or otherwise, are public record. Once they are archived, expunged or dismissed infractions never truly go away. A traffic violation or other petty misdemeanor from 20 years ago that has been expunged from the official record can still show up on a background check because companies archive public data. So there is a flip side to this.
By @badlibrarian - 2 days
There's a lot of panic and overlap in the space; a way to coordinate these efforts would be helpful.

Internet Archive et al. made noise and promises but told volunteers to stop because they couldn't actually handle the ingest.

https://www.reddit.com/r/Archiveteam/comments/1jbgycm/us_gov...

These folks made a notable effort.

https://webrecorder.net/blog/2025-03-25-govarchive-us-and-mi...

By @Damogran6 - 2 days
Hypothetically:

- Government leader says they're nuking data

- Mad rush to back up data through other means

- Government leader declares they've 'transferred the cost of maintaining data out of government, thus making for a smaller, more efficient, government'

I hate everything about this.

By @Teever - 2 days
I made this related submission[0] recently but it was flagged.

This stuff is very important to talk about so I hope that this submission by rbanffy isn't also flagged.

[0] https://news.ycombinator.com/item?id=43543075

By @nla - 2 days
Best thing I ever heard from the head of archives at the BBC:

Once you format shift, you will always be format shifting.

Keep your originals whenever you can.

By @mikrl - 2 days
How does this relate to dox?

Let’s say an individual posted identifying or incriminating information online, inadvertently or intentionally, in a public place.

Then a third party decides to store it, and possibly make it accessible to others.

If the original self-doxxing user then pulled the original dox but was unable to scrub the remaining copies, would that information still be considered public, or would it be private? Was it ever truly public? Or private, for that matter?

By @hsuduebc2 - 2 days
I wonder: maybe blockchain would actually be a useful technology for this?