Digital Archivists: Protecting Public Data from Erasure
Digital archivists preserve essential public datasets for research, with Harvard's Library Innovation Lab archiving Data.gov. The Internet Archive helps keep government data accessible, countering the risk of removals like those that have occurred during past political transitions.
Digital archivists play a crucial role in preserving public data, which is essential for scientific research and the integrity of the scholarly record. The Library Innovation Lab at Harvard Law School has developed an extensive archive of Data.gov, containing over 311,000 public datasets, to safeguard against the potential erasure of vital information. The Internet Archive's Wayback Machine has also been instrumental in maintaining access to government websites and datasets, which can be affected by political transitions. Historical instances of data removal, such as during the George W. Bush administration and more recently under the Trump administration, highlight the risks associated with the loss of public data. These deletions can undermine years of research and invalidate scientific models. The efforts of digital archivists ensure that knowledge remains accessible, allowing future innovations to build on past discoveries. Their work is vital in maintaining a continuous record of human knowledge, particularly in fields like science, engineering, and medicine, where data integrity is paramount.
- Digital archivists are essential for preserving public datasets critical for research.
- The Library Innovation Lab at Harvard Law School has created a significant archive of Data.gov.
- Historical data removals during political transitions pose risks to scientific integrity.
- The Internet Archive's Wayback Machine helps maintain access to important government data.
- Continuous archiving efforts ensure that knowledge remains accessible for future innovations.
Related
We're losing our digital history. Can the Internet Archive save it?
The Internet Archive has preserved 866 billion web pages, but faces financial instability and legal challenges. Its Wayback Machine is crucial for accessing historical content, despite ongoing risks to its operations.
Inside the race to archive the US government's websites
The new U.S. administration's removal of thousands of government web pages has prompted organizations to archive critical public health and environmental data, raising concerns about future research and informed decision-making.
Federal data is disappearing. On Thursday, meet the teams working to rescue it
Federal datasets are increasingly at risk, prompting civil society organizations to preserve information. MuckRock is hosting an event on February 13 to discuss data preservation efforts and collaboration.
Humming along in an old church, the Internet Archive is more relevant than
The Internet Archive has cataloged 73,000 deleted U.S. government web pages since Trump's inauguration, processing 100 terabytes of data daily and emphasizing the importance of preserving digital history and public participation.
Internet Archive et al. made noise and promises but told volunteers to stop because they couldn't actually handle the ingest.
https://www.reddit.com/r/Archiveteam/comments/1jbgycm/us_gov...
These folks made a notable effort.
https://webrecorder.net/blog/2025-03-25-govarchive-us-and-mi...
I hate everything about this.
This stuff is very important to talk about, so I hope that this submission by rbanffy isn't also flagged.
Once you format shift, you will always be format shifting.
Keep your originals whenever you can.
Let’s say an individual posted identifying or incriminating information online, inadvertently or intentionally, in a public place.
Then a third party decides to store it, and possibly make it accessible to others.
If the original self-doxxing user then pulled the original dox but was unable to scrub the copies, would that information still be considered public, or would it be private? Was it ever truly public? Or private, for that matter?