To preserve their work, journalists take archiving into their own hands
As news organizations shut down, journalists are increasingly archiving their work to preserve history. Tools like the Wayback Machine and personal records are vital for safeguarding their contributions.
As news websites increasingly shut down, journalists are taking the initiative to preserve their work and the historical context of their reporting. Many news organizations do not prioritize archiving their content, leading to significant losses when sites go dark. Recent examples include the closure of MTV News and the temporary disappearance of Deadspin's archives. A 2021 report indicated that only 7 of 24 newsrooms studied were fully preserving their content.

Journalists face personal and professional costs when their work is lost, prompting them to seek creative solutions for archiving. Some use tools like the Wayback Machine, while others maintain meticulous records on platforms like Airtable. Freelance reporter Andrea Gutierrez emphasizes the importance of personal archiving, noting that any outlet could close unexpectedly. Matthew Gault, a former reporter for Vice, recounts how his wife developed a scraper to save his articles as the company faced closure, underscoring the urgency and necessity of self-archiving in the current media landscape.

While larger legacy outlets may have better resources for preservation, they too must adapt to evolving technologies and potential threats like ransomware. The responsibility for maintaining a record of journalistic work increasingly falls on individual journalists, who must navigate the complexities of digital preservation to ensure their contributions are not lost to history.
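The Wayback Machine mentioned above can also be driven programmatically, which is handy for archiving a whole byline at once. Below is a minimal sketch, assuming the public Save Page Now endpoint (https://web.archive.org/save/<url>) and only Python's standard library; the endpoint's behavior and rate limits can change, and the article URLs are placeholders.

```python
# Minimal sketch: ask the Wayback Machine's Save Page Now endpoint to
# snapshot a list of article URLs. Assumes the public, unauthenticated
# endpoint; heavy use should go through an authenticated API key instead.
import time
import urllib.request

ARTICLES = [  # placeholder URLs for your own clips
    "https://example.com/my-story-2019",
    "https://example.com/my-story-2020",
]

for url in ARTICLES:
    req = urllib.request.Request(
        "https://web.archive.org/save/" + url,
        headers={"User-Agent": "personal-archiver/0.1"},
    )
    try:
        with urllib.request.urlopen(req, timeout=60) as resp:
            # The snapshot URL is normally echoed in the Content-Location header.
            print(url, "->", resp.headers.get("Content-Location"))
    except Exception as exc:
        print("failed:", url, exc)
    time.sleep(10)  # be polite; Save Page Now rate-limits aggressive clients
```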
Related
Paramount Erases MTV.com Archives, Wipes Music, Culture History After 30+ Years
Paramount deletes 20 years of MTV archives, erasing music and cultural history. Former writers express anger over the loss of their journalism and see the move as corporate greed with little respect for the work.
The End of an Era: MTV News Shuts Down… Or why archives are important
MTV News website shutdown erases 20 years of content, emphasizing the importance of archiving online data. Loss impacts journalists, writers, and researchers, highlighting the need for preserving cultural and historical information.
Anna's Archive Loses .GS Domain Name but Remains Resilient
Anna's Archive faced setbacks with domain changes and legal challenges but remains committed to preserving knowledge. Emphasizing cost-effective archiving, it aims to safeguard humanity's culture and knowledge for the future.
PSA: Internet Archive "glitch" deletes years of user data and accounts
A glitch at the Internet Archive deleted numerous user accounts and data, affecting many users. The organization has not addressed the issue, leading to frustration and concerns about data reliability.
- Many users emphasize the importance of personal archiving, sharing experiences of saving their own work to prevent loss.
- Concerns are raised about the reliability of URLs and the need for permanent identifiers for digital documents.
- Some commenters suggest collaboration with established archiving organizations like Archive.org to ensure content remains accessible.
- Legal issues regarding ownership of archived work are discussed, particularly in relation to journalists and their employers.
- There are calls for the creation of new platforms or repositories dedicated to immutable archiving of articles and digital content.
A few years ago, Canada digitized many older television shows: https://news.ycombinator.com/item?id=35716982
With the help of many industry partners, the [Canada Media Fund] CMF team unearthed Canadian gems buried in analog catalogues. Once discovered, we worked to secure permissions and required rights and collaborate with third parties to digitize the works, including an invaluable partnership with Deluxe Canada that covered 40 per cent of the digitization costs. The new, high-quality digital masters were made available to the rights holders and released to the public on the Encore+ YouTube channel in English and French.
In late 2022, the channel deleted the entire Encore+ YouTube archive of Canadian television with two weeks' notice. A few months later, half of the archive resurfaced on https://archive.org/search?query=creator%3A%22Encore%20%2B%2.... If anyone independently archived the missing Encore+ videos from YouTube, please mirror them to Archive.org.

On the post-truth internet, proving the authenticity of archives is going to be tough, and unless there's some other form of verification they are going to become useless fast for "proving" purposes.
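Mirroring recovered files to Archive.org can be scripted rather than done through the web uploader. A minimal sketch using the internetarchive Python library (pip install internetarchive, then `ia configure` to store credentials); the item identifier, filenames, and metadata below are hypothetical placeholders.

```python
# Minimal sketch: upload locally saved videos to a new archive.org item.
# Requires: pip install internetarchive, then `ia configure` to store
# your archive.org credentials. Identifier and metadata are placeholders.
from internetarchive import upload

ITEM_ID = "encore-plus-mirror-example"  # hypothetical item identifier

results = upload(
    ITEM_ID,
    files=["episode-001.mp4", "episode-002.mp4"],  # placeholder filenames
    metadata={
        "title": "Encore+ mirror (example)",
        "mediatype": "movies",
        "subject": "canadian television; encore+",
    },
)
for resp in results:
    print(resp.status_code, resp.request.url)
```

The call returns one response per uploaded file, so failed uploads can be retried individually.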
You can think bigger: the same trick could be used to forge stories about anything, on any website. Nobody checks the authenticity of archive URLs, there are already several archiving sites, and many of these services rewrite URLs, so verification is hard unless there's some authoritative source.
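One partial mitigation is to record a cryptographic digest of every snapshot at capture time, so later copies can at least be checked for tampering. A minimal sketch using only Python's standard library; note that this proves integrity against the recorded manifest, not who captured the page or when, so provenance would still need a trusted timestamping or signing authority.

```python
# Minimal sketch: record SHA-256 digests of saved snapshots in a manifest,
# then verify copies against it later. Integrity only: provenance would
# still need an external trusted timestamp or signature.
import hashlib
import json
import pathlib

SNAPSHOT_DIR = pathlib.Path("snapshots")  # directory of saved .html files
MANIFEST = pathlib.Path("manifest.json")

def digest(path: pathlib.Path) -> str:
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def write_manifest() -> None:
    entries = {p.name: digest(p) for p in sorted(SNAPSHOT_DIR.glob("*.html"))}
    MANIFEST.write_text(json.dumps(entries, indent=2))

def verify() -> None:
    entries = json.loads(MANIFEST.read_text())
    for name, expected in entries.items():
        actual = digest(SNAPSHOT_DIR / name)
        print(name, "OK" if actual == expected else "MODIFIED")

if __name__ == "__main__":
    write_manifest()
    verify()
```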
I've been taking notes and blogging since the early 2000s, and I keep coming back to find that content I'd linked to has disappeared.
Archive.org and Archive Team do amazing work, but it's a mistake to put all your archiving eggs in one basket.[0]
[0]: https://vertis.io/2024/01/26/how-singlefile-transformed-my-o...
Previously, in the print era, the standard agreement was that the publisher held the rights to your story upon publication, and after a reasonable amount of time the rights reverted to the author.
If I worked for CorporateMediaNews as a columnist and reporter for 10 years and they decided to remove all of it, doesn't CMN own the work, and can't they (unfortunately) dispose of it if they so wish? Wouldn't I have no rights to the work at all?
Thinking about my own career: I have written a hell of a lot of code, and at least 80% of it is closed-source systems for various companies. I don't retain any copies of that code.
It would be interesting, if I heard that System X I wrote 15 years ago was being shut down, to try to obtain the source code in order to preserve it. I have never heard of anyone doing that, but it probably happens more often in games and similar fields.
Those sheets are archival quality and should last for quite some time, given a proper storage environment.
They can always have them scanned back in later should they lose their master digital files.
If you or someone you know are looking to archive content from the Web, but don't know how, I'll be happy to help. My email is in my profile.
Yay.
It feels a lot like the end of Usenet or GeoCities, but this time without the incentive for archivists to share their collections as openly. I am certain full scrapes of Reddit and Twitter exist, even after the API closures, but we will likely never see them leave the internal data holdings of large AI companies.
I have taken it upon myself to begin using the updated zimit Docker container to archive swaths of the "useful web": not just high-quality language tokens, but high-quality citations and knowledge built on sources that are more than just links to other places online.
I started saving all my starred GitHub repos into a folder, and it came out to just around 125 GB of code.
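For anyone who wants to do the same, GitHub exposes a user's starred repositories through its public API, and cloning them is a short loop. A minimal sketch, assuming git is on PATH and using the unauthenticated API (rate-limited; an Authorization token raises the limit); the username is a placeholder.

```python
# Minimal sketch: clone every repo a user has starred into ./starred/.
# Uses the public GitHub API (rate-limited when unauthenticated).
import json
import pathlib
import subprocess
import urllib.request

USER = "octocat"  # placeholder username
DEST = pathlib.Path("starred")
DEST.mkdir(exist_ok=True)

page = 1
while True:
    url = f"https://api.github.com/users/{USER}/starred?per_page=100&page={page}"
    with urllib.request.urlopen(url) as resp:
        repos = json.load(resp)
    if not repos:  # empty page means we've seen everything
        break
    for repo in repos:
        # "owner/name" flattened to "owner__name" to avoid collisions
        target = DEST / repo["full_name"].replace("/", "__")
        if not target.exists():
            subprocess.run(
                ["git", "clone", "--depth", "1", repo["clone_url"], str(target)],
                check=False,  # keep going even if one clone fails
            )
    page += 1
```

The shallow clone keeps the footprint down; drop --depth 1 if full history matters for preservation.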
I am terrified that in the very near future a lot of this content will become paywalled, or that the cost of hosting large information repositories will outgrow current ad-revenue models as larger, more powerful scraping operations fill their petabytes, while I try to keep my few small TB of content I don't want to lose from slipping through my fingers.
If you actually care deeply about content preservation, buy yourself a few 10+ TB external disks, grab a copy of zimit, and start pulling stuff. Put it on archive.org and tag it. So far the only ZIM files I see on archive.org are the ones publicly released by the Kiwix team, yet there is an entire wiki of wikis called WikiIndex that remains almost completely unscraped. Fandom and Wikia are gigantic repositories of information, and I fear they will close themselves up sooner rather than later, while many of the smaller info stores we have all come to take for granted as being "at our fingertips" slowly slip away.
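For reference, a zimit crawl is essentially one container invocation. A minimal sketch driving it from Python, assuming Docker is installed and that the ghcr.io/openzim/zimit image accepts --url and --name; releases have renamed flags before, so check the project README for your version.

```python
# Minimal sketch: drive a zimit crawl from Python. Assumes Docker is
# installed and that the ghcr.io/openzim/zimit image accepts --url and
# --name (flag names vary by release; verify against the zimit README).
import subprocess

SEED_URL = "https://example-wiki.example.org"  # placeholder site to archive
ZIM_NAME = "example-wiki"                      # becomes the output filename

subprocess.run(
    [
        "docker", "run", "--rm",
        "-v", "/data/zim:/output",  # host directory for finished .zim files
        "ghcr.io/openzim/zimit", "zimit",
        "--url", SEED_URL,
        "--name", ZIM_NAME,
    ],
    check=True,  # raise if the crawl exits non-zero
)
```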
I first noticed the deep web deepening when things I used to be able to find on Google stopped showing up, no matter how well I knew the content I was searching for and no matter the complex dorking I attempted with operators in the search bar; it was as if the content had vanished. For a time Bing was excellent at finding these "scrubbed" sites. Then DuckDuckGo entered the chat, and Bing started to close itself down more. Bing was largely a scrape of Google, and Google stopped being reliable, so downstream "search indexers" just became micro-Googles, slightly out of date and slightly worse in accuracy, while those ghost pages were "anti-propagated" into the downstream indexers.
Yandex became and is still my preferred search engine when I actually need to find something online, especially when using operators to narrow wide pools.
I have found some rough edges with zimit, and I plan to investigate and even submit some PRs upstream; but when an archive attempt takes three days to run before crashing and wiping out its progress, it is hard to debug without the FOMO kicking in that I should spend the time grabbing what I can now and come back later to fix the code properly.
If anyone has the time to commit to the project and help make it more stable, perhaps working on fault recovery or failure continuation, it would make time-strapped archivists like me very, very happy.
Please go and make a dent in this; news is not the only part of the web I feel could be lost forever if we do not act to preserve it.
In five years' time I see generic web search being considered legacy software and eventually decommissioned in favor of AI-native conversational search (blow my brains out). I know for a fact all the AI companies are doing massive data collection and structuring for GraphRAG-style operations; my fear is that once it works well enough, search will just vanish until a group of hobbyists makes it available to us again.