Crap Data Everywhere
Gerry McGovern discusses the issue of "crap data," highlighting its environmental impact, organizational inefficiency, and the compromised quality of AI training data due to the accumulation of unnecessary information.
Gerry McGovern highlights the pervasive issue of "crap data," emphasizing its detrimental impact on the environment and organizational efficiency. He argues that the digital age has led to an explosion of unnecessary data, with trillions of photos and videos being stored, most of which will never be accessed again. McGovern cites statistics showing that a significant portion of organizational data is never utilized, with examples such as Kyndryl deleting 90% of its data after a cleanup and various organizations having vast amounts of web pages that receive little to no traffic. He points out that many organizations lack awareness of their data inventory, with a substantial amount of data stored on servers that management does not even know exists. The rise of cloud storage has exacerbated the problem, as the low cost of storage encourages the accumulation of unnecessary data rather than its management. McGovern warns that this "crap data" is what artificial intelligence is being trained on, raising concerns about the quality and reliability of AI outputs. He calls for a reevaluation of data management practices to mitigate environmental harm and improve organizational effectiveness.
- The digital age has led to an explosion of unnecessary data, harming the environment.
- A significant portion of organizational data is never accessed or utilized.
- Many organizations are unaware of the extent and location of their data.
- Cloud storage has made the accumulation of "crap data" more prevalent.
- The quality of AI training data is compromised by the prevalence of low-quality data.
Related
Why Your Data Stack Won't Last – and How to Build Data Infrastructure That Will
The article highlights challenges in data infrastructure, emphasizing poor design, technical debt, and key person dependency. It advocates for thorough documentation, cross-training, and stakeholder engagement to ensure sustainable systems.
Excess memes and 'reply all' emails are bad for climate, researcher warns
Excessive digital data, especially unused content, contributes to greenhouse gas emissions. 68% of corporate data is never accessed, leading to significant energy consumption, with data centers projected to use 6% of the UK's electricity by 2030.
The race to save our online lives from a digital dark age
Concerns over digital data preservation grow as vast information is created daily, with organizations like the Internet Archive working to save at-risk content and prevent a potential "digital dark age."
The Continued Trajectory of Idiocy in the Tech Industry
The article critiques the tech industry's hype cycles, particularly around AI, which distract from past failures. It calls for accountability and awareness of ethical concerns regarding user consent in technology.
No Data Lasts Forever
The article emphasizes the impermanence of data storage, highlighting historical losses and the fragility of modern methods, raising concerns about future records and the need for preservation efforts.
Android has a memories feature that serves old photos back up to us on occasion. This is a pattern writ large for huge swaths of data.
Differences in governance or allowable access lead to mass duplication and data rot on anything remotely dynamic.