July 17th, 2024

YouTube creators surprised to find Apple and others trained AI on their videos

YouTube creators were surprised to learn that tech companies including Apple, Salesforce, and Anthropic trained AI models on YouTube videos without their consent. The dataset, EleutherAI's "the Pile," includes content from popular creators and media brands, raising ethical concerns.

YouTube creators were surprised to discover that major tech companies, including Apple, Salesforce, and Anthropic, had trained AI models on tens of thousands of YouTube videos without the creators' consent. The companies used "the Pile," a dataset created by EleutherAI that includes subtitles scraped from videos on more than 48,000 YouTube channels, among them popular YouTubers such as MrBeast and PewDiePie. The dataset also contained content from mainstream media brands such as Ars Technica. While some creators expressed frustration at the unauthorized use of their work, companies such as Anthropic defended themselves, saying the Pile contains only a small subset of YouTube subtitles and that its use did not directly violate YouTube's terms of service. The incident highlights how little control creators have over how their content is used online and raises questions about the ethics of training AI models on publicly available data.

Related

OpenAI and Anthropic are ignoring robots.txt

Two AI startups, OpenAI and Anthropic, are reportedly ignoring robots.txt rules and scraping web content despite claiming to honor them. Analytics from TollBit revealed the behavior, raising concerns about data misuse.

YouTube in talks with record labels over AI music deal

YouTube is in talks with major record labels to license artists' music for AI tools that can replicate it. Some artists are wary, fearing their work will be devalued. The negotiations aim to bring select artists on board for AI music generation.

Microsoft CEO of AI: Your online content is 'freeware' fodder for training models

Mustafa Suleyman, CEO of Microsoft AI, described content on the open web as "freeware" that can be used to train neural networks, even as Microsoft faces legal action over AI training. The remarks fuel the debate over copyright, AI training, and intellectual property rights.

YouTube lets you request removal of AI content that simulates your face or voice

YouTube's new policy lets people request the removal of AI-generated content that mimics their face or voice, addressing privacy concerns. Requests are assessed on factors such as whether the content is disclosed as synthetic, whether the person can be identified, whether it has public-interest value, and whether it depicts sensitive behavior. Uploaders have 48 hours to respond to a complaint.

Apple trained AI models on YouTube content without consent

Tech companies including Apple used YouTube video subtitles to train AI models without creators' consent. Their reliance on third-party datasets raises legal and ethical questions about how creators' work is used.

4 comments

By @verdverm - 9 months

Are we really surprised, given that youtube-dl and friends exist? I've seen multiple AI demos that use this tool.

By @ChrisArchitect - 9 months

[dupe]

Source: https://www.wired.com/story/youtube-training-data-apple-nvid...

Some more discussion: https://news.ycombinator.com/item?id=40977465

By @middlefing - 9 months

AI: Don't forget to like and smash that subscribe button <AD for useless meme company>