September 13th, 2024

Meta fed its AI on everything adults have publicly posted since 2007

Meta has acknowledged training its AI on public posts that adult users have made on Facebook and Instagram since 2007, raising privacy concerns and highlighting the need for stronger regulations.


Meta has confirmed that it has utilized nearly all publicly posted content from adult users on Facebook and Instagram since 2007 to train its artificial intelligence models. This acknowledgment came during a local government inquiry in Australia, where Meta's global privacy director, Melinda Claybaugh, initially denied the claims but later admitted the practice. Unless users have actively set their posts to private, Meta has collected all public posts and comments for AI training purposes. The company has been vague about the specifics of its data collection timeline and practices. While European users can opt out due to local privacy laws, users in other regions, including Australia, currently do not have this option. Claybaugh stated that Meta does not scrape data from users under 18, but confirmed that public posts from adult accounts created when the user was a minor could still be collected. The inquiry highlighted concerns about the exploitation of personal data, particularly regarding children’s images, and emphasized the need for stronger privacy regulations to protect users from such practices.

- Meta has used public posts made by adult Facebook and Instagram users since 2007 for AI training.

- Users must set their posts to private to prevent data scraping; otherwise, their content is collected.

- European users can opt out of data collection, but users in other regions cannot.

- Concerns were raised about the collection of data from accounts created by minors.

- The inquiry underscored the need for improved privacy regulations to protect user data.

AI: What people are saying
The discussion surrounding Meta's use of public posts for AI training reveals several key themes and opinions.
  • Many commenters express a lack of surprise, suggesting that using public data is expected behavior from Meta.
  • Concerns about privacy and the implications of using user-generated content for AI training are prevalent.
  • Some users argue that if content is publicly posted, it is fair game for use, while others question the ethics of this practice.
  • There are mixed feelings about the quality and intelligence of AI trained on social media data.
  • Several comments highlight the need for clearer regulations and user awareness regarding data usage on platforms like Facebook.
46 comments
By @shsbdncudx - 4 months
Presumably “scraped” isn’t the right term here. They already have the raw data; they won’t be “scraping” it from the website, they’ll just be ingesting it from where they store it.
By @parasti - 4 months
It's funny because the entire Facebook ecosystem is designed to disincentivize meaningful posting. Just keep watching the ads and short form videos, user.
By @encoderer - 4 months
That’s nothing. AOL has just finished training on 29 years of emails and messages. It’s hoped that with more H100s the AI will finally be able to calculate the full amount due by BillG for the emails mom has been forwarding.
By @Noumenon72 - 4 months
Was "public" ever the default setting? I remember it as being opt-in if you ever wanted something to show beyond your friends-of-friends.
By @mattcantstop - 4 months
I am very likely in the minority here, but I think AI SHOULD be trained on everything that is in the public sphere. I'd be disappointed if it wasn't trained on everything they had access to.

If it is trained on private information, then I would have issue with it.

By @gnabgib - 4 months
Discussion (81 points, 3 days ago, 79 comments) https://news.ycombinator.com/item?id=41508158
By @tdeck - 4 months
Can we talk about how most of us haven't read 80% of everything on the internet and yet we are all still better at many basic things than these AIs? At what point do we admit to ourselves that this isn't a sustainable path forward.
By @fidla - 4 months
Well they don't really know if someone is an adult or not. Just because they say they are 13 doesn't mean that they really were when they signed up. And 13 is hardly an adult now is it?
By @kylehotchkiss - 4 months
It's OK. Meta is training their AI on hundreds of thousands of posts with photos of veterans with toilet plunger legs celebrating their birthdays in the middle of the street while sitting as sturdy as the Lincoln memorial. The AI brain rot has already begun in this model.
By @orochimaaru - 4 months
Why is this surprising? They’ve always done this. In fact I’d be surprised if they didn’t do this. Fwiw - llama is free to use. So I guess it’s a good enough return.

I don’t use Facebook. I’m not sure if they can peek into WhatsApp messages.

By @ChrisArchitect - 4 months
By @paxys - 4 months
So did OpenAI and Anthropic and Google. That's what "public" means.
By @koolala - 4 months
Skynet Ads are "said" to be preferred. "People prefer to see relevant ads." Can AI understand humans better than humans understand themselves? Can Humans understand the consciousness of Dogs and Cats better than they do?

The objective answer feels like No but the subjective answer feels like Yes. Humans will never understand how an animal truly thinks but we understand how to control them.

By @autoexec - 4 months
I don't believe for a moment that they haven't used the data of countless children. Especially early on when kids just had to click an "I'm over 18" button or enter a fake birthday to get accounts and facebook, like everyone else, just looked the other way.
By @duxup - 4 months
We're going to create a really bad AI and get upset by that fact only to discover that ... we all made it that way.

https://www.youtube.com/watch?v=Y-Elr5K2Vuo

By @AlexandrB - 4 months

    People just submitted it.
    I don't know why.
    They 'trust me'.
    Dumb fucks.
-Mark Zuckerberg

Things change, but this never stops being a concise summary of Meta's ethos as a company.

By @geertj - 4 months
I imagine a future AI trained on this going into therapy to uncover childhood trauma.
By @greesil - 4 months
I am the product.
By @not2b - 4 months
This would include all those celebrity posts on Instagram. Great for deepfakes. They'll try to protect against that, but a bit of cleverness with prompts should be able to get around the filters.
By @PaulHoule - 4 months
Assuming they want to build a model that can do useful things with their own data (say any kind of content filtering, summarization, etc.) it is exactly what they should do.
By @whoitwas - 4 months
I don't understand how this surprises anyone. You choose to give them your data. It's not free. If you don't want them to have your data, don't give it away.
By @aplusbi - 4 months
Honestly this feels like a better policy than most AI training - Meta actually has explicit rights to the content it is using. Sure it was EULA click-through but at least it's something that the content creator ostensibly agreed to.

Of course I'm sure Meta is also training their AI on content that they scraped from the internet/other sources without permission...

By @MisterBastahrd - 4 months
Meta just created the dumbest object known to mankind. Quite an achievement given our current political landscape.
By @golergka - 4 months
If it's publicly posted, it literally means that everybody can read it. What's exactly the issue here?
By @nottorp - 4 months
So facebook's "AI" will speak australian slang instead of nigerian business english?
By @WuxiFingerHold - 4 months
Meta can and will use every WhatsApp, Facebook or Insta post of every user on the planet, if they think they can benefit from it. They don't care about any data protection laws or ridiculously low fines anyway. Meta is the most evil and powerful company on the planet. Believing anything else is naive. No news here.
By @ilrwbwrkhv - 4 months
Serves them right. Anyone who puts up their images on Facebook willingly deserves to be subjugated.
By @Cyclone_ - 4 months
Aren't most AIs trained on public data, i.e. this doesn't seem terribly surprising?
By @ado__dev - 4 months
Not surprised at all. Facebook owns the platform and outlined in the ToU that they can do whatever they want with the content you post on there.

At least it's better than scraping content off platform (which I'm sure they've done) and using that, but using content posted on their own platform seems like a no-brainer.

By @nkmnz - 4 months
Is this true for posts from people with deactivated/deleted accounts as well?
By @musicale - 4 months
I guess that's why I'm getting recommendations for Tim Tams.
By @almost_usual - 4 months
Can’t wait to see the memes it generates.
By @dboreham - 4 months
Journalists discover how AI works...
By @annoyingnoob - 4 months
Garbage in, garbage out.
By @CamperBob2 - 4 months
If the service is free, you're the product. Here's a radical idea: if you don't want Facebook to use your information and content, don't post it to Facebook.

... or does everyone around here think anything different is happening to their posts?

By @gmd63 - 4 months
"They just trust me...Dumb f**s" - Mark Zuckerberg
By @mylons - 4 months
how is _anyone_ surprised by this?
By @jewelry - 4 months
Why is this even news? Google scrapes all public posts to build its search index… A bunch of 3rd party vendors scraped all public posts to build their ad pricing models…
By @jppope - 4 months
I for one am shocked. Shocked I say. There are dozens of us surprised by Facebook's actions... DOZENS.
By @globalnode - 4 months
how is this even news? people getting outraged that data put in the public domain gets used by someone... what world am I living in here?
By @SoftTalker - 4 months
Funny to think that the distillation of 16 years of Facebook posts is now considered "intelligence."
By @pbhjpbhj - 4 months
In the UK I'd say they've definitely committed copyright infringement. Fair Dealing doesn't allow this.
By @askafriend - 4 months
This isn't really that groundbreaking of a story...

Of course they'd do this! How did people think feed ranking worked?

The only reason this is being reported now is because there's a chatbot and I guess that feels different to people.