July 2nd, 2024

Brazil data regulator bans Meta from mining data to train AI models

Brazil's data protection authority has prohibited Meta from using Brazilian data for AI training, citing privacy concerns. Meta faces fines if it does not comply. The move follows similar actions in Europe and emphasizes safeguarding personal data.

Brazil's national data protection authority has banned Meta, the parent company of Instagram and Facebook, from using data from the country to train its artificial intelligence models. The decision was made due to concerns about potential harm to the fundamental rights of data subjects. Meta's updated privacy policy, allowing the use of public posts for AI training, will not be permitted in Brazil. The company expressed disappointment, stating its compliance with privacy laws. This move in Brazil follows similar resistance in Europe, where Meta paused plans to use public posts for AI training. The regulator's action is seen as a step to protect individuals, especially children, from potential misuse of their personal data. Failure to comply with the ban could result in fines for Meta. The decision may influence other companies' transparency in data use for AI training in the future.

16 comments
By @benreesman - 5 months
This proposal gets made pretty frequently in one form or another, and (at least on HN) seems to usually get struck down on this or that procedural ground.

But as the various regulatory, judicial, and legislative processes grind through different parts of the modern intellectual property issue, made so abundantly legible by the AI training data gold rush, it seems ever more clear that, one way or another, we’re going to get a new social contract on IP.

Leaving aside for a moment the thicket of laws, precedents, jurisdictions, and regulatory inertia: we can vote with our feet as both customers and contributors for common sense now.

So how about the following compromise: promote innovation by liberalizing the posture around training on roughly “the commons”, but insist that the resulting weights are likewise available to the public. Why do I have to take someone’s word for it that they’ve got a result around superposition or whatever on mech interp? I’d like to see it work given it’s everyone’s data pushing those weights.

I speak only for myself but plenty of people seem to agree: I don’t mind big companies training on generally available data, I mind the IP-laundering. Compete on cost, compete on value-added software stacks, compete on vertical integration. There is lots of money to be made building a better mousetrap in terms of code and infrastructure and product innovation.

Conduct the research in the open. None of this would be possible without an ocean of research and data subsidized in whole or in part by the public. Asserting any form of ownership over the result might end up being legal, but it will never be ethical.

Meta isn’t perfect on this stuff, but they’re by far the actor pulling the conversation in that direction. Let’s encourage them to continue pushing the pace on stuff like LLaMA 3.

By @Cheer2171 - 5 months
> Compliance must be demonstrated by the company within five working days from the notification of the decision, and the agency established a daily fine of 50,000 reais ($8,820) for failure to do so.

$8,820 * 365 = $3.2 million a year is pretty cheap for Meta to be able to do whatever they want with all the data from all 200 million Brazilians. Their annual net income is $39.10 billion, so 0.008%.
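The arithmetic above can be checked with a short Python sketch. The figures are the ones quoted in this comment ($8,820 per day, $39.10 billion annual net income); the constant and function names are illustrative, and the calculation assumes the fine accrues on every calendar day for a full year.

```python
# Figures quoted in the comment above; names are illustrative only.
DAILY_FINE_USD = 8_820          # 50,000 reais per day, as converted in the article
NET_INCOME_USD = 39.10e9        # Meta's annual net income, as quoted


def annual_fine(daily_fine: float, days: int = 365) -> float:
    """Total fine if it accrues every calendar day for `days` days."""
    return daily_fine * days


def percent_of(amount: float, total: float) -> float:
    """Express `amount` as a percentage of `total`."""
    return amount / total * 100


annual = annual_fine(DAILY_FINE_USD)
pct = percent_of(annual, NET_INCOME_USD)

print(f"Annual fine: ${annual:,.0f}")
print(f"Share of net income: {pct:.4f}%")
```

At the quoted figures this comes to about $3.2 million a year, or roughly 0.008% of net income, which matches the estimate above.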

By @tiahura - 5 months
Information wants to be free. The ethos of the open web, the Levy hacker ethos, has always been about unrestricted access and fair use. When content is published openly online, it inherently invites broad consumption, reproduction, and creative reuse by the public. This principle is deeply rooted in the fair use doctrine as applied to the digital realm.

Fair use is evaluated based on the purpose of use, the nature of the copyrighted work, the amount used, and the effect on the market. These factors generally favor the free use of openly published web content. The transformative nature of many reuses, the public availability of original works, the necessity of using entire works in some cases, and the absence of a traditional market for such content all support this interpretation.

This longstanding practice has driven unprecedented innovation and information dissemination, establishing a social contract between content creators and users that treats open web content as "freeware." Any move to impose strict copyright limitations now would stifle innovation and contradict decades of established legal precedent and digital norms.

By @delichon - 5 months
So we're in for an eternal cat and mouse game where AIs attempt to learn all the facts available and obscure their provenance as needed to evade restrictions, and IP owners attempt to prove that AIs know too much and therefore owe them money.

By @nostromo - 5 months
What problem does this solve?

The article only mentions that data could be used to train AI to make CSAM... which seems needlessly alarmist and inflammatory.

By @throwaway957 - 5 months
(moderator: please, don’t delete this comment again, everybody is commenting without knowing what Meta did)

Meta spared no expense to hide the opt-out page. The agency says that “there were excessive and unjustified obstacles to accessing information and exercising this right”. This was one of the main reasons the agency was obliged to act.

The steps to reach the hidden opt-out page are below. They oblige users to read the privacy policy to find a link buried deep in the text, and they require 2FA by email to opt out even for users who are already logged in. They should require 2FA to log in, not to opt out of AI training. There is no justification for requiring all this:

* Access your profile and go to the settings section, signaled by three bars in the top right corner

* Click on "about" at the bottom of the page

* Select the privacy policy. On this new page, the three bars in the top right corner lead to the privacy center

* Click on the arrow next to other policies and articles and select the option "How Meta uses information for generative AI features and models"

* In the nineteenth paragraph, not counting topics, is the "right to object" option. Click on it.

* Fill in and send the form. Meta confirms your identity with a numerical code sent to the email address registered on your account. Then just wait for the opt-out to be confirmed. This can take a few minutes.

By @yallpendantools - 5 months
Honestly, I'm rather frustrated by the HN discourse on this topic.

TFA (with emphasis added):

> Brazil’s national *data protection* authority determined on Tuesday that Meta, the parent company of Instagram and Facebook, cannot use data originating in the country to train its artificial intelligence.

> The decision stems from “the imminent risk of serious and irreparable or difficult-to-repair damage to the fundamental rights of the affected data subjects,” the agency said in the nation’s official gazette.

https://www.theregister.com/2024/06/14/meta_eu_privacy/ (with emphasis added):

> The decision to halt AI training using EU content follows complaints to *data protection* agencies in 11 European countries – and those agencies, led by Ireland, telling the Facebook giant to scrap the slurp.

While there is no shortage of IP, licensing, and copyright moral quandaries in training LLMs and their ilk, Meta/FB is not getting regulated on those grounds! They are getting regulated on privacy issues. It's even there in The Register's URL path.

I'm seeing a lot of comments in these threads about IP, copyright, and licensing---which, please do take note, are well-defined legal terms and are not to be used interchangeably---but all that is irrelevant because that is not the question Meta is being made to answer for.

Even more frustrating are threads/arguments about what "irrevocable (copy)rights" you give FB per their TOS that don't even bother to cite the relevant bits of the TOS to prove their point. Exercise for the reader: prove/disprove that [a] FB users retain copyright of their content even when posted to FB, [b] you are merely licensing FB for specific (not universal!) uses of your content posted on their platform, and [c] said license is revocable at any time. The astute reader is referred to the Berne Convention, but Facebook's TOS will also do just fine. Standard question, one point per answer.

Bonus point question: if you have proven the points above, what action allows you to revoke the license you have granted FB?

(Of course, at the end of the day, I'm again playing lawyer in an online forum. I'm no better than anyone else here; what do I know?)

By @fcanesin - 5 months
Meta updated its Privacy Policy on June 26 to include among its rights the use of data collected in "Meta Products" for the training of GenAI models. This goes against the regulator's interpretation of the local data protection law, and as such this notice was issued.

The policy update seems to be global: https://www.facebook.com/privacy/policy

By @tensor - 5 months
Personally I'd rather they force these companies to provide a reasonably priced alternative to ads. I want to have the option to allow companies to use my data for AI. I think AI can be a net boon to society if managed right.

But ads are net negative and I'd argue that the influence of ads and paid actors on social media has been the single most destabilizing force in the world recently.

By @cmpyl0 - 5 months
Link to the decision: https://www.gov.br/anpd/pt-br/assuntos/noticias/anpd-determi... in Brazilian Portuguese.

It applies only to Meta, but I think that is because Meta caught the regulator's attention. The ban is due to the lack of a legal basis for the change in Meta's privacy policy under the LGPD (Brazilian privacy law - https://www.planalto.gov.br/ccivil_03/_ato2015-2018/2018/lei...)

By @CSMastermind - 5 months
Meta specifically or all social networks? If it's just Meta, I'm curious why.

By @29athrowaway - 5 months
Well, joke's on you Brazil. I doubt Meta can mine any more data than they already have.

They already have all the names, pictures, face biometrics, social graph, location information, political affiliation, relationships and everything that goes into an advertising profile. What else is needed?

They can just use the data already mined, which is probably 99% of everything they will ever need for many years to come. They probably have so much data they can use AI to predict what's missing with a fair degree of accuracy, like what your face is going to look like in 20 years.

And due to the highly corrupt nature of politics, a few dollars here and there will undo this regulation fairly quickly. Or they could buy it from another company, because the law must be so poorly constructed that a clever lawyer will surely find a workaround and they will be OK.

By @elzbardico - 5 months
Who would have predicted that the Butlerian Jihad would start, of all places, in Brazil?

By @mastercheph - 5 months
Somebody forgot to pay their bribe!