July 15th, 2024

Google's Gemini AI caught scanning Google Drive PDF files without permission

Google's Gemini AI scans Google Drive PDFs without consent, sparking privacy concerns. Users struggle to disable this feature, raising questions about user control and data privacy within AI services.

Google's Gemini AI has been reported to scan PDF files hosted on Google Drive without user permission, prompting concerns about privacy and control over sensitive information. Kevin Bankston, Senior Advisor on AI Governance at the Center for Democracy & Technology, raised the issue on Twitter after discovering Gemini summarizing his private documents without consent. Despite his efforts to disable the feature, Bankston found it difficult to locate the necessary settings. The problem appears to be linked to Google Drive and may affect Google Docs as well. Although Google describes Gemini's privacy settings as accessible, users encountered difficulties managing the automatic scanning of their files; the issue may stem from Google Workspace Labs settings overriding the intended configuration. The incident highlights the importance of user consent and privacy protection for sensitive data, and Google's handling of it has raised questions about how much transparency and control users have over their information within AI-driven services.

Related

Gemini's data-analyzing abilities aren't as good as Google claims

Google's Gemini 1.5 Pro and 1.5 Flash AI models face scrutiny for poor data analysis performance, struggling with large datasets and complex tasks. Research questions Google's marketing claims, highlighting the need for improved model evaluation.

Google Researchers Publish Paper About How AI Is Ruining the Internet

Google researchers warn about generative AI's negative impact on the internet, creating fake content blurring authenticity. Misuse includes manipulating human likeness, falsifying evidence, and influencing public opinion for profit. AI integration raises concerns.

Google's Nonconsensual Explicit Images Problem Is Getting Worse

Google is struggling with the rise of nonconsensual explicit image sharing online. Despite some efforts to help victims remove content, advocates push for stronger measures to protect privacy, citing the company's capability based on actions against child sexual abuse material.

Is your data safe from Google Docs AI scraping?

Google Docs faces scrutiny for potential data usage in AI training. Proton Drive offers encrypted Docs for enhanced privacy, contrasting Google's practices. Users must weigh privacy concerns when choosing between the two.

Google Gemini scans files on Google Drive without permission – can't be disabled

Google's Gemini AI scans Google Drive PDFs without consent, sparking privacy concerns. Users struggle to disable scanning, possibly linked to Google Workspace Labs. Lack of control raises privacy and data security issues.

32 comments

By @Cthulhu_ - 6 months
Just reiterates that you don't own your data when it's hosted on cloud providers; this time there's a clear sign, but I can guarantee that Google's systems were reading and aggregating data inside your private docs ages ago.

This concern was first raised when Gmail started, 20 years ago now; at the time people reeled at the idea that "Google reads your emails to give you ads", but at the same time the 1 GB inbox and fresh UI were a compelling argument.

I think they learned from it, and Google Drive and co. were less "scary", or at least less overt about scanning the stuff you keep in them, partly because they wanted that sweet corporate money.

By @vouaobrasil - 6 months
All AI should be opt-in, and that includes both training and scanning. You should have to check a box that says "I would like to use AI features", and the accompanying text should make crystal clear what that means.

This should be mandatory, enforced, and come with strict fines for companies that do not comply.

By @nitin_flanker - 6 months
Apart from the obviously misleading way this article is written, here are all the links shared in the tweet thread the article mentions:

- Manage your activity on Gemini : https://myactivity.google.com/product/gemini

- This page has most answers related to Google Workspace and opting out of different Google apps : https://support.google.com/docs/answer/13447104#:~:text=Turn...

By @shadowgovt - 6 months
The headline is a little unclear on the issue here.

It is not surprising that Gemini will summarize a document if you ask it to. "Scanning" is doing heavy lifting here; the headline implies Google is training Gemini on private documents, when the real issue is that Gemini was run with a private document as input to produce a summary when the user thought they had explicitly switched that feature off.

That having been said, it's a meaningful bug in Google's infrastructure that the setting is not being respected, and the kind of thing that should make a person check their exit strategy if they are completely against using the new generation of AI in general.

By @thenoblesunfish - 6 months
The title is misleading, isn't it? I was expecting that this was scanning for training or testing or something, but this is summarization of articles the user is looking at, so "caught" is disingenuous. You don't "catch" people doing things they tell you they are doing, while they're doing it.

By @Havoc - 6 months
Only a matter of time before someone extracts something valuable out of Google's models. Bank passwords or crypto keys or something.

The glue-pizza incident illustrated that they're just YOLO-ing this.

By @motohagiography - 6 months
This is similar to the scramble for health data during COVID, where a number of groups tried (and some succeeded) to use the crisis to squeeze the toothpaste out of the tube in the same way, since the cost of being reprimanded is low and the value of grabbing the data is high. Bureaucratic smash-and-grabs, essentially. Disappointing, but predictable to anyone who has worked in privacy; most people just make a show of acting surprised and then move on, because their careers depend on sustaining a gallopingly absurd best-intentions narrative.

Your hacked SMS messages from AT&T are probably next, and everyone will be just as surprised when keystrokes from your phones get hit, or when a collection agent for model training (privacy-enhanced for your pleasure, surely) is added as an OS update to commercial platforms.

Make an example of the product managers and engineers behind this, or see it done worse and at a larger scale next time.

By @Aurornis - 6 months
The original Tweet and this article are mixing terms in a deliberately misleading way.

They’re trying to suggest that exposing an LLM to a document in any way is equivalent to including that document in the LLM’s training set. That’s the hook in the article and the original Tweet, but the Tweet thread eventually acknowledges the differences and pivots to being angry about the existence of the AI feature at all.

There isn’t anything of substance to this story other than a Twitter user writing a rage-bait thread about being angry about an AI popup, while trying to spin it as something much more sinister.

By @okdood64 - 6 months
I'm shocked, especially this being HN, by how many people are being successfully misled about what is actually going on here. Do people still read articles before posting?

By @nerdjon - 6 months
Shocker: Google not going quite far enough with privacy and data access? They talk about it, but it's never quite far enough to keep their own services from accessing the data.

We really need to get to the point where all remotely stored data is encrypted and can only be decrypted by our own devices, never by the servers. Otherwise we just allow the companies to mine the data as much as they want, with zero insight into what they are doing.

Yes, this requires trusting that they in fact cannot decrypt it. I don't have a good solution to that.
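
A minimal sketch of that model in Python, assuming the third-party `cryptography` package (the upload call is hypothetical):

    # Client-side encryption sketch: the provider only ever stores ciphertext.
    from cryptography.fernet import Fernet

    key = Fernet.generate_key()  # generated and kept on the user's device only
    f = Fernet(key)

    document = b"contents of a private PDF"
    ciphertext = f.encrypt(document)  # encrypt locally, before anything leaves the device
    # upload(ciphertext)              # hypothetical upload; the server sees opaque bytes

    # Only a device holding `key` can recover the plaintext.
    assert f.decrypt(ciphertext) == document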

Any AI access to personal data needs to happen on-device, or, if it requires server processing (hopefully only a short-term limitation), come with a clear prompt that data is being sent off your device.

It doesn't matter if this isn't specifically being used to train the model at this point in time; it is not unreasonable to think that any data sent through Gemini (or any remote server) could be logged and later used for additional training, left sitting in plaintext in a log, or simply viewable by testers.

By @r2vcap - 6 months
There is no cloud. It's just someone else's computer.

By @api - 6 months
If it's not stored on your device or encrypted with keys only you control, it's not yours.

I assume anything stored in such a system will be data mined for many purposes. That includes all of Gmail and Google Docs.

By @worksonmine - 6 months
This shouldn't come as a surprise to anyone; their entire business is our data. I always encrypt anything important that I want to back up to the cloud.

By @shinycode - 6 months
I tried an equivalent of Copilot once: in the code base I typed `images = [` and the AI autofilled the array with HTTP links to real images. I never tried the same thing with private keys or other sensitive information, but it sucks that this happens.

By @estebarb - 6 months
It is urgent to educate people about how these systems work. Search requires indexing. Summarizing with a language model requires inference. Data used for inference is normally discarded after use; it is not used for training.
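
As a rough illustration of the difference, assuming the Hugging Face `transformers` package and an arbitrary open summarization model: running inference reads the model's weights but never writes them.

    # Inference-only summarization: the document passes through the model as
    # input; no optimization step runs, so the weights are unchanged and
    # nothing about the text persists after the call returns.
    from transformers import pipeline

    summarizer = pipeline("summarization", model="sshleifer/distilbart-cnn-12-6")

    text = ("Gemini was reported to summarize private Google Drive PDFs. "
            "Users said they could not find the setting that turns it off.")
    print(summarizer(text, max_length=40, min_length=10)[0]["summary_text"])

Training, by contrast, requires an explicit update step over the data; merely running inference does not add a document to any training set.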

Yeah, that should be obvious to many here, but even software engineers believe that AIs are sentient things that will remember everything they see. That is a problem, because the public is afraid of the tech due to a wrong understanding of how it works, and eventually they will demand laws protecting them from things that have never existed.

Yes, there are social issues with AI. But the article just shows a big tech illiteracy gap.

By @silvaring - 6 months
I just want to add that Gmail has a very sneaky 'add to Drive' button that is way too easy to click when working with email attachments.

How long until Gmail attachments get uploaded to Drive by default through some obscure update that toggles everything to 'yes'?

By @meindnoch - 6 months
Your first mistake was storing your data on someone else's computer.

By @eagerpace - 6 months
In the push for AGI, do companies feel a recursive-learning future is soon achievable, and therefore that getting to the first cycle of it is worth the cost of any legal issues that may arise?

By @Khaine - 6 months
If this is true, Google needs to be charged with violating various privacy laws.

I'm not sure how they can claim to have informed consent for this from their customers.

By @padolsey - 6 months
There is a fundamentally interesting nuance to highlight here. I don't know precisely what Google is doing, but if they're just shuttling the content through a closed-loop, deterministic LLM then, much like a spellchecker, I see no issue. Sure, it _feels_ creepy, but it's just an algo.

Perhaps someone can articulate the precise threshold of 'access' they wish to deny apps that we overtly use? And how would that threshold be defined?

"Do not run my content through anything more complicated than some arbitrary [complexity metric]" ??

By @space_oddity - 6 months
The inability to disable this feature adds to the frustration.

By @huggingmouth - 6 months
My wake-up moment with Google was when they accused a parent of being a pedophile, permanently banned their accounts, reported them to the police, and then doubled down when they were proven wrong.

Not only do those degenerates have the gall to creep on people, they refuse to admit wrongdoing or make their victims whole.

Sickos. That's what they are. Sickos.

By @muscomposter - 6 months
We should just embrace digital copying galore instead of trying to graft the physical constraints of regular assets onto digital ones.

We should just ignore the physical constraints of assets that do not have them, i.e. any and all digital data.

Which do you prefer: everybody being able to access everybody's digital data (read-only), or what we have now, which is trending toward so many microtransactions that every keystroke gets reflected in my bank account?

By @acar_rag - 6 months
The title misleads, and the article is, imo, badly written. The post implies there is indeed a setting to turn it off, so the author deliberately asked Gemini AI to summarize (that is, scan) his documents...

Related to this news: https://news.ycombinator.com/item?id=40934670

By @atum47 - 6 months
Every single week I have to refuse enabling backup for my pictures on my Google Pixel. I refuse it today; next week I open the app and the UI shows the backup option enabled, with a button that says "continue using the app with backup".

Somebody took the time to talk down my comment about this being a strategy to give their AI more training data. I continue to believe that if they have your data, they will use it.

By @PessimalDecimal - 6 months
Meta commentary but still relevant I think:

The author first refers to his source as Kevin Bankston in the article's subtitle. This is also the name shown in the embedded tweet. But the following two references call him Kevin _Bankster_ (which seems like an amusing portmanteau of banker and gangster I guess).

Is the author not proofreading his own copy? Are there no editors? If the author can't even keep the name of his source straight and represent that consistently in the article, is there reason to think other details are being relayed correctly?

By @_spduchamp - 6 months
I now feel obligated to cram as much AI-f'n-up crap into my Drive as possible. Come'n get it!

By @VeejayRampay - 6 months
This is not OpenAI doing shady things, so everyone should be up in arms.