June 28th, 2024

Perplexity's Grand Theft AI

Perplexity, a search engine rivaling Google, faces criticism for bypassing original sources, dodging paywalls, and promoting unethical behavior. The CEO's defense raises concerns about trust and integrity online.

Perplexity, a potential competitor to Google Search, aims to be an "answer engine" by providing direct answers instead of directing users to primary sources. However, it has been criticized for being a middleman that deprives original sources of ad revenue by summarizing and plagiarizing content. The company has faced backlash for dodging paywalls, ignoring robots.txt directives, and scraping content unethically. Perplexity's CEO defended these actions, highlighting a lack of commitment to ethical practices. The company's approach raises concerns about trust and integrity on the internet, as it relies on deception and unethical behavior to gather data. Despite claims of prioritizing accuracy, Perplexity has been found surfacing AI-generated results and misinformation. The controversy surrounding Perplexity underscores the ethical challenges posed by AI companies that prioritize profit over principles, ultimately eroding trust in online information sources.

We need an evolved robots.txt and regulations to enforce it

In the era of AI, the robots.txt file faces limitations in guiding web crawlers. Proposals advocate for enhanced standards to regulate content indexing, caching, and language model training. Stricter enforcement, including penalties for violators like Perplexity AI, is urged to protect content creators and uphold ethical AI practices.

Before Smartphones, an Army of Real People Helped You Find Stuff on Google

Before smartphones, human-powered services like GOOG-411, 118 118, and AQA provided information. They declined as data plans became cheaper, giving way to automated search engines that lack their personal touch and unique responses.

The Encyclopedia Project, or How to Know in the Age of AI

Artificial intelligence challenges information reliability online, blurring real and fake content. An anecdote underscores the necessity of trustworthy sources like encyclopedias. The piece advocates for critical thinking amid AI-driven misinformation.

Large Language Models are not a search engine

Large Language Models (LLMs) from Google and Meta generate text algorithmically, which can produce nonsensical "hallucinations." Companies struggle to correct errors after generation because they stem from factors like training data and temperature settings. LLMs aim to improve user interactions but raise skepticism about their ability to deliver factual information.

Perplexity's Grand Theft AI

Perplexity, a search engine rivaling Google, faces criticism for being a middleman that undermines original sources' revenue by summarizing content unethically. The CEO's deceptive practices raise concerns about trust and integrity.

12 comments

By @Ukv - 10 months

> At this point, Wired jumped in, confirming a finding from Robb Knight: Perplexity’s scraping of Forbes’ work wasn’t an exception. In fact, Perplexity has been ignoring the robots.txt code that explicitly asks web crawlers not to scrape the page

There's a distinction that is being missed between:

* Web crawlers automatically accessing pages, such as recursively following links to index them for search engines

* A tool accessing a URL in direct response to a user request

robots.txt is only intended for the former. For instance, archive.is state:

> [Why does archive.is not obey robots.txt?] Because it is not a free-walking crawler, it saves only one page acting as a direct agent of the human user. Such services don't obey robots.txt (e.g. Google Feedfetcher, screenshot- or pdf-making services, isup.me, …)

Perplexity do (as far as I've been able to find) respect robots.txt for their scraping. What the investigations confirmed it was ignored for was users entering a URL to summarise.
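To make the crawler side of that distinction concrete, here is a minimal sketch of the check a well-behaved crawler would run before fetching a page, using Python's standard `urllib.robotparser`. The robots.txt content, the bot names `ExampleBot` and `OtherBot`, and the `example.com` URLs are all made up for illustration; this is not Perplexity's or archive.is's actual code or configuration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; "ExampleBot" is an invented bot name.
ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def crawler_may_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


article = "https://example.com/articles/story.html"

# ExampleBot is disallowed everywhere, so an automated crawl must skip this page.
blocked = crawler_may_fetch(ROBOTS_TXT, "ExampleBot", article)

# Any other bot falls under the "*" rules, which only block /private/.
allowed = crawler_may_fetch(ROBOTS_TXT, "OtherBot", article)
private = crawler_may_fetch(ROBOTS_TXT, "OtherBot", "https://example.com/private/x")

print(blocked, allowed, private)
```

Under the reading above, a tool fetching a single URL at a user's request would simply not run this check, which is exactly the behaviour the investigations flagged.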

There is still a conversation to be had. I think users should be able to browse the web with whatever tool they want to - whether that's a standard browser, a minimal reader mode that doesn't show ads, or a statistical summarisation tool - but it does pose a problem for sites that rely on standard browsers letting them show users advertisements, get them to sign up to mailing lists, collect data, etc. if users decide to choose tools that don't allow that.

By @orasis - 10 months

Isn’t it ironic that journalists themselves aggregate content in a way that discourages the reader from clicking through to the primary source?

This type of summarization and aggregation seems to be exactly what consumers want.

By @danlindley - 10 months

Perplexity AI is lying about their user agent https://news.ycombinator.com/item?id=40690898

By @caboteria - 10 months

> So that’s Perplexity’s real innovation here: shattering the foundations of trust that built the internet. The question is if any of its users or investors care.

Ron Howard voiceover: They do not.

By @sva_ - 10 months

I think what Perplexity is trying to build is cool, but seems like they do quite a bit of dodgy shit.

In his recent appearance on a Lex Fridman episode, Perplexity CEO Srinivas admitted that they'd abused Twitter's academic grant program and autogenerated thousands of grant applications with GPT.

If he so laughingly reminisces about doing that, I wonder what kind of other dodgy shit they do behind closed doors.

By @fxj - 10 months

So when I pay an intern to do a Google search, write a short report of the facts he found, and give it back to me, this is not a problem, but when I ask an AI to do the same it suddenly is a problem. Well, I don't get it.

just my 2 ct

By @GaggiX - 10 months

I believe that sooner or later we will see something similar to Perplexity that runs directly in your browser, opens the pages for you, and answers your question.

Unless it has already been done.

By @Havoc - 10 months

Bit confused about why Perplexity in particular?

“Leveraging” other people’s content is basically par for the course. Whether it’s training or Google News or Google Books or Stability AI images, it’s all doing the same, just to different degrees.

By @ChrisArchitect - 10 months

Perplexity is a bullshit machine

https://news.ycombinator.com/item?id=40728732

By @tomp - 10 months

Lying article lies.

> Though Forbes has a metered paywall on some of its work, the premium work — like that investigation — is behind a hard paywall

No it's not, see for yourself! I can read it just fine...

https://www.forbes.com/sites/sarahemerson/2024/06/06/eric-sc...

By @visarga - 10 months

Copyright should die. It was already standing on just one foot since the internet made mass copying trivial. File sharing has been going on for decades. Now they want to extend copyright to restrict AI, which doesn't even replicate the source text and instead aims to answer a user question by combining information across multiple sources.

We have seen other models that don't rely on protection flourish - Wikipedia, open source, scientific publications, open weights models, and even fashion. They all are permissive, and thriving.
