Perplexity's Grand Theft AI
Perplexity, a search engine rivaling Google, faces criticism for bypassing original sources, dodging paywalls, and scraping content against publishers' wishes. The CEO's defense of these practices raises concerns about trust and integrity online.
Read original article

Perplexity, a potential competitor to Google Search, aims to be an "answer engine" that provides direct answers instead of directing users to primary sources. Critics say this makes it a middleman that deprives original sources of ad revenue by summarizing, and at times plagiarizing, their content. The company has faced backlash for dodging paywalls, ignoring robots.txt directives, and scraping content unethically, and its CEO's defense of these actions suggests little commitment to ethical practice. The approach raises concerns about trust and integrity on the internet, since it relies on deception to gather data. Despite claims of prioritizing accuracy, Perplexity has been found surfacing AI-generated results and misinformation. The controversy underscores the ethical challenges posed by AI companies that put profit over principles, ultimately eroding trust in online information sources.
Related
We need an evolved robots.txt and regulations to enforce it
In the era of AI, the robots.txt file faces limitations in guiding web crawlers. Proposals advocate for enhanced standards to regulate content indexing, caching, and language model training. Stricter enforcement, including penalties for violators like Perplexity AI, is urged to protect content creators and uphold ethical AI practices.
Before Smartphones, an Army of Real People Helped You Find Stuff on Google
Before smartphones, human-powered services like GOOG-411, 118 118, and AQA answered people's questions. They declined as data plans got cheaper, giving way to automated search engines that lack their personal touch and unique responses.
The Encyclopedia Project, or How to Know in the Age of AI
Artificial intelligence challenges information reliability online, blurring real and fake content. An anecdote underscores the necessity of trustworthy sources like encyclopedias. The piece advocates for critical thinking amid AI-driven misinformation.
Large Language Models are not a search engine
Large Language Models (LLMs) from Google and Meta generate content algorithmically and can produce nonsensical "hallucinations." Companies struggle to catch these errors after generation, which stem from factors like training data and temperature settings. LLMs aim to improve user interactions but raise skepticism about their ability to deliver factual information.
There's a distinction that is being missed between:
* Web crawlers automatically accessing pages, such as recursively following links to index them for search engines
* A tool accessing a URL in direct response to a user request
robots.txt is only intended for the former. For instance, archive.is state:
> [Why does archive.is not obey robots.txt?] Because it is not a free-walking crawler, it saves only one page acting as a direct agent of the human user. Such services don't obey robots.txt (e.g. Google Feedfetcher, screenshot- or pdf-making services, isup.me, …)
Perplexity do (as far as I've been able to find) respect robots.txt for their crawling. What the investigations confirmed was that it's ignored when a user enters a URL to summarise.
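For anyone unfamiliar with the mechanics, here's a minimal sketch of the distinction using Python's standard urllib.robotparser; the URLs and bot name are made up for illustration, not Perplexity's actual setup:

```python
from urllib import robotparser

# A free-walking crawler consults robots.txt before every fetch.
rp = robotparser.RobotFileParser()
rp.set_url("https://example.com/robots.txt")
rp.read()  # download and parse the site's robots.txt

if rp.can_fetch("ExampleBot", "https://example.com/some-article"):
    ...  # crawl the page and recursively follow its links
else:
    ...  # skip it: the site has opted out for this agent

# A tool fetching one page as the direct agent of a human user
# (the archive.is case quoted above) skips this check entirely.
```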
There is still a conversation to be had. I think users should be able to browse the web with whatever tool they want - a standard browser, a minimal reader mode that doesn't show ads, or a statistical summarisation tool - but that does pose a problem for sites that rely on standard browsers to show users advertisements, sign them up to mailing lists, collect data, and so on, if users choose tools that don't allow any of that.
This type of summarization and aggregation seems to be exactly what consumers want.
Perplexity AI is lying about their user agent https://news.ycombinator.com/item?id=40690898
Ron Howard voiceover: They do not.
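The reason the user-agent spoofing in that link matters: robots.txt rules are keyed to whatever UA string the client claims, so lying about it silently bypasses a Disallow aimed at you. A rough sketch, again with the standard library; the bot name and rules here are hypothetical:

```python
from urllib import robotparser

# Hypothetical robots.txt that blocks one named bot but allows everyone else.
rules = """\
User-agent: PerplexityBot
Disallow: /

User-agent: *
Allow: /
"""

rp = robotparser.RobotFileParser()
rp.parse(rules.splitlines())

# Announcing the real identity gets blocked...
print(rp.can_fetch("PerplexityBot", "https://example.com/article"))  # False
# ...while the same request behind a generic browser UA sails through.
print(rp.can_fetch("Mozilla/5.0", "https://example.com/article"))    # True
```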
In his recent appearance on the Lex Fridman podcast, Perplexity CEO Srinivas admitted that they'd abused Twitter's academic grant program, autogenerating thousands of grant applications with GPT.
If he so laughingly reminisces about doing that, I wonder what kind of other dodgy shit they do behind closed doors.
just my 2 ct
Unless it has already been done.
“Leveraging” other people’s content is basically par for the course. Whether it’s model training, Google News, Google Books, or Stability AI images, it’s all doing the same thing, just to different degrees.
Perplexity is a bullshit machine
> Though Forbes has a metered paywall on some of its work, the premium work — like that investigation — is behind a hard paywall
No it's not, see for yourself! I can read it just fine...
https://www.forbes.com/sites/sarahemerson/2024/06/06/eric-sc...
We have seen other models that don't rely on protection flourish - Wikipedia, open source, scientific publications, open-weights models, and even fashion. They are all permissive, and thriving.