Google Is the Only Search Engine That Works on Reddit Now Thanks to AI Deal
Google secures exclusive search rights on Reddit through a lucrative deal, hindering other search engines' access. Reddit tightens restrictions to safeguard content and address challenges posed by dominant search engines.
Google has become the exclusive search engine for Reddit due to a multi-million dollar deal that allows Google to scrape Reddit for data to train its AI products. Other search engines like Bing, DuckDuckGo, and Mojeek are no longer able to provide full Reddit results, limiting users' access to recent content. Reddit has updated its robots.txt file to block certain crawlers, including those used by AI companies, to protect its content from misuse. The deal between Google and Reddit highlights the challenges smaller search engines face in competing with Google's dominance in search. The move also reflects the unintended consequences of widespread web scraping for AI training, impacting the availability of alternative search options. Reddit's stricter policies aim to prevent unauthorized use of its content for commercial purposes, emphasizing the importance of respecting terms and policies when accessing Reddit data. The situation underscores the evolving landscape of online search and the implications of exclusive data access agreements on internet accessibility and competition.
Related
Google rejected me and now I'm building a search engine
The article recounts a rejection from Google during an interview, prompting the individual to create a non-profit, community-driven search engine emphasizing ethical values over profit, welcoming contributions for development.
Google Search Ranks AI Spam Above Original Reporting in News Results
Google Search faces challenges as AI-generated spam surpasses original reporting in news results. Despite efforts to combat this issue, plagiarized articles with AI-generated illustrations dominate search rankings, raising concerns among SEO experts and original content creators.
Reddit has updated its robots.txt to block all web crawlers
Reddit updated its robots.txt file to block web crawlers, aiming to protect user privacy and prevent content misuse. This change impacts data access for entities like Google, potentially hindering legitimate research. CEO Steve Huffman emphasizes balancing data use costs. The effects on search engines and partnerships are uncertain.
Google Now Defaults to Not Indexing Your Content
Google has changed its indexing to prioritize unique, authoritative, and recognizable content. This selective approach may exclude smaller players, making visibility harder. Content creators face challenges adapting to Google's exclusive indexing, affecting search results.
'Google says I'm a dead physicist': is the biggest search engine broken?
Google faces scrutiny over search result accuracy and reliability, with concerns about incorrect information and cluttered interface. Despite dominance in the search market, criticisms persist regarding data privacy and search quality.
# Welcome to Reddit's robots.txt
# Reddit believes in an open internet, but not the misuse of public content.
# See https://support.reddithelp.com/hc/en-us/articles/26410290525844-Public-Content-Policy Reddit's Public Content Policy for access and use restrictions to Reddit content.
# See https://www.reddit.com/r/reddit4researchers/ for details on how Reddit continues to support research and non-commercial use.
# policy: https://support.reddithelp.com/hc/en-us/articles/26410290525844-Public-Content-Policy
User-agent: *
Disallow: /
Source: https://www.reddit.com/robots.txt
This is a dangerous precedent for the internet. Business conglomerates have been controlling most of the web, but refusing basic interoperability is even worse.
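For anyone wondering what the quoted `User-agent: *` / `Disallow: /` pair means for a well-behaved crawler, here is a minimal sketch using Python's standard `urllib.robotparser`. The helper name `is_allowed` and the bot name `ExampleBot` are made up for illustration, and the rules are pasted in from the quote above rather than fetched live, so this says nothing about what Reddit serves to any particular crawler.

```python
from urllib.robotparser import RobotFileParser

# The two rule lines quoted above, copied verbatim (not fetched from Reddit).
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def is_allowed(user_agent, url, robots_txt=ROBOTS_TXT):
    """Return True if robots_txt permits user_agent to fetch url.

    Hypothetical helper for illustration; it only evaluates the rules,
    it does not make any network request.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # "ExampleBot" is an invented crawler name.
    print(is_allowed("ExampleBot", "https://www.reddit.com/r/reddit4researchers/"))
```

Note that robots.txt is advisory: it tells compliant crawlers what the site asks of them, and it does not by itself block a client that ignores it.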
How many other sites might have leverage to charge to be indexed?
I don't want to live in a world where you have to use X search engine to get answers from Y site - but this seems like the beginning of that world.
From an efficiency perspective, it's obviously better for websites to just lease their data to search engines than for both sides to pay for tons of bandwidth and compute to get that data onto search engines.
Realistically, there are only 2 search engines now.
This seems very bad for Kagi, but could it possibly lead to the old, cool, hobbyist and un-monetized web being reinvented?
But I can understand why they made the change they did. The data was being abused.
My guess is that this was an oversight, and that they will do an audit and reopen it for search engines after those engines agree not to use the data for training, because let's face it, Reddit is a for-profit business and they have to protect their income streams.
I'm not sure what to make of that.
I wonder if this might affect Reddit, as in slowly kill its user base, especially the users who provide (and often also look for) high-quality content, because which of those users would want to use Google search?
The veracity of this statement is questionable.
I found at least four web search engines not using Google's index that produced results from the last week.
Example: Recent eruption at Yellowstone Black Diamond Pool
https://www.ecosia.org/search?method=index&q=site:reddit.com...
https://search.brave.com/search?q=reddit.com+black+diamond+p...
https://api.yep.com/fs/2/search?client=web&gl=all&no_correct...
POST /sp/search HTTP/1.0
host: www.startpage.com
content-length: 74
content-type: application/x-www-form-urlencoded

query=site:reddit.com black diamond pool&abp=-1&t=&lui=english&sc=&cat=web
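The Startpage request above can be reproduced roughly like this with Python's standard library. This is only a sketch: the path and form fields are copied from the request quoted above, the `https` scheme and the function name `build_search_request` are my assumptions, and I have not verified that Startpage accepts requests built this way. The function only constructs the request object and sends nothing.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Path taken from the quoted request; the https scheme is an assumption.
SEARCH_URL = "https://www.startpage.com/sp/search"


def build_search_request(query):
    """Build (but do not send) a form-encoded POST like the one quoted above.

    Hypothetical helper; field names and values are copied from the comment.
    """
    fields = {
        "query": query,
        "abp": "-1",
        "t": "",
        "lui": "english",
        "sc": "",
        "cat": "web",
    }
    # urlencode percent-encodes ":" and turns spaces into "+".
    body = urlencode(fields).encode("ascii")
    return Request(
        SEARCH_URL,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_search_request("site:reddit.com black diamond pool")
    print(req.get_method(), req.full_url)
    print(req.data.decode("ascii"))
```

Because `urlencode` escapes the colon, the body differs slightly from the raw one quoted above, so its length will not match the 74 bytes in that `content-length` header. Passing the request to `urllib.request.urlopen` would actually send it.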
At least for this example, I got the same desired result using Reddit site search.
https://old.reddit.com/search/?q=black+diamond+pool
If anyone has some good examples of search queries that I can test showing why a search engine must be used, please share.
It’s not like everyone wasn’t already pulling the same grift, but quantity really does have a quality all its own.
Google really should blacklist Reddit entirely for this practice, but sadly, as bad as Reddit is, it's still a much higher-quality result than average for Google.
[0] https://www.reddit.com/r/ChatGPT/comments/133xgb5/gpt2_was_p...
As soon as someone shows me a search engine that restores the quality of search, I'm getting a subscription for work.
It really can't be hard to whitelist sources and index appropriately.
Get going, nerds, Google has fallen.
I remember seeing an unhelpful hyperlink for the first time. It was a random word in the body of a random tech site that redirected to a list of articles from that site tagged with that term.
I remember being stunned, my expectation was that the link would lead me to another website, one that would be an authoritative source on that term and freely accessible.
20 years later we get a paywalled article about a fragmented web, and we're not slowing down.
https://cc.bingj.com/cache.aspx?d=5070227914243&w=ljIRk8yx42...
For example, when I search for product reviews, I always specify reddit. Otherwise the search results are inundated with SEO spam.
Reddit's justification for this is profoundly wrong. Their "public content policy" is absurd doublespeak, and counter to everything the open internet is and hopes to be. You cannot simultaneously call yourself "open" and "public" while refusing access to automated clients. Every client is automated. They even go so far as to say that "crawling" (also known as "downloading") is an "abuse" and violates user privacy.
This is absurd, and not justified. I would love to see legislation that restricted server operators' ability to prohibit automated access in this way, but I suppose it will never happen. Some people in this thread have attempted to justify the policy by saying "they have to protect their income streams". No they don't. You don't have a right to an income stream, and you certainly don't have a right to lie in order to get all the benefits of an open internet with none of the downsides. Noting of course that the "downsides" are in this case actually just "competitors".
Also things like the API fiasco, and also small annoyances like the fact that when you click on an image on reddit, it now goes to a wrapper html page instead of just the actual image (this was one of the reasons reddit was better than most social media...).
When Apple strikes an exclusive deal with suppliers for parts, it is sound business practice.
When Google strikes an exclusive deal with Reddit, it is ..
Some of you have no idea how businesses work, and it shows.