April 21st, 2025

AI assisted search-based research works now

Recent advancements in AI-assisted search tools, particularly OpenAI's o3 and o4-mini models, have improved accuracy and reliability, potentially transforming research methods and impacting traditional web search usage.

Recent advancements in AI-assisted search-based research have made significant progress, particularly in 2025. Initially, tools like Google Gemini and OpenAI's ChatGPT struggled with accuracy, often hallucinating details not present in search results. However, the latest iterations, including OpenAI's o3 and o4-mini models, have shown marked improvement. These models can now integrate search capabilities into their reasoning processes, yielding reliable and useful answers without the long wait times associated with earlier systems. Users have reported successful interactions with these models, receiving accurate information grounded in real-time search results.

Despite the advancements, there are still concerns regarding the reliability of AI outputs, particularly with competitors like Google and Anthropic lagging behind in performance. The shift towards AI as a primary research assistant raises questions about the future of web search and the economic model of online information access, as users may increasingly rely on AI for answers rather than traditional search engines. This evolution could lead to significant changes in how information is consumed and the potential for legal challenges as the landscape shifts.

- AI-assisted search tools have improved significantly, providing reliable answers.

- OpenAI's o3 and o4-mini models integrate search into their reasoning processes.

- Previous models often hallucinated information, but recent versions have reduced this issue.

- The shift towards AI for research tasks may impact traditional web search usage.

- Competitors like Google and Anthropic need to enhance their offerings to keep pace.

AI: What people are saying
The comments reflect a mix of experiences and opinions regarding AI-assisted search tools like OpenAI's o3 and o4-mini models.
  • Users report varied effectiveness, with some finding the tools helpful for specific inquiries, while others criticize them for inaccuracies and lack of depth.
  • Concerns about trust and verification are prevalent, with many users feeling that AI outputs can be misleading or unverifiable.
  • Some users appreciate the potential of AI for deep research but highlight limitations in dynamic querying and adaptability.
  • There is a call for better integration of AI with traditional search methods, emphasizing the need for reliable and trustworthy information.
  • Discussions also touch on the economic implications of AI on traditional web search models and the future of information retrieval.
31 comments
By @CSMastermind - 18 days
The various deep research products don't work well for me. For example I asked these tools yesterday, "How many unique NFL players were on the roster for at least one regular season game during the 2024 season? I'd like the specific number not a general estimate."

I as a human know how to find this information. The game day rosters for many NFL teams are available on many sites. It would be tedious but possible for me to find this number. It might take an hour of my time.

But despite this being a relatively easy research task all of the deep research tools I tried (OpenAI, Google, and Perplexity) completely failed and just gave me a general estimate.

Based on this article I tried that search just using o3 without deep research and it still failed miserably.
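The counting step the commenter describes is mechanical once the per-game rosters are collected; a minimal sketch of it, with the data source left entirely hypothetical:

```python
def unique_players(rosters):
    """Count unique players across game-day rosters.

    `rosters` is an iterable of per-game player-ID lists, however they
    were gathered (the tedious part the commenter mentions). The count
    itself is just a set union.
    """
    seen = set()
    for roster in rosters:
        seen.update(roster)
    return len(seen)
```

The point of the anecdote is that the research tools failed at exactly this kind of exhaustive aggregation, not that the final computation is hard.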

By @simonw - 18 days
I think it's important to keep tabs on things that LLM systems fail at (or don't do well enough on) and try to notice when their performance rises above that bar.

Gemini 2.5 Pro and o3/o4-mini seem to have crossed a threshold for a bunch of things (at least for me) in the last few weeks.

Tasteful, effective use of the search tool for o3/o4-mini is one of those. Being able to "reason" effectively over long context inputs (particularly useful for understanding and debugging larger volumes of code) is another.

By @otistravel - 17 days
The most impressive demos of these tools always involve technical tasks where the user already knows enough to verify accuracy. But for the average person asking about health issues, legal questions, or historical facts? It's basically fancy snake oil - confident-sounding BS that people can't verify. The real breakthrough would be systems that are actually trustworthy without human verification, not slightly better BS generators. True AI research breakthroughs would admit uncertainty and provide citations for everything, not fake certainty like these tools do.
By @sshine - 18 days
The article doesn’t mention Kagi: The Assistant, a search-powered LLM frontend that came out of closed beta around the beginning of the year, and got included in all paid plans since yesterday.

It really is a game changer when the search engine

I find that an AI performing multiple searches on variations of keywords, and aggregating the top results across those variations, searches more extensively than most people, myself included, would.

I had luck once asking what its search queries were. It usually provides the references.
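That aggregate-across-variants behaviour can be sketched roughly as follows; `search` here is a hypothetical function returning a ranked list of URLs, not any real engine's API:

```python
from collections import Counter

def aggregate_search(variants, search, top_k=10):
    """Run each query variant through `search` and favour results that
    recur across variants, weighting higher-ranked hits more."""
    counts = Counter()
    for query in variants:
        for rank, url in enumerate(search(query)[:top_k]):
            counts[url] += top_k - rank  # rank 0 gets the most weight
    return [url for url, _ in counts.most_common(top_k)]
```

A result that shows up near the top for several phrasings of the same question outranks one that appears only once, which is roughly what a diligent human searcher does by hand.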

By @jsemrau - 18 days
My main observations here are:

1. Technically it might be possible to search the Internet, but it might not surface correct and/or useful information.

2. High-value information that would make a research report valuable is rarely public or free. This holds especially true in capital-intensive or regulated industries.

By @intended - 18 days
I find that these conversations on HN end up covering similar positions constantly.

I believe that most positions are resolved if

1) you accept that these are fundamentally narrative tools. They build stories, in whatever style you wish: stories of code, stories of project reports, stories of conversations.

2) this is balanced by the idea that the core of everything in our shared information economy is Verification.

The reason experts get use out of these tools, is because they can verify when the output is close enough to be indistinguishable from expert effort.

Domain experts also do another level of verification (hopefully) which is to check if the generated content computes correctly as a result - based on their mental model of their domain.

I would predict that LLMs are deadly in the hands of people who can’t gauge the output, and will end up driving themselves off of a cliff, while experts will be able to use them effectively on tasks where verifying the output has a comparative effort advantage over creating the output.

By @saulpw - 18 days
I tried it recently. I asked for videochat services like the one I use (WB) with 2 specific features that the most commonly used services don't have. It asked some clarifying questions and seemed to understand the mission, then went off for 10 minutes after which it returned 5 results in a table.

The first result was WB, which I gave to it as the first example and am already using. Results 2 and 3 were the mainstream services which it helpfully marked in the table as not having the features I need. Result 4 looked promising but was discontinued 3 years ago. Result 5 was an actual option which I'm trying out (but may not work for other reasons).

So, 1/5 usable results. That was mildly helpful I guess, but it appeared a lot more helpful on the surface than it was. And I don't seem to have the ability to say "nice try but dig deeper".

By @jeffbee - 18 days
The Deep Research stuff is crazy good. It solves the issue that I can often no longer find articles that I know are out there. Example: yesterday I was holding forth on the socials about how 25 years ago my local government did such and such thing to screw up an apartment development at the site of an old movie theater, but I couldn't think of the names of any of the principals. After Googling for a bit I used a Deep Research bot to chase it down for me, and while it was doing that I made a sandwich. When I came back it had compiled a bunch of contemporaneous news articles from really obscure bloggers, plus allusions to public records it couldn't access but was confident existed, that I later found using the URLs and suggested search texts.
By @btbuildem - 18 days
It's a relevant question about the economic model for the web. On one hand, the replacement of search with a LLM-based approach threatens the existing, advertising-based model. On the other hand, the advertising model has produced so much harm: literally irreparable damage to attention spans, outrage-driven "engagement", and the general enshittification of the internet to mention just a few. I find it a bit hard to imagine whatever succeeds it will be worse for us collectively.

My question is, how to reproduce this level of functionality locally, in a "home lab" type setting. I fully expect the various AI companies to follow the exact same business model as any other VC-funded tech outfit: free service (you're the product) -> paid service (you're still the product) -> paid service with advertising baked in (now you're unabashedly the product).

I fear that with LLM-based offerings, the advertising will be increasingly inseparable, and eventually undetectable, from the actual useful information we seek. I'd like to get a "clean" capsule of the world's compendium of knowledge with this amazing ability to self-reason, before it's truly corrupted.

By @Tycho - 17 days
I tried o3 for a couple of things.

First one, geolocation a photo I saw in a museum. It didn’t find a definitive answer but it sure turned up a lot of fascinating info in its research.

Second one, I asked it to suggest a new line of enquiry in the Madeleine McCann missing person case. It made the interesting suggestion that the 30 minute phone call the suspect made on the evening of the disappearance, from a place near the location of the abduction, was actually a sort of “lookout call” to an accomplice nearby.

Quite impressed. This is a great investigative tool.

By @xp84 - 18 days
From article:

> “Google is still showing slop for Encanto 2!” (Link is provided)

I believe quite strongly that Google is making a serious misstep in this area, the “supposed answer text pinned at the top above the actual search results.”

For years they showed something in this area which was directly quoted from what I assume was a shortlist of non-BS sites so users were conditioned for years that if they just wanted a simple answer like when a certain movie came out or if a certain show had been canceled or something, you may as well trust it.

Now it seems like they have given over that previous real estate to a far less reliable feature, which simply feeds any old garbage it finds anywhere into a credulous LLM and takes whatever pops out. 90% of people that I witness using Google today simply read that text and never click any results.

As a result, Google is now pretty much always even less accurate at the job of answering questions than if you posed that same question to ChatGPT, because GPT seems to be drawing from its overall weights which tend toward basic reality, whereas Google’s “Answer” seems to be summarizing a random 1-5 articles from the Spam Web, with zero discrimination between fact, satire, fiction, and propaganda. How can they keep doing this and not expect it to go badly?

By @softwaredoug - 18 days
I wonder when Google search will let me "chat" with the search results. I often want to ask the AI Overview follow up questions.

I secondarily wonder how an LLM solves the trust problem in web search, traditionally solved (and now gamed) through PageRank. ChatGPT doesn't seem as easily fooled by spam as direct search is.

How much is Bing (or whatever the search engine is) getting better? vs how much are LLMs better at knowing what a good result is for a query?

Or perhaps it has to do with the richer questions that get asked to chat vs search?

By @csallen - 17 days
It's actually quite doable to build your own deep research agent. You just need a single prompt, a solid code loop to run it agentically, and some tools for it to call. I've been building a domain-specific deep research agent over the past few days for internal use, and I'm pretty impressed with how much better it is than any of the official deep search agents for my use case.
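A minimal sketch of the loop the commenter describes: a single prompt, a loop that runs the model agentically, and a search tool for it to call. The LLM call and search tool are injected as hypothetical functions; none of these names are a real vendor API:

```python
def deep_research(question, call_llm, web_search, max_steps=10):
    """Drive an LLM research loop until the model produces an answer.

    `call_llm(messages)` is assumed to return either {'query': ...}
    (a search request) or {'answer': ...}; `web_search(query)` returns
    a text summary of results. Both are stand-ins, not real APIs.
    """
    messages = [
        {"role": "system", "content": (
            "Research the question. Request searches until you can "
            "give a final answer.")},
        {"role": "user", "content": question},
    ]
    for _ in range(max_steps):
        reply = call_llm(messages)
        if "answer" in reply:                 # model is done researching
            return reply["answer"]
        results = web_search(reply["query"])  # run the requested search
        messages.append({"role": "assistant", "content": str(reply)})
        messages.append({"role": "user",
                         "content": f"Search results: {results}"})
    return "No answer within the step budget."
```

The domain-specific leverage comes from what goes into the system prompt and which tools are exposed, which is exactly what a vendor's general-purpose agent can't tailor for you.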
By @63 - 18 days
One downside I found is that the llm cannot change its initial prompt until it's done thinking. I used deep research to compare counseling centers for me but of course when it encounters some factor I hadn't thought of (e.g. the counselors here fit the criteria perfectly but none accept my insurance), it doesn't know that it ought to skip that site entirely. Really this is a critique of the deep-research approach rather than search in general, but I imagine it can still play out on smaller scales. Often, searching for information is a dynamic process involving the discovery of unknown unknowns and adjustment based on that, but ai isn't great at abstract goals or stopping to ask clarifying questions before resuming. Ultimately, the report I got wasn't useless, but it mostly just regurgitated the top 3 google results. I got much better recommendations by reaching out to a friend who works in the field.
By @blackhaz - 17 days
This is surprising. o3 produces incredible amount of hallucinations for me, and there are lots of reddit threads about it. I've had to roll back to another model because it just swamps everything in made up facts. But sometimes it is frighteningly smart. Reading its output sometimes feels like I'm missing IQ points.
By @baq - 18 days
> I can feel my usage of Google search taking a nosedive already.

Conveniently Gemini is the best frontier model for everything else, they’re very interested and well positioned (if not best?) to also be the best in deep research. Let’s check back in 3-6 months.

By @sublimefire - 18 days
I do prefer tools like GPT researcher where you are in control over sources and search engines. Sometimes you just need to use arxiv, sometimes mix research with the docs you have. Sometimes you want to use different models. I believe the future is in choosing what you need for the specific task at that moment, eg 3d model generation mixed with something else, and this all requires some sort of new “OS” level application to run from.

Individual model vendors cannot do such a product as they are biased towards their own model, they would not allow you to choose models from competitors.

By @energy123 - 18 days

> The user-facing Google Gemini app can search too, but it doesn’t show me what it’s searching for.

Gemini 2.5 Pro is also capable of search as part of its chain of thought. It needs light prodding to show URLs, but it will do so and is good at it.

Unrelated point, but I'm going to keep saying this anywhere Google engineers may be reading, the main problem with Gemini is their horrendous web app riddled with 5 annoying bugs that I identified as a casual user after a week. I assume it's in such a bad state because they don't actually use the app and they use the API, but come on. You solved the hard problem of making the world's best overall model but are squandering it on the world's worst user interface.

By @mehulashah - 17 days
I find that people often conflate search with analytics when discussing Deep Research. Deep Research is iterated search and tool use. And, no doubt, it’s remarkably good. Deep Analytics is like Deep Research in that it uses generative AI models to generate a plan, but LLM operations and structured operations (tool use) are interleaved in database-style query pipelines. This allows for the more precise counting and exhaustive-search type use cases.
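The distinction can be illustrated in a few lines: the LLM handles the fuzzy mapping while plain code does the exact counting, database-style. `classify` below is a hypothetical stand-in for an LLM operation, not a real API:

```python
from collections import Counter

def count_by_category(docs, classify):
    """Interleave an LLM step with a structured aggregation step.

    `classify(doc)` stands in for an LLM operation mapping a document
    to a category; the counting itself is done deterministically in
    code, so the totals are exact rather than estimated by the model.
    """
    return Counter(classify(d) for d in docs)
```

This is why a pipeline like this can answer "exactly how many X" questions that a pure iterated-search agent tends to answer with an estimate.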
By @jonas_b - 17 days
A common google searching thing I encounter is something like this:

I need to get from A to B via C via public transport in a big metropolis.

Now C could be one of say 5 different locations of a bank branch, electronics retailer, blood test lab or whatever, so there's multiple ways of going about this.

I would like a chatbot solution that compares all the different options and lays them out ranked by time from A to B. Is this doable today?
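The comparison itself is trivial to script once leg times are available; `travel_time` below is a hypothetical function standing in for a transit-directions lookup, which is the part a chatbot would need tool access to:

```python
def rank_routes(a, b, candidates, travel_time):
    """Rank A -> C -> B itineraries by total travel time.

    `candidates` are the possible via-points C (e.g. the five branch
    locations); `travel_time(x, y)` is a hypothetical transit lookup
    returning minutes between two points.
    """
    routes = [(travel_time(a, c) + travel_time(c, b), c) for c in candidates]
    return sorted(routes)  # cheapest total time first
```

The hard part for today's chatbots is not this ranking step but reliably enumerating the candidate locations and fetching live transit times for each leg.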

By @swyx - 18 days
> Deep Research, from three different vendors

don't forget xAI's Grok!

By @gitroom - 17 days
I feel like half the time these AI tools are either way too smart or just eating glue, tbh - do you think people will ever actually trust AI for deep answers or are we all just using it to pass time at work?
By @in_ab - 17 days
Claude doesn't seem to have a built in search tool but I tried this with a MCP server to search google and it gives similar results.
By @Alifatisk - 17 days
Has anyone tried ithy.com yet? If you have any prompt that you know most LLMs fail at, I’d love to know how ithy responds!
By @gcanyon - 17 days
The concern over "LLMs vs. the Web" is giving me serious "The Web vs. Brick and Mortar" vibes. That's not to say that it won't be the predicted cataclysm, just that it might not be. Time will tell, because This Is Happening, People, but if it does turn out to be a serious problem, I think we'll find some way to adapt. We're unlikely to accept a lesser result.
By @noja - 17 days
Can it geotag my old scanned photos?
By @Havoc - 18 days
Are any of the Deep Research tools pure api cost? Or all monthly sub?
By @BambooBandit - 17 days
I feel like the bigger problem isn't whether these deep research products work, but rather the raw material, so to speak, that they're working with.

For example, a lot of the "sources" cited in Google's AI Overview (notably not a deep research product) are not official, just sites that probably rank high in SEO. I want the original source, or a reliable source, not joeswebsite dot com (no offense to this website if it indeed exists).

By @qwertox - 18 days
I feel like the benefit which AI gives us programmers is limited. They can be extremely advanced, accelerative and helpful assistants, but we're limited to just that: architecting and developing software.

Biologists, mathematicians, physicists, philosophers and the like seem to have an open-ended benefit from the research which AI is now starting to enable. I kind of envy them.

Unless one moves into AI research?

By @oulipo - 18 days
The main "real-world" use cases for AI use for now have been:

- shooting buildings in Gaza https://apnews.com/article/israel-palestinians-ai-weapons-43...

- compiling a list of information on Government workers in US https://www.msn.com/en-us/news/politics/elon-musk-s-doge-usi...

- creating a few lousy music videos

I'd argue we'd be better off SLOWING DOWN with that shit