AI-assisted search-based research works now
Recent advancements in AI-assisted search tools, particularly OpenAI's o3 and o4-mini models, have improved accuracy and reliability, potentially transforming research methods and impacting traditional web search usage.
Recent work on AI-assisted, search-based research has made significant progress, particularly in 2025. Initially, tools like Google Gemini and OpenAI's ChatGPT struggled with accuracy, often hallucinating details not present in search results. The latest iterations, including OpenAI's o3 and o4-mini models, have shown marked improvement: they integrate search directly into their reasoning processes, yielding reliable and useful answers without the long wait times of earlier systems. Users report successful interactions with these models, receiving accurate information grounded in real-time search results.

Despite the advancements, concerns remain about the reliability of AI outputs, particularly as competitors like Google and Anthropic lag behind in performance. The shift towards AI as a primary research assistant raises questions about the future of web search and the economic model of online information access, as users may increasingly rely on AI for answers rather than traditional search engines. This evolution could significantly change how information is consumed and invite legal challenges as the landscape shifts.
- AI-assisted search tools have improved significantly, providing reliable answers.
- OpenAI's o3 and o4-mini models integrate search into their reasoning processes.
- Previous models often hallucinated information, but recent versions have reduced this issue.
- The shift towards AI for research tasks may impact traditional web search usage.
- Competitors like Google and Anthropic need to enhance their offerings to keep pace.
Related
Is ChatGPT Search Going to Disrupt Google Search?
OpenAI has launched Search for ChatGPT, an AI-driven search engine that offers direct answers, utilizes real-time data, and allows conversational engagement, aiming to challenge traditional search engines like Google.
ChatGPT Search is not OpenAI's 'Google killer' yet
OpenAI's ChatGPT Search is not yet a viable alternative to Google, struggling with short queries and inaccuracies, while performing better with detailed questions. Google remains the preferred search engine.
AI #97: 4
In 2025, AI developments include new models like OpenAI's o3, concerns over AI characters on social media, children's preference for traditional toys, and the need for innovative recruitment strategies in tech.
AI means the end of internet search as we've known it
AI is transforming internet search from keyword-based queries to conversational interactions, with Google’s AI Overviews providing detailed answers, raising concerns for publishers about traffic loss and information accuracy.
AI search engine study finds wrong cites in 60%+ of queries; Grok3 had 94% wrong
Nearly 25% of Americans use AI search engines, but over 60% of their responses are incorrect, often misattributing sources and ignoring content access preferences, raising concerns about misinformation and publisher credibility.
- Users report varied effectiveness, with some finding the tools helpful for specific inquiries, while others criticize them for inaccuracies and lack of depth.
- Concerns about trust and verification are prevalent, with many users feeling that AI outputs can be misleading or unverifiable.
- Some users appreciate the potential of AI for deep research but highlight limitations in dynamic querying and adaptability.
- There is a call for better integration of AI with traditional search methods, emphasizing the need for reliable and trustworthy information.
- Discussions also touch on the economic implications of AI on traditional web search models and the future of information retrieval.
I as a human know how to find this information. The game day rosters for many NFL teams are available on many sites. It would be tedious but possible for me to find this number. It might take an hour of my time.
But despite this being a relatively easy research task, all of the deep research tools I tried (OpenAI, Google, and Perplexity) completely failed and just gave me a general estimate.
Based on this article I tried that search just using o3 without deep research and it still failed miserably.
Gemini 2.5 Pro and o3/o4-mini seem to have crossed a threshold for a bunch of things (at least for me) in the last few weeks.
Tasteful, effective use of the search tool for o3/o4-mini is one of those. Being able to "reason" effectively over long context inputs (particularly useful for understanding and debugging larger volumes of code) is another.
It really is a game changer when the search engine
I find that an AI performing multiple searches on variations of keywords, and aggregating the top results across those keywords, does a more extensive job than most people, myself included, would do.
I had luck once asking what its search queries were. It usually provides the references.
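A minimal sketch of that fan-out-and-aggregate pattern, written as plain Python rather than anything the commenters actually ran. The SearXNG endpoint, its JSON response shape, and the example query variations are all my assumptions, not details from the thread.

```python
# Sketch: fan one question out over several query phrasings and rank URLs
# by how many phrasings surfaced them -- the aggregation behaviour described above.
# Assumptions (not from the thread): a local SearXNG instance at SEARX_URL
# with its JSON output format enabled, returning "results" with "url"/"title".
from collections import Counter

import requests

SEARX_URL = "http://localhost:8080/search"  # hypothetical self-hosted metasearch


def search(query: str, limit: int = 5) -> list[dict]:
    """Return the top results for a single query phrasing."""
    resp = requests.get(
        SEARX_URL, params={"q": query, "format": "json"}, timeout=10
    )
    resp.raise_for_status()
    return resp.json().get("results", [])[:limit]


def aggregate(variations: list[str]) -> list[tuple[str, int]]:
    """Search every phrasing and rank URLs by how many phrasings returned them."""
    hits: Counter[str] = Counter()
    for query in variations:
        for result in search(query):
            hits[result["url"]] += 1
    return hits.most_common()


if __name__ == "__main__":
    ranked = aggregate([
        "NFL game day roster size rules",
        "how many players dress for an NFL game",
        "NFL active roster vs gameday roster",
    ])
    for url, count in ranked[:10]:
        print(count, url)
```

Ranking by overlap across phrasings is only a crude stand-in for the relevance judgment a model applies, but it shows why the multi-query approach covers more ground than a single hand-typed search.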
1. Technically it might be possible to search the Internet, but it might not surface correct and/or useful information.
2. High-value information that would make a research report valuable is rarely public nor free. This holds especially true in capital-intensive or regulated industries.
I believe that most positions are resolved if
1) you accept that these are fundamentally narrative tools. They build stories, in whatever style you wish: stories of code, stories of project reports, stories of conversations.
2) this is balanced by the idea that the core of everything in our shared information economy is Verification.
The reason experts get use out of these tools, is because they can verify when the output is close enough to be indistinguishable from expert effort.
Domain experts also do another level of verification (hopefully) which is to check if the generated content computes correctly as a result - based on their mental model of their domain.
I would predict that LLMs are deadly in the hands of people who can't gauge the output, who will end up driving themselves off a cliff, while experts will be able to use them effectively on tasks where verifying the output has a comparative effort advantage over creating it.
The first result was WB, which I gave to it as the first example and am already using. Results 2 and 3 were the mainstream services which it helpfully marked in the table as not having the features I need. Result 4 looked promising but was discontinued 3 years ago. Result 5 was an actual option which I'm trying out (but may not work for other reasons).
So, 1/5 usable results. That was mildly helpful I guess, but it appeared a lot more helpful on the surface than it was. And I don't seem to have the ability to say "nice try but dig deeper".
My question is, how to reproduce this level of functionality locally, in a "home lab" type setting. I fully expect the various AI companies to follow the exact same business model as any other VC-funded tech outfit: free service (you're the product) -> paid service (you're still the product) -> paid service with advertising baked in (now you're unabashedly the product).
I fear that with LLM-based offerings, the advertising will be increasingly inseparable, and eventually undetectable, from the actual useful information we seek. I'd like to get a "clean" capsule of the world's compendium of knowledge with this amazing ability to self-reason, before it's truly corrupted.
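For what it's worth, the bare bones of such a home-lab setup can be wired together today. The sketch below assumes a local model served through Ollama's OpenAI-compatible endpoint and a self-hosted SearXNG instance for retrieval; the model tag, ports, and prompt are placeholders of mine, not anything the commenter described.

```python
# Sketch of a "home lab" search-assisted answerer: a locally hosted model plus
# a self-hosted metasearch engine, so neither queries nor answers leave the LAN.
# Assumptions (mine): Ollama serving an OpenAI-compatible API on localhost:11434
# with a model tagged "llama3.1", and SearXNG on localhost:8080 with JSON enabled.
import requests
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="unused")


def web_search(query: str, limit: int = 5) -> list[dict]:
    """Fetch top results from the local metasearch instance."""
    resp = requests.get(
        "http://localhost:8080/search",
        params={"q": query, "format": "json"},
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json().get("results", [])[:limit]


def answer(question: str) -> str:
    """Ground the local model's answer in locally retrieved snippets."""
    snippets = "\n".join(
        f"- {r.get('title', '')}: {r.get('content', '')} ({r.get('url', '')})"
        for r in web_search(question)
    )
    reply = client.chat.completions.create(
        model="llama3.1",
        messages=[
            {
                "role": "system",
                "content": "Answer using only the provided search snippets and cite their URLs.",
            },
            {
                "role": "user",
                "content": f"Question: {question}\n\nSnippets:\n{snippets}",
            },
        ],
    )
    return reply.choices[0].message.content


if __name__ == "__main__":
    print(answer("When did Encanto come out?"))
```

This is a single retrieve-then-answer pass, not the iterative tool-calling loop the hosted models run, but every step of the pipeline stays inspectable, which is the point of the exercise.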
First one: geolocating a photo I saw in a museum. It didn't find a definitive answer, but it sure turned up a lot of fascinating info in its research.
Second one, I asked it to suggest a new line of enquiry in the Madeleine McCann missing person case. It made the interesting suggestion that the 30 minute phone call the suspect made on the evening of the disappearance, from a place near the location of the abduction, was actually a sort of “lookout call” to an accomplice nearby.
Quite impressed. This is a great investigative tool.
> “Google is still showing slop for Encanto 2!” (Link is provided)
I believe quite strongly that Google is making a serious misstep in this area, the “supposed answer text pinned at the top above the actual search results.”
For years they showed something in this area that was directly quoted from what I assume was a shortlist of non-BS sites, so users were conditioned that if they just wanted a simple answer, like when a certain movie came out or whether a certain show had been canceled, they may as well trust it.
Now it seems like they have given over that previous real estate to a far less reliable feature, which simply feeds any old garbage it finds anywhere into a credulous LLM and takes whatever pops out. 90% of people that I witness using Google today simply read that text and never click any results.
As a result, Google is now pretty much always even less accurate at the job of answering questions than if you posed that same question to ChatGPT, because GPT seems to be drawing from its overall weights which tend toward basic reality, whereas Google’s “Answer” seems to be summarizing a random 1-5 articles from the Spam Web, with zero discrimination between fact, satire, fiction, and propaganda. How can they keep doing this and not expect it to go badly?
I secondarily wonder how an LLM solves the trust problem in web search, which was traditionally solved (and is now gamed) through PageRank. ChatGPT doesn't seem to be as easily fooled by spam as direct search is.
How much is Bing (or whatever the search engine is) getting better? vs how much are LLMs better at knowing what a good result is for a query?
Or perhaps it has to do with the richer questions that get asked to chat vs search?
Conveniently, Gemini is the best frontier model for everything else, so Google is both very interested in deep research and well positioned (if not best positioned) to be the best at it too. Let's check back in 3-6 months.
Individual model vendors cannot build such a product, since they are biased towards their own models; they would not allow you to choose models from competitors.
> The user-facing Google Gemini app can search too, but it doesn’t show me what it’s searching for.
Gemini 2.5 Pro is also capable of search as part of its chain of thought; it needs light prodding to show URLs, but it'll do so and is good at it.

Unrelated point, but I'm going to keep saying this anywhere Google engineers may be reading: the main problem with Gemini is their horrendous web app, riddled with 5 annoying bugs that I identified as a casual user after a week. I assume it's in such a bad state because they don't actually use the app and they use the API, but come on. You solved the hard problem of making the world's best overall model but are squandering it on the world's worst user interface.
I need to get from A to B via C via public transport in a big metropolis.
Now C could be one of say 5 different locations of a bank branch, electronics retailer, blood test lab or whatever, so there's multiple ways of going about this.
I would like a chatbot solution that compares all the different options and lays them out ranked by time from A to B. Is this doable today?
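Absent a chatbot that does this natively, the comparison itself is easy to script. The sketch below uses a hypothetical transit_minutes() helper standing in for whatever transit-directions API is available; the branch names and durations are invented for illustration.

```python
# Sketch of the A -> C -> B comparison: try every candidate intermediate stop,
# sum the two transit legs, and rank the options by total travel time.
# transit_minutes() is a hypothetical stub; swap in a real transit-directions
# lookup (the stop names and minute values below are made up).

def transit_minutes(origin: str, dest: str) -> int:
    """Hypothetical placeholder for a transit-directions query."""
    fake_times = {
        ("Home", "Branch North"): 18, ("Branch North", "Office"): 25,
        ("Home", "Branch East"): 30, ("Branch East", "Office"): 12,
        ("Home", "Branch South"): 22, ("Branch South", "Office"): 20,
    }
    return fake_times[(origin, dest)]


def rank_via_points(a: str, b: str, candidates: list[str]) -> list[tuple[str, int]]:
    """Return (via-point, total minutes) pairs sorted fastest first."""
    totals = [(c, transit_minutes(a, c) + transit_minutes(c, b)) for c in candidates]
    return sorted(totals, key=lambda item: item[1])


if __name__ == "__main__":
    for stop, minutes in rank_via_points(
        "Home", "Office", ["Branch North", "Branch East", "Branch South"]
    ):
        print(f"via {stop}: {minutes} min")
```

The ranking is the trivial part; what the question really hinges on is whether a chatbot can pull live timetable data for each leg, which still depends on it having a transit-directions API exposed as a tool.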
Don't forget xAI's Grok!
For example, a lot of the "sources" cited in Google's AI Overview (notably not a deep research product) are not official, just sites that probably rank high in SEO. I want the original source, or a reliable source, not joeswebsite dot com (no offense to this website if it indeed exists).
Biologists, mathematicians, physicists, philosophers and the like seem to have an open-ended benefit from the research which AI is now starting to enable. I kind of envy them.
Unless one moves into AI research?
- shooting buildings in Gaza https://apnews.com/article/israel-palestinians-ai-weapons-43...
- compiling a list of information on Government workers in US https://www.msn.com/en-us/news/politics/elon-musk-s-doge-usi...
- creating a few lousy music videos
I'd argue we'd be better off SLOWING DOWN with that shit