June 27th, 2024

Large Language Models are not a search engine

Large Language Models (LLMs) from companies like Google and Meta generate statistically likely text rather than verified facts, which can produce nonsensical "hallucinations." Companies struggle to control these errors after generation, since factors like training data and temperature settings shape the output. LLMs aim to improve user interactions but raise skepticism about whether they can deliver factual information.


Large Language Models (LLMs) like those used by Google and Meta are transforming search into platforms that generate content algorithmically, sometimes with nonsensical results. These "hallucinations" stem from how the models work: they predict probability distributions over vast text collections, so their outputs are designed to be statistically likely rather than true. Companies are now grappling with how to control these errors after generation. Factors like temperature settings and training data shape the text an LLM produces, making results unpredictable, and social media companies use human feedback to refine the models in hopes of improving user interactions.

Despite their creative potential, LLMs do not reliably deliver factual information, prompting skepticism about their role as search engines. Google's CEO acknowledges the inherent challenges, emphasizing the importance of grounding models with contextual information for a better user experience. The complexity of LLMs highlights the balance between variety and accuracy in information retrieval systems, and the debate continues over whether they are suitable for search, given their unpredictable nature and potential to generate misleading content.
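The temperature setting mentioned above controls how sharply peaked the model's probability distribution is when it picks the next token. A minimal sketch (toy logits, not any particular model's values) of temperature-scaled softmax:

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    """Convert raw model scores (logits) into sampling probabilities.
    Higher temperature flattens the distribution (more variety);
    lower temperature sharpens it (more predictable output)."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]  # hypothetical scores for three candidate tokens
print(softmax_with_temperature(logits, temperature=1.0))
print(softmax_with_temperature(logits, temperature=0.1))   # near-greedy
print(softmax_with_temperature(logits, temperature=10.0))  # near-uniform
```

At low temperature the most likely token dominates almost completely; at high temperature the choices approach uniform, which is where much of the "variety versus accuracy" tension comes from.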

Related

Researchers describe how to tell if ChatGPT is confabulating

Researchers at the University of Oxford devised a method to detect confabulation in large language models like ChatGPT. By assessing semantic equivalence, they aim to reduce false answers and enhance model accuracy.

Delving into ChatGPT usage in academic writing through excess vocabulary

A study by Dmitry Kobak et al. examines ChatGPT's impact on academic writing, finding increased usage in PubMed abstracts. Concerns arise over accuracy and bias despite advanced text generation capabilities.

Detecting hallucinations in large language models using semantic entropy

Researchers devised a method to detect hallucinations in large language models like ChatGPT and Gemini by measuring semantic entropy. This approach enhances accuracy by filtering unreliable answers, improving model performance significantly.

LLMs on the Command Line

Simon Willison presented a Python command-line utility for accessing Large Language Models (LLMs) efficiently, supporting OpenAI models and plugins for various providers. The tool enables running prompts, managing conversations, accessing specific models like Claude 3, and logging interactions to a SQLite database. Willison highlighted using LLM for tasks like summarizing discussions and emphasized the importance of embeddings for semantic search, showcasing LLM's support for content similarity queries and extensibility through plugins and OpenAI API compatibility.
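The embedding-based similarity search mentioned above comes down to comparing vectors, typically with cosine similarity. A minimal sketch (not the llm tool's actual code; the toy vectors are made up):

```python
import math

def cosine_similarity(a, b):
    """Similarity of two embedding vectors: 1.0 means the same
    direction, 0.0 means orthogonal (unrelated)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical 3-dimensional embeddings for two short texts.
doc_vec = [0.9, 0.1, 0.2]
query_vec = [0.8, 0.2, 0.1]
print(cosine_similarity(query_vec, doc_vec))  # close to 1.0: similar content
```

A semantic-search tool embeds every document once, embeds the query at search time, and returns the documents whose vectors score highest against it.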

Hackers 'jailbreak' powerful AI models in global effort to highlight flaws

Hackers exploit vulnerabilities in AI models from OpenAI, Google, and xAI, sharing harmful content. Ethical hackers challenge AI security, prompting the rise of LLM security start-ups amid global regulatory concerns. Collaboration is key to addressing evolving AI threats.

13 comments
By @ImaCake - 4 months
Except they kinda are? LLMs are just word models built from a corpus of the internet. There are examples of GPT3 regurgitating reddit comments in full given the right prompt.

Certainly I find LLMs replace a lot of searches for me, and Google/Microsoft are right to eat their own breakfast to get ahead of it.

By @fdr - 4 months
I do like Perplexity.ai. But as I interpret how it works, and how it portrays itself, the LLM component of it is, in fact, not a search engine.

How I interpret it: it applies a more powerful version of the stemming and synonym expansion of classic information retrieval when generating the queries it feeds into traditional retrieval systems (such as the Bing search engine via API, or another index).

After retrieval, it's a selector and summarizer of repetition seen in the results, giving you something of a blended outcome pertinent to the prompt you gave it. Like any other tool, you get a feel for when it is having problems, and some of those problems can be assessed by at least glancing at the sources it consulted. You get all sorts of weird stuff when your sources don't include relevant results, or include biased ones.
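The expand-then-retrieve step described above can be sketched as a toy illustration (not Perplexity's actual pipeline; the SYNONYMS table and corpus are invented for the example):

```python
# Hypothetical synonym table standing in for LLM-driven query expansion.
SYNONYMS = {"llm": ["language model"], "fast": ["quick", "rapid"]}

def expand_query(query):
    """Classic IR-style expansion: add synonyms for each query term."""
    terms = query.lower().split()
    expanded = list(terms)
    for t in terms:
        expanded.extend(SYNONYMS.get(t, []))
    return expanded

def retrieve(expanded_terms, corpus):
    """Score each document by how many expanded terms it contains,
    then return matching documents, best first."""
    scored = []
    for doc in corpus:
        text = doc.lower()
        score = sum(1 for t in expanded_terms if t in text)
        if score:
            scored.append((score, doc))
    return [d for _, d in sorted(scored, reverse=True)]

corpus = [
    "A language model can answer questions.",
    "Search engines index web pages.",
    "LLM tools are quick to prototype with.",
]
print(retrieve(expand_query("fast llm"), corpus))
```

Without expansion, "fast llm" would miss the document that says "quick"; the synonym step is what pulls it in. The summarization pass over the retrieved documents is then a separate LLM call.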

The first problem happens when the documents you are searching for do not exist, or something about your prompt -- it's usually obvious what it is -- is not sourcing documents you know to exist.

The second, bias, I've seen when researching something like the design conceits of Infiniband. While it has its genuine virtues, almost nobody talks about it... and many of the materials that discuss it are Infiniband marketing pieces that are both a bit too fluffy and sometimes stretch the truth, as marketing materials are wont to do. But you can spot this in the sources panel immediately.

I never found "disembodied" LLMs very useful.

By @hprotagonist - 4 months
“…the outcomes of Large Language Models are not designed to be true — they are merely designed to be statistically likely.”

yep!

By @benreesman - 4 months
Modern LLMs are empirically a great way to compress vast amounts of text.

Being able to ask one of the better open tunes a question I would normally ask Google makes it possible to work from anywhere in a way that hasn’t been true since I was a kid.

Big, overfunded, ethically and legally dubious data-vacuum black box APIs are and should be controversial.

Stack Overflow and much else in 30-70 GB on my MacBook Pro on a beach is strictly awesome. That should not be controversial.

cc Scott Wiener and the thugs paying.

By @gtirloni - 4 months
Considering most search engines are close to useless these days, I'd say major LLMs are doing a good job nonetheless.
By @purpleblue - 4 months
They aren't, but that's what the customers want. They don't care about all this generative stuff, what they really care about is getting answers to questions. This is the expectation and whoever cracks this reliably will be the next Google.
By @jackconsidine - 4 months
Good breakdown. As others pointed out, there's an overlap. I dug through my browser history and found my weekly Google searches in 2024 average 119, down from 300 in 2022.
By @bcatanzaro - 4 months
Maybe the best benefit of LLM as a search engine is that they haven't figured out yet how to serve you 10 ads before they give you the link.
By @timonoko - 4 months
I did not understand why the recently HN-referenced "which-key" does not work on Emacs 25 but works on Emacs 27. ChatGPT parsed the cryptic error message and suggested a remedy, which was (defun string-empty-p (x) (= (length x) 0))

For us with lower level of comprehension, LLMs are literally brain extensions already.

By @fsndz - 4 months
Yes, but you just have to consider that the challenge of AI is mostly reliability: https://www.lycee.ai/blog/ai-reliability-challenge
By @cobbzilla - 4 months
technically, they’re not, but the overlap is big.

I don’t know about you, but I am using search engines a lot less, and asking LLMs a lot more.

It doesn’t bode well for the search engine business overall.

By @sitkack - 4 months
No, but when paired with a search API, they absolutely are.

https://www.perplexity.ai/

https://www.phind.com/