August 24th, 2024

Why don't we have personalized search engines?

Users criticize search engines like Google for prioritizing ads over content, expressing a need for a unified tool to access diverse content types and better organize personal notes across applications.

The current state of search engines, particularly Google, is criticized for prioritizing advertisements over valuable content, leading to less effective search results. Users express a desire for a comprehensive tool that consolidates various types of content, including previously read articles, curated recommendations from trusted sources, purchased books, and personal notes stored across multiple applications. The need for a more efficient and user-friendly search experience is evident, as existing solutions do not adequately address these requirements.

- Users find current search engines inadequate due to ad-centric results.

- There is a demand for a unified tool to search diverse content types.

- Users want to access curated content from trusted sources easily.

- The need for better organization of personal notes across different apps is highlighted.

31 comments
By @browningstreet - 4 months
I’ve often wished I could publish a graph of myself, with 10-20+ items of interest, and let search engines and content recommendation engines (and good ad networks) bring me stuff I actually care about..

(especially calendar events, which used to be fun to track but everyone seems to have given up on event listings).

It wouldn’t have to track me, or infer other nefarious dimensions of my online habits, just target the things I’m asking to be targeted for.

I’m guessing that the implicit data dimensions of current tech is aggregating so much additional data about everyone that the recommendations we end up getting aren’t that great.

None of Google, Netflix, or Amazon get me at all, and I keep shoveling my habits right into their gaping data maw.

By @rty32 - 4 months
People probably won't like this, but I want to point out that Recall from Microsoft tries to do a little bit of this. Apparently, the specific implementation of that product is a spectacular privacy disaster. Which actually may not be an accident -- it is probably not simple to handle the privacy well for a personalized search engine (again, even though Microsoft made a lot of obvious mistakes), and you probably want to ensure that the data you aggregate do not end up being sold by a third-party. Still, you need to build a viable business. That's hard.
By @BeetleB - 4 months
On my TODO list is to build a system that downloads the text content of all the sites I visit and dump it in a vector DB. Then make my own search engine using RAG.

I did write a script that does the downloading part. It looks at my browser history and downloads the text of every site going back years.

Ditto for decades worth of email. I want to see if I ask it for my nephew's birthday, will it figure it out?

Should be doable without much difficulty.
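A minimal sketch of the retrieval step such a system needs, not the script mentioned above. It assumes the page and email text has already been downloaded into a dictionary, and it swaps a real embedding model and vector DB for a toy bag-of-words vector with cosine similarity. The names `embed`, `retrieve`, and the sample `pages` are made up for illustration.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, pages, k=2):
    """Return up to k document ids, most similar to the query first."""
    q = embed(query)
    scored = [(cosine(q, embed(text)), doc_id) for doc_id, text in pages.items()]
    scored = [(score, doc_id) for score, doc_id in scored if score > 0]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored[:k]]


# Hypothetical sample data standing in for downloaded history and email.
pages = {
    "mail/2019-03": "Reminder: my nephew has his birthday party on March 14",
    "blog/vector-db": "Notes on choosing a vector database for text search",
    "news/weather": "Rain expected over the weekend",
}
top = retrieve("nephew birthday", pages)
```

In a RAG setup the retrieved text would then be passed to a language model as context. Whether the model can infer a birthday from old email depends on what the retrieval step surfaces.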

By @malablaster - 4 months
I’ve been paying for Kagi search engine (a thing I never thought I’d pay for) for many months and it has a lot of what you’re asking for.
By @palata - 4 months
I guess because something needs to track everybody to get good at doing that, and when that something manages to track everybody, the way to make profit is to sell ads.

It's not that Google doesn't know technically how to give good results. It's really that Google is optimizing for profit, not for quality, in a system that makes it extremely difficult for anyone to compete (and whoever succeeded with that would presumably end up in the same situation and optimize for profit).

By @al_borland - 4 months
Neeva tried to do something like this. While it wasn’t everything you mention here, there was a feature to login to various online accounts so it would search across the web, but also your data and documents in those various accounts.

I never connected my other accounts, as I found the idea of a 3rd party having access to crawl and catalog them uncomfortable.

Neeva has since shut down and was acquired by Snowflake.

What you’re mentioning would likely require a company to have a very large monopoly for a very long time, where all of a person’s digital media was controlled by one company. Google is close, but for books people paid for, that’s something that would fall more into Amazon’s territory. Apple also has a bookstore, so maybe it would work for people who are 100% in Apple’s ecosystem and never stray, and who only have friends who are also in Apple’s ecosystem (for the people-you-trust feature).

I don’t think we’ll ever see enough benevolent cooperation between companies, without ulterior motives, to do something like this well without it also being a security nightmare.

By @perihelion_zero - 4 months
This is why I have one giant, enormous text file of all the technical notes I have ever taken.
By @hagbard_c - 4 months
I don't want a 'personalised' search engine because such a thing promotes tunnel vision. What I would like is a search engine which offers a 'spam filter' with a few different settings:

- no_SEO: demote anything which employs 'SEO' so it appears below search results not guilty of this sin

- no_Blogspam: demote blogspam below the original articles the bloggers refer to

- no_Sales: demote anything which tries to sell me something below results which do not. This is a tricky one to implement because not every site offering to sell something should be caught, e.g. a site explaining how to repair a flux capacitor which links to a source for these ubiquitous parts but mostly contains instructions how to install and tune the part is fine.

- no_GPT: demote anything recognised as being generated through 'AI'.

- $filter: an option to create custom filters

Depending on the reason for searching the 'net I'd have most of these options enabled but every now and then I'd switch one off, e.g. no_SEO/no_Sales when looking for something to buy.

I'm running an instance of SearxNG and hardly ever interface directly with individual search engines so I mostly avoid the 'personalisation' problem but I do not yet have access to filtering options like the ones I mentioned.
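The filter settings listed above could be sketched as a re-ranking step. This is only an illustration under a big assumption: that each result already carries labels such as `"seo"` or `"sales"`. Producing those labels reliably is the hard part and is not shown. The `Result` class, the label names, and the example URLs are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Result:
    url: str
    relevance: float
    labels: set = field(default_factory=set)  # assumed to come from a classifier


# Filter name -> the (hypothetical) label it demotes.
FILTERS = {
    "no_SEO": "seo",
    "no_Blogspam": "blogspam",
    "no_Sales": "sales",
    "no_GPT": "generated",
}


def rerank(results, enabled):
    """Demote, rather than drop, results that match any enabled filter."""
    demoted_labels = {FILTERS[name] for name in enabled}

    def sort_key(result):
        is_demoted = bool(result.labels & demoted_labels)
        # Undemoted results first; within each group, highest relevance first.
        return (is_demoted, -result.relevance)

    return sorted(results, key=sort_key)


results = [
    Result("shop.example/flux-capacitor", 0.9, {"sales", "seo"}),
    Result("repair.example/flux-capacitor-howto", 0.7),
    Result("blog.example/rehash", 0.8, {"blogspam"}),
]

# All filters on for research; no_SEO and no_Sales switched off for shopping.
research = [r.url for r in rerank(results, {"no_SEO", "no_Blogspam", "no_Sales", "no_GPT"})]
shopping = [r.url for r in rerank(results, {"no_Blogspam", "no_GPT"})]
```

Demoting instead of removing keeps filtered results reachable further down the list, which matches the wording of the settings above.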

By @kkfx - 4 months
I've tried YaCy, but its crawling community is far too small to make it usable, so for now there aren't many options. For maps the situation is better with OSM, but it is still far from being as usable as Google Maps for navigation, or even for mere exploration: in some places the coverage is even MUCH superior, but in many others it is next to nothing.

To be able to avoid commercial search engines we have only one option: public funding for public universities that maintain national infrastructure (something that already exists, only bigger), plus a public indexing project with a national plan for a home server in every connected home (much like today's ISP 'routers', only pure FLOSS, handled by the user or at least running public code), which among its other functions also indexes a small part of the web in an open project like YaCy. Same thing for VoIP comms.

WE DAMN NEED institutionalized FLOSS.

By @Der_Einzige - 4 months
What's insane is that pagerank, and most other graph centrality algorithms (the heart of modern search engines) have from the beginning supported a "personalization vector" which does EXACTLY this. It's available in all major graph analysis libraries (i.e. https://networkx.org/documentation/stable/reference/algorith...)

This exists, it's here, and no one uses it for anything except serving you better ads.
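To make the "personalization vector" concrete: in personalized PageRank the random surfer teleports back to the nodes a user cares about instead of to a uniformly random node. NetworkX exposes this through the `personalization` argument of its `pagerank` function. Below is a dependency-free sketch of the same idea using plain power iteration. The tiny link graph and its node names are invented for illustration.

```python
def personalized_pagerank(links, personalization, damping=0.85, iterations=100):
    """Power iteration for PageRank with a personalized teleport distribution.

    links: dict mapping node -> list of nodes it links to.
    personalization: dict mapping node -> non-negative weight (need not sum to 1).
    """
    nodes = sorted(set(links) | {t for targets in links.values() for t in targets})
    total = sum(personalization.values())
    teleport = {n: personalization.get(n, 0.0) / total for n in nodes}

    rank = dict(teleport)
    for _ in range(iterations):
        # Teleport step: jump to the user's preferred nodes.
        new_rank = {n: (1.0 - damping) * teleport[n] for n in nodes}
        for node in nodes:
            targets = links.get(node, [])
            if targets:
                # Follow step: split this node's rank across its out-links.
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling node: hand its rank back via the teleport distribution.
                for n in nodes:
                    new_rank[n] += damping * rank[node] * teleport[n]
        rank = new_rank
    return rank


# Hypothetical graph: a hub linking to two symmetric topic clusters.
links = {
    "hub": ["cooking", "rust"],
    "cooking": ["hub", "recipes"],
    "recipes": ["cooking"],
    "rust": ["hub", "compilers"],
    "compilers": ["rust"],
}

generic = personalized_pagerank(links, {n: 1.0 for n in links})
for_me = personalized_pagerank(links, {"rust": 1.0})
```

With uniform weights the two clusters score the same. Putting all the teleport weight on `"rust"` lifts `"rust"` and `"compilers"` above `"cooking"` and `"recipes"` without changing the graph itself.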

By @Barrin92 - 4 months
Among power users there would be immediate cries of privacy violation, as was on display with Microsoft's recent debacle with the screenshotting AI thing (the name is escaping me).

And among 95% of normal users there's no demand for it because what most people do is google restaurants, cinemas, dancing videos on TikTok or they just add "reddit" to their search for anything more complicated. Most people haven't bought any reading material on the internet and don't have notes.

By @prologist11 - 4 months
I built something like this using manual screenshotting, OCR, and indexing with Meilisearch, but now there are tools to do this automatically, like perfectmemory.ai. You can definitely build something yourself by gluing together a bunch of open source tools like I did, but if you want something ready-made then it kinda already exists, if you are willing to trust your operating system or 3rd party software engineers to not leak your information.
By @swayvil - 4 months
A search engine's first job is to cull the crap. To distinguish the good from the bad.

Thus a personalized search engine could double as a forum moderator.

And you could share search engines. Get a copy of the search engine of somebody that you admire/trust and merge it with your own. Thus your search engine could learn from others what's good and bad.

You could have a family search engine, passed down through the generations.

By @fallinditch - 4 months
I was just reading about how you can add a local LLM that you have on your PC to be your default AI chat assistant in Brave browser. Elsewhere on HN today is news of the latest version of LM Studio that can act as your local RAG. I reckon this sort of tech will be built into operating systems soon enough, and in this way personalized search will be enabled.
By @jccalhoun - 4 months
I think this has been the dream of the digital personal assistant for a long time.

When people started talking about LLMs and AI I was hoping for something that would monitor news and websites and find things that I was interested in. Something that would go beyond just keyword searches and also be able to pull in stories on radio and tv.

By @jitl - 4 months
The smaller your corpus, the harder it is to find signals to get good results. This is why even corporate intranet search is much worse than Google et al on the public internet. Personal information graphs end up being much more unusual than the average of all information online since there’s much less to average.
By @meiraleal - 4 months
Any questions about why we don't have innovation in search can only be attributed to one monopoly.
By @danjl - 4 months
Photo libraries give you search across your photos and videos. Digital asset management systems provide search across all your documents. OSX and Windows provide terrible search across your local filesystem. I would consider these personalized search engines.
By @BlackLotus89 - 4 months
This is why I have a personalized search engine... miniflux lets me search all my rss feeds, mail is searchable, logseq is searchable, everything is searchable... and you can combine it with 50 lines of python (with plugin support)
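A rough sketch of what "combine it with a few lines of Python, with plugin support" could look like. This is not the commenter's code: the registry, the decorator, and both plugins are hypothetical, and the plugins search hard-coded lists where a real one would query Miniflux, a mail index, or Logseq.

```python
SOURCES = {}


def source(name):
    """Decorator that registers a search plugin under a source name."""
    def register(func):
        SOURCES[name] = func
        return func
    return register


@source("feeds")
def search_feeds(query):
    # Placeholder data; a real plugin would query an RSS reader here.
    items = ["Personal search engines revisited", "Weekly Python news"]
    return [title for title in items if query.lower() in title.lower()]


@source("notes")
def search_notes(query):
    # Placeholder data; a real plugin would search a notes directory here.
    notes = ["python packaging checklist", "garden plan"]
    return [note for note in notes if query.lower() in note.lower()]


def search_all(query):
    """Run the query against every registered source and tag each hit."""
    hits = []
    for name, plugin in SOURCES.items():
        hits.extend((name, hit) for hit in plugin(query))
    return hits


hits = search_all("python")
```

Adding a new source is then just one more decorated function, which is roughly what plugin support amounts to at this scale.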
By @can16358p - 4 months
I think what Apple is building into the OSes now, combined with LLMs capable of running on mobile devices and new fine tuning techniques (that probably might not be invented yet?) will give rise to exactly this.
By @aworks - 4 months
20 years ago, Google had a custom search engine capability where you could give it a list of sites to search from. Is that functionality still around?
By @solardev - 4 months
Can't spotlight already do most of that?
By @smashtree - 4 months
I always wanted to search for random or uncommon websites, but these links are never reached.
By @mrkramer - 4 months
It's actually quite a hard problem to solve; even Larry Page acknowledged that like 10 years ago[1], and nobody is even close to solving it, not even Google. My opinion is that LLMs are a good step towards an answer-machine type of search engine. Perhaps the ultimate search engine would be something like Elon Musk's Neuralink[2] brain-computer interface, where a chip implant could read your thoughts and know your feelings and, based on that, give you "perfect" results. That would be really personal, I mean like on another level of directly personal. Now all we have is indirect personalisation, where the search engine gathers everything it can about you and assumes what you would want to see.

[1] https://www.youtube.com/watch?v=mArrNRWQEso

[2] https://en.wikipedia.org/wiki/Neuralink

By @KptMarchewa - 4 months
We do, it's just personalized to your cohort.
By @NoobPretender - 4 months
What about hosting your own instance of SearXNG?
By @brudgers - 4 months
A few years ago I thought about building a personal search engine. The idea was to save the HTML of every site I surfed and search it with a document engine like Apache Lucene. Because text compresses well and isn't big to begin with, a terabyte drive would last a long time and maybe forever. At the time, I thought RPi's would be a good idea because I didn't know any better. Now I might prototype it on an old Thinkpad.

Basically, it smells like a solved problem with open source tools built for Enterprise. I thought then and think now it could be scaled down to a hardware appliance that sits on a home network. But I am probably wrong about all of it. Good luck.
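For a sense of scale, the core of the idea (save each page's HTML, extract the text, index it, query it) fits in a few dozen lines of standard-library Python. This is a toy inverted index standing in for a document engine like Apache Lucene, not a replacement for one: there is no ranking, stemming, or persistence. The class names and example URLs are made up.

```python
import re
from collections import defaultdict
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def html_to_text(html):
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


class PageIndex:
    """Inverted index: term -> set of URLs whose text contains the term."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, url, html):
        for term in re.findall(r"[a-z0-9]+", html_to_text(html).lower()):
            self.postings[term].add(url)

    def search(self, query):
        """Return URLs containing every query term (boolean AND), sorted."""
        terms = re.findall(r"[a-z0-9]+", query.lower())
        if not terms:
            return []
        matches = set.intersection(*(self.postings.get(t, set()) for t in terms))
        return sorted(matches)


index = PageIndex()
index.add(
    "https://example.org/lucene",
    "<html><head><style>body{color:red}</style></head>"
    "<body><h1>Apache Lucene</h1><p>Full-text search library</p></body></html>",
)
index.add(
    "https://example.org/rpi",
    "<p>Raspberry Pi as a home server</p><script>var search = 1;</script>",
)
found = index.search("lucene search")
```

The index stores only terms and URLs, so it stays far smaller than the saved HTML, which is consistent with the point above that text is cheap to keep.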

By @kerkeslager - 4 months
Kagi does some of this.
By @KaisoEnt - 4 months
This is a solution in search of a problem