July 25th, 2024

Mapping Hacker News to find who knows what in the HN community

Wilson Lin's project analyzes 40 million Hacker News posts to create a semantic map, highlighting trusted voices and user relationships, while inviting feedback and participation to enhance community connections.


Curiosity, Skepticism, Enthusiasm


Wilson Lin discusses a project involving the analysis of 40 million posts and comments from Hacker News to create a semantic map of the community. This initiative aims to identify and highlight the trusted voices within the network, emphasizing the importance of people over content in social networks. Lin collaborated with Robert, who has experience in social semantic algorithms, to explore how to better understand the knowledge and relationships among users. The project allows users to see their contributions and unique linguistic identities within the community, as well as to search for expertise on various topics such as startups, programming languages, and neuroscience. The technology developed can organize user semantics, facilitate searches based on knowledge, and map community relationships, thereby revealing the expertise of individuals rather than just the information they produce. Lin invites feedback and encourages interested individuals to join a waitlist to further engage with the project, which aims to enhance connections among users based on their knowledge and interests rather than merely organizing information.
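The summary does not say how a topic search becomes a ranked list of users. One plausible approach, offered only as a sketch under stated assumptions and not as Lin and Robert's actual pipeline, is to average each user's comment embeddings into a single vector and rank users by cosine similarity to the query's embedding. The toy three-dimensional vectors, the function names, and the users `alice`, `bob`, and `carol` are all hypothetical stand-ins for real embedding-model output.

```python
import math

def mean_vector(vectors):
    """Average equal-length vectors into one 'centroid' per user."""
    n = len(vectors)
    return [sum(component) / n for component in zip(*vectors)]

def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank_users(query_vec, user_comment_vecs, top_k=10):
    """Rank users by how close their average comment embedding is to the query."""
    scores = {
        user: cosine(query_vec, mean_vector(vecs))
        for user, vecs in user_comment_vecs.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_k]

# Hypothetical toy "embeddings"; real models emit hundreds of dimensions.
users = {
    "alice": [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]],  # mostly topic axis 0
    "bob":   [[0.1, 0.9, 0.0], [0.0, 1.0, 0.1]],  # mostly topic axis 1
    "carol": [[0.5, 0.5, 0.0]],                   # mixed
}
query = [1.0, 0.0, 0.0]  # stands in for the embedding of a query like "startups"

for user, score in rank_users(query, users):
    print(f"{user}: {score:.3f}")
```

With a plain mean, a user with a single on-topic comment can score as high as a prolific one, so a real system might weight by comment count or votes; that weighting is an assumption, not something the post describes.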

Insights from over 10,000 comments on "Ask HN: Who Is Hiring" using GPT-4o

The analysis of over 10,000 Hacker News comments using GPT-4o and LangChain revealed job market trends like remote work opportunities, visa sponsorship stability, and skill demands. Insights suggest potential SaaS product development.

Evaluating a Decade of Hacker News Predictions: An Open-Source Approach

The blog post evaluates a decade of Hacker News predictions using LLMs and ClickHouse. Results show a 50% success rate, highlighting challenges in prediction nuances. Future plans include expanding the project. Website: https://hn-predictions.eamag.me/.

Show HN: 40M embeddings to find who knows what on HN

Wilson Lin and Robert embed 40 million Hacker News posts to create a semantic map, prioritizing individuals over content. They aim to highlight trusted voices and knowledge expertise within the community.

AI: What people are saying

The comments on Wilson Lin's project reveal a mix of intrigue and skepticism regarding the semantic mapping of Hacker News posts.

Many users appreciate the innovative approach and visualization of user expertise.
Concerns arise about the potential for misidentifying "trusted voices" and the implications of algorithmic influence on social interactions.
Some commenters express doubts about the accuracy and relevance of the tool, particularly for less active users.
There are discussions about privacy and the risks of exposing personal information through such analyses.
Several users highlight the importance of content over individual user reputation in discussions.

60 comments

By @wonger_ - 9 months

Personally, I like how HN focuses on content and discussions rather than individual users. If I wanted to follow experts, I'd probably curate a selection on a social network like Mastodon, or kludge together some RSS feeds.

Also, I feel like this tool selects for active commenters, not for knowledgeable experts. Not to mention throwaway accounts.

Still a cool project.

By @simonw - 9 months

https://hn2.wilsonl.in/user/simonw includes "Risk of COVID from pianos" down at the bottom of the map. I'd love to know where that came from!

By @c0l0 - 9 months

https://hn2.wilsonl.in/user/c0l0 - Apparently, I'm a (the?) leading expert on Optimizing Toilet Lid Design - AMA.

By @dang - 9 months

Recent and related:

Show HN: Exploring HN by mapping and analyzing 40M posts and comments for fun - https://news.ycombinator.com/item?id=40307519 - May 2024 (159 comments)

By @fabianholzer - 9 months

So, I assume you extracted my email address from the profile text (which admittedly was not obscured enough for today's LLMs) to put it into a mailto: link. Well, many thanks on behalf of the low-effort spammers for making harvesting easier, I guess...

By @chatmasta - 9 months

I remember there was a (rather controversial) tool posted here a few years ago, which used textual analysis and stylometry to find “similar users.” So you could type in someone’s username and find their likely alt accounts. It was creepily accurate - or at least that’s what I heard from a friend who has an alt :)

Could this tool be repurposed for that? Presumably the “map” rendered in each user’s avatar could be encoded as a vector and then compared to that of another user.

EDIT: Wait, I just realized it already does this… (or at least I think so - it’s not immediately obvious if “Explore More Users” is ranked by similarity.)

By @phaedrus - 9 months

I chose my username from the narrator's alter ego in Zen and the Art of Motorcycle Maintenance. In reference to the analytical knife:

"Phædrus was a master with this knife, and used it with dexterity and a sense of power. With a single stroke of analytic thought he split the whole world into parts of his own choosing, split the parts and split the fragments of the parts, finer and finer and finer until he had reduced it to what he wanted it to be. Even the special use of the terms "classic" and "romantic" are examples of his knifemanship."

In a bit of nominative determinism, or perhaps just because I chose the name knowing myself (or maybe I just overuse these words), my keywords include: "part, system, level, language, article, object," etc.

By @mmastrac - 9 months

I found it a bit challenging to actually drill down into my own username, but it doesn't seem to offer much other than throwing a lot of dots all over the map. I'm trying to understand what the overall clusters might be, but most of them are just android/apple/google?

https://hn2.wilsonl.in/user/mmastrac

By @gnicholas - 9 months

Interesting. I have noticed that my most-upvoted comments relate to legal questions, where I have relatively more expertise than most HNers (I used to be a lawyer). Although I'm not among the top commenters for law/legal/lawyer according to this tool, I definitely recognize some of the top names and can recall seeing their comments in legal threads. Pretty cool tool!

By @RamblingCTO - 9 months

The best thing about HN is that comments feel pretty temporary. I don't like the fact that I'm being analysed without my consent and put on public display. Not that there's anything remotely interesting about me on that page but it feels weird. Not everything needs to be analysed and we don't need to compete everywhere.

What I'm saying is: I like the focus on the content and that it's not about who said it.

It got me to remove my twitter handle from my bio though. If you could update that in your app I would be thankful.

By @QuesnayJr - 9 months

Interesting, but

> Rather than organizing the world's information, what if we could organize the world's people?

I do not wish to be organized.

By @ImageXav - 9 months

As a not so active user, this tool is rather inaccurate. It seems to have focussed on the one question I asked about jpeg xl, which is the topic I know the least about.

I suspect a bias towards more common topics might be occurring.

By @dmurray - 9 months

Things I thought I had expertise in and have posted about disproportionately on HN: chess, Python, HFT, Fermi estimation, Ireland.

Keywords the site actually associates with me: language, English, article, team, book. To be fair, at some zoom levels I do get "chess".

By @SushiHippie - 9 months

What's up with dang's profile?

https://hn2.wilsonl.in/user/dang

Why does it say "Marion Milner" and why are there so few posts on the map?

By @dep_b - 9 months

"Tax avoidance schemes in The Netherlands"

https://hn2.wilsonl.in/user/dep_b

I guess that's why I still like posting anonymously?

By @benreesman - 9 months

This implementation doesn’t really get there, but…

I think this is an extremely cool idea both on HN specifically and generally on the Internet. Bluesky does a bit of the thing where you can mix and match your content to your ranker/recommender system.

I hope you folks keep working on it, this is a refreshingly cool hack in the space.

By @kaycebasques - 9 months

Super cool! Happy to see that I show up under documentation-related stuff, considering how frequently I mention that I'm a technical writer in my comments. Also pretty excited to connect with the other people that show up related to docs / technical writing / etc.

By @photochemsyn - 9 months

> "Despite the intervening 16 years, we're amazed that social networks, even Hacker News, don't compute and display the trusted voices across topics. Instead of prioritizing pages based on content, social networks could prioritize the people behind the content."

Fallacy: appeal to authority. Practically, just because someone generates great content on subject A doesn't mean their take on subject B is any better than random. A well-reasoned, self-consistent argument informed by accurate data is far more valuable than 'trust this expert opinion because this expert can be trusted' approaches, although it may require more work on the part of the reader. Don't get lazy.

By @zamalek - 9 months

Some sentiment analysis might help? Apple is one of my keywords, but you'd pretty much only get an "avoid" response from me (to a degree; I have recommended their phones to a very specific demographic).

Awesome project, fantastic UI.

By @robg - 9 months

As Wilson and I are working on this project, we'd love to hear your thoughts!

By @pcthrowaway - 9 months

How does it determine which comments are valuable ("trusted") without access to comment vote counts?

By @junon - 9 months

Would be interested in a study, unfeasible as it may be, on the social compatibility of people who have the strongest overlap between themselves and the next closest user.

Cool visualization and analysis, really well made!

By @Aachen - 9 months

What instructions did you give it for the bio? It tries to decode a rot13 email address (and gets to a recognisable point) but then says "no bio provided". I can't figure out what it's trying to do.

Also, thanks a lot for attempting to put it in decoded form on another website. If I now get spam on that address, at least I know whose idea that was and that it's not yet spammers who got this clever, but rather it's due to well-meaning hackers with an idea: the most dangerous kind! :P

By @batch12 - 9 months

When I did this, I found it was also an interesting way to fingerprint users and find alternate accounts: an old account of mine showed up in the top 10 similarity matches out of all users.

By @causal - 9 months

This is really cool. Also a healthy reminder that anything we post publicly is likely to be analyzed. With a little analysis it's probably easy to know me better than I know myself.

By @itronitron - 9 months

While I can appreciate that a fair amount of work went into this, it's not providing any useful information on users, or on HN generally, that you couldn't reach more easily by searching by username on hn.algolia.com.

By @muzani - 9 months

Somehow I'm #10 on startups. https://hn2.wilsonl.in/search/Startups

I guess having over a decade in startups counts for something, but it's crazy to score higher than sama or all the other founders/investors here who make 1000000x more from startups.

My favourite terms here are apparently "inclusive pregnancy emoji proposal", "Controversial tweet on China issue", and "enhancing pasta flavor with salt". I do remember all of these conversations, it's interesting to see it brought up.

Anyway, I'm gonna bookmark this for the next time I look for a job on HN lol

By @tzury - 9 months

Looked at mine. Says not much at all about “what I know”.

Checked RIP Aaron Swartz. Nothing meaningful. What am I missing?

https://hn2.wilsonl.in/user/aaronsw

By @tgv - 9 months

I tried a few topics, e.g. neuroscience and language (a minor variation of the query in the blog), and the results are rather useless. It certainly doesn't show users with knowledge of the topic. Perhaps too fringe?

By @DoreenMichele - 9 months

Just curious what the mechanism is for adding a bio to individual users. Your sample -- robg -- has the same bio on your stuff as on HN but others have stuff on HN and "No bio provided" on your thing.

By @Daub - 9 months

Seeing my domain knowledge mapped out so honestly, I am reminded of the line in Robert Burns's poem "To a Louse":

Oh, would some Power the gift give us / To see ourselves as others see us!

By @mattdesl - 9 months

For the “terrain contours” is there something specific you’ve done to make it feel more cartographic? Or is it basically just marching cubes / iso lines on some data points?

Looks fantastic. Very cool project.
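The post does not say how the terrain contours are drawn. A common approach, sketched here purely as an assumption, is to turn the plotted points into a smooth density field (a sum of Gaussian bumps) and run marching squares (the 2D analogue of marching cubes) on it: each grid cell's four corners are classified against an iso level, and cells whose corners straddle the level get a contour segment whose endpoints are placed by linear interpolation along the cell edges. The function names and the `sigma` parameter are hypothetical, and the code covers only the density, cell-classification, and interpolation steps, not the per-case segment lookup table.

```python
import math

def density_grid(points, size, sigma=1.0):
    """Sample a sum of Gaussian bumps, one per (x, y) point, on a size x size grid.

    Returns grid[y][x]; `sigma` (hypothetical default) controls how far each bump spreads.
    """
    return [
        [
            sum(math.exp(-((x - px) ** 2 + (y - py) ** 2) / (2 * sigma ** 2))
                for px, py in points)
            for x in range(size)
        ]
        for y in range(size)
    ]

def edge_crossing(v0, v1, level):
    """Fraction (0..1) along an edge from v0 to v1 where linear interpolation reaches `level`.

    Only meaningful when the edge straddles the level, so v0 != v1.
    """
    return (level - v0) / (v1 - v0)

def contour_cells(grid, level):
    """Marching-squares case classification.

    Corners are visited top-left, top-right, bottom-right, bottom-left, and each
    corner at or above `level` sets one bit of a 4-bit case index. Cases 0 (all
    below) and 15 (all above) contain no contour. Returns (row, col, case) triples.
    """
    crossed = []
    for r in range(len(grid) - 1):
        for c in range(len(grid[0]) - 1):
            corners = (grid[r][c], grid[r][c + 1], grid[r + 1][c + 1], grid[r + 1][c])
            case = 0
            for bit, value in enumerate(corners):
                if value >= level:
                    case |= 1 << bit
            if case not in (0, 15):
                crossed.append((r, c, case))
    return crossed

# Usage: one point at (1, 1) on a 3 x 3 grid; the iso line at 0.8 rings the centre.
grid = density_grid([(1, 1)], size=3)
for row, col, case in contour_cells(grid, level=0.8):
    print(row, col, case)
```

Smoothing the points first is what makes the lines read as terrain rather than as outlines around individual dots; a larger `sigma` merges nearby clusters into broader "hills".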

By @asp_hornet - 9 months

I hate this. If I used my real name, I'd have some pleb website on the internet making chump guesses at what I know based on what I was prepared to share. Thanks for reminding me to use anonymous burner accounts on every app I use.

By @robwwilliams - 9 months

So cool. Love seeing this novel pivot of HN data. Just confirmed that my comments are well embedded by testing a few topics (longevity and mouse biology). Privacy issues are not a concern for most academic researchers (we typically beg for PR), but I can see that this might contribute to more cautious wording.

I left Twitter right before it became X, and focused on HN as a “breaking news and commentary” that is mercifully free of politics.

This tool fills a discovery gap.

By @bawolff - 9 months

When I looked up my name, the word "argument" shows up... I was kind of hoping I knew more than just how to argue ;)

In any case, interesting idea & project. Philosophically I'm not sure I like the idea of identifying experts; I'd much rather people's comments stand on their own instead of on their clout, but it's nonetheless definitely interesting.

By @neilv - 9 months

> Mapping Hacker News to find who knows what in the HN community

Would things like this create/increase incentives to game the metrics?

By @nottorp - 9 months

Is it who knows what? Or is it what's pushing whose buttons?

I looked myself up and "google" is prominent. However, I only post anti-Google comments, nothing technical, just on the privacy theme...

I also looked up 'yocto', which is something I know something about and have mentioned in posts a couple of times, and the first user returned has some very interesting tags:

gur juvpu ohg guvax evtug qbrf

And the only really related tag I see is 'buildroot'. I guess it's just not a popular enough tag for the machine to have enough data.

Edit: and to join the choir of concerned voices:

1. It's not 'trusted'. It's at best 'popular'.

2. It reminds me of the main social networks and 'engagement'. Hope HN never becomes as predatory.

By @moconnor - 9 months

Searched for bingo and didn't see patio11 in the results; hmm...

By @auggierose - 9 months

It puts me in the top three for "theorem proving", and at the top for "interactive theorem proving".

I like it. I feel this tool is quite sophisticated and incredibly accurate.

By @solardev - 9 months

In your avatar, what do the size and positions of the circles represent? It's not super obvious to me how that correlates to the word cloud on the map (if it does).

By @rob - 9 months

https://hn2.wilsonl.in/user/rob

Seems pretty accurate. I do love me some PHP and WordPress.

By @smusamashah - 9 months

Why is dang's profile empty? https://hn2.wilsonl.in/user/dang

By @chirau - 9 months

What does the solid red dot for a username mean? Some have a green one and some do not have anything.

By @motohagiography - 9 months

Good God I am a bore. May track some people down in my niche interest areas though, thank you for this.

By @rbanffy - 9 months

I find it surprising that I talk more about my hobbies than my work.

I guess I should apply for a job at ESA.

By @dechenham - 9 months

This vindicates my decision to comment using only short-lived burner accounts.

By @wcedmisten - 9 months

Wow I really like the topographic visualization of the data, looks really slick!

By @rpmisms - 9 months

I wonder how high I am on the guns knowledge list?

By @_ink_ - 9 months

I like how I am all over the place, haha.

By @sebastiennight - 9 months

I can already see the day coming when someone is going to make a phpBB-like HN skin which displays everyone's real name and LinkedIn picture alongside their comments.

Why not add their Gmail signature as well?

By @0xbadcafebee - 9 months

HN and other forums perpetuate a fallacy (that I assume isn't well known because nobody has published a research paper on it)... but "the wisdom of the crowd" isn't.

"Trusted voices" on HN are often not experts, yet espouse views which are not what the larger body of experts would consider correct. In addition, actual expert voices are drowned out by whatever the popular position is. HN also provides its own cultural bias, in that everything spoken on HN has to follow a rigorous cultural sieve set down by the guidelines, such that a "negative" view, even if correct, is considered either wrong or distasteful and buried. This is exacerbated by banning the use of humor to disarm controversial or heated comments. And those are only the comments HN does get; many opinions are never entered as comments here, so there exists a large knowledge gap. Then there are the "taboo" subjects like race, gender, religion, politics, social justice, etc., which get buried for fear of controversy, so you're definitely not gonna find any expert opinions on those, as the stories just aren't there for discussion.

The end result is that often experts go unheeded or even downvoted, popular shallow opinions get upvoted, and substantive commentary based on evidence and experience is frequently missing. The fact is that we have no idea who knows what, or what's true or right. We just have "popularity" according to the particular cultural quirks of this site. So you can definitely find out "who thinks what", and who is considered to be more trustworthy to a HNer, but it has absolutely nothing to do with objective truth or the body of real knowledge that exists outside the world of HN comments. This is an echo chamber, but it's not a chamber of experts. It just seems that way because occasionally you see a minor tech celebrity, and people talk with absolute authority regardless of whether they have any.

You want to find out who knows what? Look at their diplomas and careers. If they've done 20 years in a single field, probably they're an expert. If they have a degree (or multiple) in a field, probably they're an expert. If they spent half their life working on a single hobby, they're at least very knowledgeable in that field. But you can't determine that just by looking at who's talking about what or how many completely subjective "points" they get for what they say. Determining real knowledge requires analysis of specific criteria, filtered to get a higher quality result.

By @qdot76367 - 9 months

Searched for "Buttplugs".

My name pops up.

Hell yeah. A++++ completely accurate would use again.

By @shinryuu - 9 months

Searching for neuropathy gives me results for psychopathy. Not exactly the same ...

By @precompute - 9 months

Love it. Visited my profile, wasn't disappointed. 10/10

By @hu3 - 9 months

Searched for "dupe" and yes, it correctly pointed out the one user who has made it a hobby:

https://hn2.wilsonl.in/search/dupe

By @ajkjk - 9 months

> Despite the intervening 16 years, we're amazed that social networks, even Hacker News, don't compute and display the trusted voices across topics

Please no. That sounds dystopian. We should not prefer having algorithms meddling with social interaction. We should not want things better designed to manipulate us.

By @wruza - 9 months

May talk about social connections and knowledge networks etc., but this will eventually end up as a SaaS for everyone's future HR. Can’t see much amazing here, especially with this founder’s vibe in the replies.

I guess it’s inevitable at this point, but how does it feel to be among the first who dumb down real people to a set of caricature keywords based on the ML method of the day?
