350M Tokens Don't Lie: Love and Hate in Hacker News
The analysis of Hacker News posts from January 2020 to June 2023 shows increased discussions on AI, a decline in COVID topics, and a generally positive sentiment with polarized reactions on various issues.
The analysis of Hacker News posts from January 2020 to June 2023 reveals significant insights into community engagement and sentiment. Using the Llama 3 70B language model, researchers examined over 100,000 posts that garnered at least 20 upvotes and five comments. The study identified the top 20 topics discussed, with a notable rise in AI and natural language processing discussions, while topics related to the COVID pandemic declined. The sentiment analysis indicated general positivity in comments, although no neutral discussions were found, suggesting a polarized environment. The community expressed love for programming, open source, and nostalgia for older technologies, and disdain for issues like employee monitoring and police misconduct. Sentiment showed a modest decline over time, with notable spikes in negativity during specific events, such as Apple's proposed CSAM scanning. The findings highlight the dynamic nature of discussions on Hacker News: some topics elicit strong positive or negative reactions, while others remain divisive. The study underscores the effectiveness of large language models in analyzing community sentiment and engagement.
- The analysis covered Hacker News posts from January 2020 to June 2023.
- AI and natural language processing topics have surged, while COVID-related discussions have decreased.
- The sentiment analysis showed a general positivity, with no neutral discussions identified.
- The community loves programming and open source but dislikes employee monitoring and police misconduct.
- Sentiment trends indicate a modest decline over time, influenced by specific events.
Related
Insights from over 10,000 comments on "Ask HN: Who Is Hiring" using GPT-4o
The analysis of over 10,000 Hacker News comments using GPT-4o and LangChain revealed job market trends like remote work opportunities, visa sponsorship stability, and skill demands. Insights suggest potential SaaS product development.
Evaluating a Decade of Hacker News Predictions: An Open-Source Approach
The blog post evaluates a decade of Hacker News predictions using LLMs and ClickHouse. Results show a 50% success rate, highlighting challenges in prediction nuances. Future plans include expanding the project. Website: https://hn-predictions.eamag.me/.
Results from Stack Overflow's Annual Developer Survey
The 2024 Stack Overflow Developer Survey shows JavaScript and PostgreSQL as top technologies, with 76% using AI tools. Developers face technical debt, but 68% report job satisfaction despite economic challenges.
Mapping Hacker News to find who knows what in the HN community
Wilson Lin's project analyzes 40 million Hacker News posts to create a semantic map, highlighting trusted voices and user relationships, while inviting feedback and participation to enhance community connections.
I love you, HN, but you're toxic (2022)
The author reflects on their Hacker News experience, appreciating its knowledge but noting a toxic atmosphere that fosters negativity, impacting personal interactions and relationships. They emphasize the importance of kindness.
- Several users question the effectiveness of LLMs compared to traditional sentiment analysis tools.
- There is a discussion about the accuracy of sentiment analysis and how it reflects true feelings versus written tokens.
- Some commenters express interest in the methodology and data sources used for the analysis.
- Concerns are raised about the representation of divisive topics and the sentiment scores assigned to them.
- Users share personal observations about the changing nature of discussions on Hacker News over time.
Gemini suggests NLTK and spaCy
I generated title, summary, keywords and hierarchical topics up to 3 levels up from the original text. My plan for now is to put them in a vector search engine, which, incidentally, was made with Sonnet 3.5 with very little iteration. I want to play around to see how I can organize my ideas with LLMs, make something useful from all that text.
I really don't know what I will discover. One small insight I already found is that summarization works really well, you can use summaries instead of full texts to prime Claude and it works better than expected. Unlimited context? Maybe.
Another direction of research is to create a nice taxonomy, there are thousands of topics, pretty difficult task, but there must be a way using clustering and LLMs. That is why I generated topic, parent-topic, gp-topic, and ggp-topic from all snippets. I would probably manually edit the top 2 levels of the taxonomy to give it the right focus.
I'm also integrating with my HN and reddit feeds. X is too stingy with the API. Maybe Pocket and local downloads folder too, I save/bookmark stuff I like. I could also include all the papers I am reading into the corpus. It could synthesize a ranked feed aligned to my own interests.
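A minimal sketch of how the four generated fields mentioned above (topic, parent-topic, gp-topic, ggp-topic) could be folded into a nested taxonomy for manual editing. The sample records and the `build_taxonomy` helper are hypothetical; only the field names come from the comment.

```python
def build_taxonomy(records):
    """Nest records as ggp-topic -> gp-topic -> parent-topic -> {topic: count}."""
    tree = {}
    for rec in records:
        node = tree
        # Walk from the broadest level down to the direct parent.
        for key in ("ggp-topic", "gp-topic", "parent-topic"):
            node = node.setdefault(rec[key], {})
        # Leaves count how many snippets carry each topic.
        node[rec["topic"]] = node.get(rec["topic"], 0) + 1
    return tree


# Hypothetical snippets, for illustration only.
records = [
    {"topic": "sentiment analysis", "parent-topic": "NLP",
     "gp-topic": "machine learning", "ggp-topic": "computing"},
    {"topic": "sentiment analysis", "parent-topic": "NLP",
     "gp-topic": "machine learning", "ggp-topic": "computing"},
    {"topic": "vector search", "parent-topic": "information retrieval",
     "gp-topic": "databases", "ggp-topic": "computing"},
]

taxonomy = build_taxonomy(records)
print(taxonomy)
```

Because the top two levels are plain dictionary keys, renaming or merging them by hand before clustering the lower levels is straightforward.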
> Football (206 posts)
Either hacker news really likes the national forensic league, or these LLM-categories are a bit dubious.
Also hmmm:
> American football (7 posts)
> American_football (6 posts)
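Near-duplicate categories like the two above could be merged with a small normalization pass before counting. This is only a sketch: the `normalize_label` and `merge_counts` helpers are hypothetical and not part of the original analysis, and the counts are the ones quoted above.

```python
from collections import Counter


def normalize_label(label: str) -> str:
    """Treat underscores as spaces, collapse whitespace, and ignore case."""
    return " ".join(label.replace("_", " ").split()).casefold()


def merge_counts(counts: dict) -> Counter:
    """Sum post counts for labels that normalize to the same string."""
    merged = Counter()
    for label, n in counts.items():
        merged[normalize_label(label)] += n
    return merged


raw = {"Football": 206, "American football": 7, "American_football": 6}
merged = merge_counts(raw)
print(merged)
```

This only catches spelling variants. Whether "Football" means soccer or American football is a labeling question that string cleanup cannot answer.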
> But how do people feel about these topics
I find it notable that tokens don't necessarily express people's feelings. Put another way, tokens aren't how people feel, they're how they write.
Samstave mentioned in this thread that Twitter is a 'global sentiment engine'. I'm sure that's literally true. Sentiment measurement is only accurate to the degree that people are expressing their real feelings via tokens. I can imagine various psychological and political reasons for a discrepancy.
If you did sentiment analysis of publicly known writings of North Korean administrators, would that represent their feelings?
I think the interplay with free speech is interesting here: In a setting where people feel socially and legally safe to express their true opinion, sentiment analysis will be more accurate.
This is a cool phrase.
It is personally important to me: when I was in a panel interview @ --, they asked, "What do you think Twitter is?"
My response was "You're a global sentiment engine."
(There are a lot of conversations I'd love to have with the HN community about our shared experiences, and the weird history flipped-bits that exist in the minds of those who lived through them...
like threads on how Linux came about, or how XML was born, through things I touched in a Forrest Gump way, and how there are so many stories from so many.)
I started to look into it, but in the little time I had to devote to the idea, I read that the Algolia API lets you look over a longer period, but that it is relatively costly.
I just want to look for all story titles from the beginning of time which match one of several simple search terms, and return submission date and title for an analysis I'd conduct in R.
Am I overthinking it and a simple Python script without an API code can do it?
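A short script along these lines may be enough. The sketch below assumes the public HN Search API hosted by Algolia at `hn.algolia.com/api/v1/search_by_date`, which can be queried without an API key; the helper names (`build_url`, `extract_rows`, `fetch_titles`, `write_csv`) are made up for illustration. The API appears to cap how many results a single query can page through, so covering the full history would likely mean splitting the search into date windows (for example with `numericFilters` on `created_at_i`).

```python
import csv
import json
import urllib.parse
import urllib.request

API = "https://hn.algolia.com/api/v1/search_by_date"


def build_url(term, page=0, hits_per_page=100):
    """Build a search URL restricted to stories."""
    params = {"query": term, "tags": "story",
              "hitsPerPage": hits_per_page, "page": page}
    return API + "?" + urllib.parse.urlencode(params)


def extract_rows(payload, term):
    """Keep (created_at, title) pairs whose title contains the term."""
    rows = []
    for hit in payload.get("hits", []):
        title = hit.get("title") or ""
        if term.lower() in title.lower():
            rows.append((hit.get("created_at", ""), title))
    return rows


def fetch_titles(term, max_pages=10):
    """Page through results for one term (performs network requests)."""
    rows = []
    for page in range(max_pages):
        with urllib.request.urlopen(build_url(term, page), timeout=30) as resp:
            payload = json.load(resp)
        rows.extend(extract_rows(payload, term))
        if page + 1 >= payload.get("nbPages", 0):
            break
    return rows


def write_csv(rows, out):
    """Write rows as CSV so they can be loaded in R with read.csv()."""
    writer = csv.writer(out)
    writer.writerow(["created_at", "title"])
    writer.writerows(rows)


# Example usage (not run here because it needs network access):
#   with open("titles.csv", "w", newline="") as f:
#       write_csv(fetch_titles("sentiment analysis"), f)
```

The title check in `extract_rows` is done locally because the search also matches other fields, such as the URL.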
Great work folks, glad we can all agree on that one.
Interesting that they used an LLM for this. I mean it makes sense and the data seems to pass the pub test but I, in my ignorance, would not have assumed that a language model would be well suited for number crunching.
And no 5s? What is even going on in that LLM?
LLMs are really sensitive to bad or even slightly ambiguous grammar. I wonder if the numbers would differ significantly with "Reply only with the tags, in the following format".
> 350M Tokens Don't Lie: Love And Hate In Hacker News, to
> LLM-based sentiment analysis of Hacker News posts, to
> LLM-based sentiment analysis of Hacker News posts between Jan 2020 and June 2023
I was horrified when I read that international students are among the top entries on the hate list, although I did see a couple of comments that attributed their cities' housing crises to international students and thought that this sentiment was widely supported.
SENTIMENT 6
:D
For context, I'm someone who uses HN to search for topics I'm interested in, rather than something like Google or Reddit.
- For anything SF community-related, most hits are from 10+ years ago. Lots of "hey we have a space in soma, any local startups want to hang and drink beers?" or "we have an empty desk in a space in the mission, any hackers want to grab it for free?" - all from around 2012 or prior. Nothing like that seems to happen anymore.
- Starting from around 2016, a heavy anti-technology sentiment appears. Cloud, crypto, AI - all are nonsense propagated by VC types and overzealous engineers.
- Similarly, any thread involving money/labor invariably has an anti-capitalist and/or "unions would solve everything" tangent.
Would be interested to hear if others have observed similar.