July 6th, 2024

Evaluating a Decade of Hacker News Predictions: An Open-Source Approach

The blog post evaluates a decade of Hacker News predictions using LLMs and ClickHouse. Results show a roughly 50% success rate and highlight how hard nuanced predictions are to judge. Future plans include expanding the project. Website: https://hn-predictions.eamag.me/.


The blog post discusses the evaluation of a decade of Hacker News predictions using an open-source approach. The author used LLMs to assess over 2,800 predictions made by Hacker News users, drawn from 12 specific prediction threads. They employed ClickHouse for data analysis and selected the Nous Research Hermes-2-Theta-Llama-3-70B-GGUF model for evaluation, prompting it to produce structured JSON output. The author shared lessons learned, including the efficiency of ClickHouse, the model selection process, and the web implementation built with the Skeleton.dev UI toolkit for SvelteKit. The results indicated an overall success rate of about 50%, with challenges around nuanced predictions and overly conservative forecasts. The author plans to expand the project by incorporating all Hacker News comments, connecting it to a prediction market, and automating replies to users for evaluation. The website displaying the predictions and statistics is at https://hn-predictions.eamag.me/.
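
As a rough illustration of the evaluation step described above, here is a minimal Python sketch that asks a locally hosted GGUF model to grade a single prediction and return structured JSON. It assumes the llama-cpp-python bindings and a downloaded Hermes-2-Theta-Llama-3-70B GGUF file; the file name, prompt wording, and JSON schema are illustrative, not the author's exact pipeline.

    # Minimal sketch, not the author's code: grade one prediction with a local GGUF model.
    import json
    from llama_cpp import Llama  # pip install llama-cpp-python

    llm = Llama(
        model_path="Hermes-2-Theta-Llama-3-70B-Q4_K_M.gguf",  # hypothetical local file
        n_ctx=4096,
    )

    def evaluate_prediction(comment_text: str) -> dict:
        """Ask the model whether the prediction came true and parse its JSON reply."""
        prompt = (
            "You are grading a prediction made on Hacker News years ago.\n"
            "Reply ONLY with JSON of the form "
            '{"prediction": str, "outcome": "correct" | "incorrect" | "unclear", "reason": str}.\n\n'
            f"Comment:\n{comment_text}\n"
        )
        out = llm.create_chat_completion(
            messages=[{"role": "user", "content": prompt}],
            temperature=0.0,
            max_tokens=512,
        )
        raw = out["choices"][0]["message"]["content"]
        try:
            return json.loads(raw)
        except json.JSONDecodeError:
            return {"outcome": "unclear", "reason": "unparseable model output"}

    print(evaluate_prediction("I predict that by 2020 most new cars sold will be electric."))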

Related

Show HN: High-frequency trading and market-making backtesting tool with examples

The linked GitHub repository hosts the "HftBacktest" project, a Rust framework for high-frequency trading. It offers detailed simulation, order book reconstruction, latency modeling, multi-asset backtesting, and live trading bot deployment.

Claude 3.5 Sonnet

Anthropic introduces Claude 3.5 Sonnet, a fast and cost-effective large language model, alongside new features like Artifacts. Human evaluations show significant improvements, and privacy and safety evaluations are conducted. The discussion explores Claude 3.5 Sonnet's impact on engineering and coding capabilities, along with recursive self-improvement in AI development.

How I scraped 6 years of Reddit posts in JSON

The article covers scraping 6 years of Reddit posts for self-promotion data, highlighting challenges like post limits and cutoffs. Pushshift is suggested for Reddit archives. The author explains extracting URLs and checking whether each website is still up; the findings show that about 40% of the sites are now inactive. Trends in online startups are discussed.
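
For the link-checking step mentioned in that summary, a minimal Python sketch might look like the following; the URL list and timeout are placeholders, not details from the article.

    # Hypothetical liveness check: request each extracted URL and count the survivors.
    import requests

    urls = ["https://example.com", "https://expired-startup.example"]  # placeholder list

    def is_alive(url: str) -> bool:
        try:
            resp = requests.get(url, timeout=10, allow_redirects=True)
            return resp.status_code < 400
        except requests.RequestException:
            return False

    alive = sum(is_alive(u) for u in urls)
    print(f"{alive}/{len(urls)} sites still respond; {len(urls) - alive} appear inactive")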

Getting the World Record in Hatetris (2022)

David and Felipe set a world record in HATETRIS, a deliberately difficult Tetris variant. Using Rust, MCTS, and AlphaZero-style ideas to search for better games, they reached a score of 66 points in 2021.

Insights from over 10,000 comments on "Ask HN: Who Is Hiring" using GPT-4o

The analysis of over 10,000 Hacker News comments using GPT-4o and LangChain revealed job market trends such as remote work opportunities, visa sponsorship stability, and in-demand skills, and suggests ideas for potential SaaS products built on the data.
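
As a hedged sketch of that kind of extraction, the snippet below pulls structured attributes out of a single "Who is hiring" comment. It uses the plain OpenAI Python client instead of the article's LangChain setup, and the field names, prompt, and sample comment are illustrative.

    # Illustrative only: extract structured fields from one job-posting comment with GPT-4o.
    import json
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    def extract_job_info(comment: str) -> dict:
        prompt = (
            "Extract from this Hacker News job posting, as JSON with keys "
            '"company", "remote" (bool), "visa_sponsorship" (bool), "skills" (list of strings):\n\n'
            + comment
        )
        resp = client.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "user", "content": prompt}],
            response_format={"type": "json_object"},  # ask for a JSON object back
            temperature=0,
        )
        return json.loads(resp.choices[0].message.content)

    print(extract_job_info(
        "Acme Corp | Senior Rust Engineer | Remote (US) | Visa sponsorship available"
    ))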

4 comments
By @8organicbits - 3 months
Anyone bothered by the AI-generated cover photo trend? They are always low quality, like this one with the weird font and creepy robot, and never add anything to the article. I'm guessing the goal is to boost engagement by taking up more space in any news feed that displays cover images inline? I see it as a negative quality signal.
By @smusamashah - 3 months
Interestingly, technology predictions have a lower-than-random success rate of 49% (across 1086 predictions), according to these results.
By @bogwog - 3 months
I'm disappointed that the guy with insider knowledge who predicted that Stadia would not be shut down isn't there.

Edit: also, awesome project.

By @xyst - 3 months
Interesting project, but I hate how I have to manually confirm whether or not the LLM's analysis is “hallucinated”.

For example, just looking at the first result in the “Top Authors” category.

The author “asah” purportedly has an “88.10% success rate, 21 predictions”. Clicking the link just points to the comment, and then I (the reader) have to manually check which ones were “successful”.

Were the ones that I verified as “successful” the same ones your LLM analysis marked? I can’t exactly verify that, since there are no sources provided and no history of the “conversation” is preserved. Instead of trusting the information, the reader is left with more questions than answers. Did it source the data from a reliable first-party source, or did it get the information from a PR firm overstating certain numbers/figures?

Yet another reason why LLM/AI tech is overblown in my opinion. It’s catering to the same shit as the clickbait farms. Junk information with no backing/sources. Ask these algos to cross verify sources and probably get hallucinated trash.