July 4th, 2024

Insights from over 10,000 comments on "Ask HN: Who Is Hiring" using GPT-4o

The analysis of over 10,000 Hacker News comments using GPT-4o and LangChain revealed job market trends like remote work opportunities, visa sponsorship stability, and skill demands. Insights suggest potential SaaS product development.


The article analyzes over 10,000 comments from Hacker News' "Ask HN: Who Is Hiring" threads using GPT-4o and LangChain. The author set out to understand the current job market by turning free-text comments into structured data and classifying the job postings. The process involved scraping the comments, storing them in a database, and using LangChain to extract fields from each one. The results point to a high share of remote work opportunities, stable visa sponsorship, and demand broken down by experience level, location, database, and JavaScript framework. The author also shares lessons from the project, including better field descriptions for the model and better categorization. The article hints at a possible SaaS product that would match user-described job preferences with the categorized comments. Overall, it shows how combining language models with data science methods can give a quick overview of a topic.

42 comments
By @benreesman - 7 months
If Tamer is reading this, I know of opportunities in NYC for sharp ML people. Feel free to drop me a line at b7r6@b7r6.net and I’ll be more than happy to make an introduction or two.

NYC clearly doesn’t have the level of activity in this area that the Bay does, but there’s a scene. LeCun and the NYU crowd and the big FAIR footprint create a certain gravity. There’s stuff going on :)

By @bobbywilson0 - 7 months
Cool analysis with GPT-4o! I was messing around with the same dataset recently, looking at both "Who is Hiring" and "Who wants to be hired", although I was just using pandas and spaCy. (I was comparing job supply and demand with the US Fed interest rates here: https://raw.githubusercontent.com/bobbywilson0/hn-whos-hirin...)

I can actually see how nice it would be for an llm to be able to disambiguate 'go' and 'rust'. However, it does seem a bit disappointing that it isn't consolidating node.js and nodejs or react-native and react native.

I'm curious about the need to use a Selenium script to google iteratively. Here's my script: https://gist.github.com/bobbywilson0/49e4728e539c726e921c79f.... It just uses the API directly and a regex for matching the title.

Thanks for sharing!

By @krick - 7 months
I'm more interested in technical side of this, but I'm not seeing any links to GitHub with the source code of this project.

Anyway, I have a tangential question, and this is the first time I've seen langchain, so it may be a stupid one. The point is that the vendor API seems to be far less uniform than what I'd expect from a framework like this. I'm wondering, why can't[0] this be done with Ollama? Isn't it ultimately just a system prompt, user input and a few additional params like temperature that all these APIs require as input? I'm a bit lost in this chain of wrappers around other wrappers, especially when we are talking about services that host many models themselves (like together.xyz), and I don't even fully get the role langchain plays here. I mean, in the end, all that any of these models does is repeatedly guess the next token, isn't it? So there may be a difference on the very low level, there may be some difference on a high level (considering different ways these models have been trained? I have no idea), but on some "mid-level" isn't all of this ultimately just the same thing? Why are these wrappers so diverse and so complicated then?

Is there some more novice-friendly tutorial explaining these concepts?

[0] https://python.langchain.com/v0.2/docs/integrations/chat/

By @smj-edison - 7 months
This seems like a great blend of LLM and classic analysis. I've recently started thinking that LLMs (or other ML models) would be fantastic at being the interface between humans and computers. LLMs get human nuance/satire/idioms in a way that other NLP approaches have really struggled with. This piece highlights how ML is great at extracting information in context (being able to tell whether it's go the language or go the word).

LLMs aren't reliable for actual number crunching though (at least from what I've seen). Which is why I really appreciate seeing this blend!

By @t-writescode - 7 months
This is very neat! Thanks for using your time and literal dollars to work through this!

As an added detail regarding the "remote" v "in-person", another interesting statistic, to me, is to know how many of those in-person job-seeking companies are repeats! It could absolutely mean they're growing rapidly, OR it could mean they're having trouble finding candidates. Equally, missing remotes could mean either they're getting who they need OR they're going out of business.

All interesting plots on the graph!

By @BadCookie - 7 months
Interesting data, but I think the percentage of remote listings is misleading. Many “remote” jobs now require you to live within commuting distance of a particular city, usually SF or NY.
By @blowski - 7 months
I wonder how this would compare against a random sample of jobs on, say, Indeed or LinkedIn. My experience of Hacker News is that it’s a very biased group (in a good way) compared to the general industry.
By @gwd - 7 months
Nit: There appears to be both a "React Native" and a "React-Native" bubble in the JS framework graph.
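
Duplicate bubbles like these (and the "Node.js" / "NodeJS" pair mentioned elsewhere in the thread) could probably be merged with a small normalization step before plotting. The following is only a sketch, not the author's code: the function names and the sample counts are made up for illustration. It lowercases each name, drops punctuation and spaces, and sums counts that share the resulting key. Names that differ by more than punctuation (for example "Angular" and "AngularJS", or "Mongo" and "MongoDB") stay separate and would need an explicit alias table.

```python
import re
from collections import Counter


def canonical_key(name: str) -> str:
    """Lowercase the name and drop everything that is not a letter or digit."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


def merge_spellings(counts: dict) -> dict:
    """Sum counts of names that differ only in case, spaces, or punctuation.

    The first spelling seen for each key is kept as the display label.
    """
    totals = Counter()
    labels = {}
    for name, n in counts.items():
        key = canonical_key(name)
        labels.setdefault(key, name)  # remember the first spelling
        totals[key] += n
    return {labels[key]: total for key, total in totals.items()}


# Illustrative counts only, not taken from the article.
sample = {
    "Node.js": 3,
    "NodeJS": 2,
    "React Native": 2,
    "React-Native": 1,
    "Angular": 4,
    "AngularJS": 1,
}
merged = merge_spellings(sample)
print(merged)
```

With the sample above, the two Node and the two React Native spellings collapse into one entry each, while "Angular" and "AngularJS" remain distinct.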
By @softwaredoug - 7 months
Really cool.

I’d love to see a similar analysis of “Who Wants to be Hired”. What trends exist in folks struggling to find work? That can help point people to how to target their career growth.

By @foolswisdom - 7 months
I think it's bad to stack bars in a graph, because it means you can't properly gauge the second layer (for example, the remote jobs quantity, at the beginning of the results section). Better to have two bars side by side (for each timestamp), one for remote, and another for not remote.
By @thekevan - 7 months
"...so the only way to make this diagram not look ridiculous would be to use a logarithmic scale...as I have just realized some minutes earlier. Instead I’ve spent two hours to build a bubble-chart in 300 lines of code with three.js"

Been there. Hackers will do what hackers will do.

By @herodoturtle - 7 months
What a great write up - thank you.

Someone in NYC give this person a job! ^_^

By @okokwhatever - 7 months
So, 50+ bucks to learn React+Postgres are good enough to find a job, cool.
By @simonw - 7 months
It would be interesting to run this same analysis using Claude 3 Haiku, which is 1/40th of the price of GPT-4o. My hunch is that the results would be very similar for a fraction of the price.
By @noduerme - 7 months
I'm just waiting out React, using my own much cleaner SPA framework for the last 10 years that's only about 5k LOC and avoids a lot of debouncing and fragmentation issues. From that three.js diagram, it looks like React should turn into a red giant at any moment. Can't wait.
By @gloosx - 7 months
It is not comprehensive to compare actual JS frameworks and UI/state management libraries. Obviously React will eclipse the frameworks since it is a simple interface library used by the frameworks themselves (like Next.js or React Native)

Also seems a bit silly to include the JS runtime itself in comparison with the JS frameworks. In fact it is almost 100% of the time used along with a front-end framework/library at least on a build-level. I would not put a ton of trust in that frameworks demand data ;)

By @omoikane - 7 months
> https://tamerc.com/posts/ask-hn-who-is-hiring/#what-javascri...

"Node.js" and "NodeJS" are drawn as separate spheres, is that intentional? Similarly for "Vue.js" and "VueJS". (Maybe the same for "Angular" vs "AngularJS" too, although the former might be referring to Angular TypeScript).

By @maliker - 7 months
Beautiful analysis! Great to see the hard stats on the technology breakdowns on the hiring threads with a clever LLM approach. And the write up was super clear.
By @joshstrange - 7 months
Very interesting, I did something similar but in reverse, looking at who wants to be hired.

I only used OpenAI to determine location and then some (failed) analysis of candidate’s HN comments. The rest of the tools I wrote didn’t need LLMs (just string matching).

By @spdustin - 7 months
One of the reasons function calling seems to need better parameter names and descriptions is sneaky: if your function parameters are nested, descriptions aren't actually seen by the LLM. It only sees the names.
By @bastien2 - 7 months
The irony of using a chatbot to gain insight into the current problems with the job market
By @wklm - 7 months
Great analysis! The only thing I'd suggest changing is how the salary distribution is represented—a simple histogram might present it more effectively. Also, would you consider publishing the dataset you scraped?
By @frays - 7 months
Great post, thanks for sharing.

Nit: There appears to be both MongoDB and Mongo in the database graph.

Also, could you do a write up on how you actually used GPT-4o to assist you in this exploration? Or share some of the main prompts you used? Thanks!

By @l5870uoo9y - 7 months
Surprised to see Redux featured so prominently in the JS frameworks section, since it is so often criticized while many praise newer competitors like Zustand.
By @hughes - 7 months
Given the high price due to high token count, I wonder how different the results would be running the same analysis with a local model.
By @rareitem - 7 months
I really like the graph at the 'what javascript frameworks are in demand?' part. Any insight to share about that?
By @Rebuff5007 - 7 months
> HAS TO BE A BOOLEAN VALUE!

How can we have a world where this statement exists, and anyone is bullish about LLMs?

The lack of precision is appalling. I can see how a gpt-4o can be fun for toy projects, but can you really build meaningful real-world applications with this nonsense?

By @jddj - 7 months
Nice charts, which lib? I'm on mobile currently and can't easily dig any deeper to answer my own question
By @alberth - 7 months
Is 10,000 comments a large enough sample size to gain insights from?
By @lukasgw97 - 7 months
Crazy, didn't know that React is so important! Well done!
By @stuckkeys - 7 months
Very cool idea indeed. Love the creativity level.
By @cinemavzbj - 6 months
Hi
By @rickcarlino - 7 months
Nice work, OP. Looking at the graph, I sure hope we did not hit "Peak HN" in Q2 2023.
By @N8works - 7 months
If you believe any "official data" isn't being politically gamed, I have a bridge to sell you.
By @SushiHippie - 7 months
> Using Selenium, I used a script to google iteratively for strings query = f"ask hn who is hiring {month} {year}" to get the IDs of the items that represent the monthly threads.

FYI, you could've just used the Hacker News API to get all posts by the user `whoishiring`, which submits all these "Who is hiring" posts, and then kept only the posts whose title starts with "Ask HN: Who is hiring?", as this bot also submits the 'Who wants to be hired?' and 'Freelancer? Seeking Freelancer?' posts.

https://hacker-news.firebaseio.com/v0/user/whoishiring.json?...
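
A rough sketch of that approach, using only the Python standard library, might look like the following. It relies on the public Hacker News Firebase endpoints for users and items; the helper names `fetch_json` and `hiring_thread_ids` are made up here, and fetching each submission one by one is slow, so treat it as an illustration rather than a tuned scraper.

```python
import json
from urllib.request import urlopen

API = "https://hacker-news.firebaseio.com/v0"


def fetch_json(path):
    """Fetch one resource, e.g. 'user/whoishiring' or 'item/123'."""
    with urlopen(f"{API}/{path}.json") as resp:
        return json.load(resp)


def hiring_thread_ids(fetch=fetch_json, prefix="Ask HN: Who is hiring?"):
    """Return IDs of the bot's submissions whose title starts with `prefix`."""
    user = fetch("user/whoishiring")
    ids = []
    for item_id in user.get("submitted", []):
        item = fetch(f"item/{item_id}")
        # Deleted items come back as null; some items have no title.
        title = (item or {}).get("title") or ""
        if title.startswith(prefix):
            ids.append(item_id)
    return ids


if __name__ == "__main__":
    # Performs real network requests, one per submission.
    print(hiring_thread_ids())
```

Passing the fetch function as a parameter keeps the filter testable without network access; the "kids" field of each returned thread item then lists the top-level comment IDs.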

By @jamestimmins - 7 months
Any chance you could run a comparison of the number of Rust vs Golang jobs over time?

I seem to have noticed that Rust has gotten more common than golang in the job postings, but it's hard to verify without code because there are so many false positives for "go" on any post.
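
To illustrate why plain string matching struggles here, below is a heuristic sketch (not from the article) that counts postings mentioning each language. It treats "Go" as the language only when it is spelled "golang" or appears as a capitalized item in a list, directly before a separator or the end of the text. The patterns and the sample postings are assumptions made for this example; the heuristic still misses phrases such as "Go developer", which is the kind of case an LLM handles better.

```python
import re

# "golang" anywhere, or capitalized "Go" directly before a list
# separator (, / | ")") or the end of the text.
GO_PATTERN = re.compile(r"\b[Gg]olang\b|(?<![\w.])Go(?=\s*[,/|)]|\s*$)")
# Case-sensitive, so the ordinary word "rust" is ignored.
RUST_PATTERN = re.compile(r"\bRust\b")


def count_mentions(posts):
    """Count how many posts mention each language at least once."""
    return {
        "go": sum(1 for p in posts if GO_PATTERN.search(p)),
        "rust": sum(1 for p in posts if RUST_PATTERN.search(p)),
    }


# Made-up postings for illustration.
posts = [
    "Stack: Rust, Go, Postgres",
    "Go to our careers page to apply",
    "We use golang and k8s",
    "Backend in Python/Go",
    "Good luck, no rust here",
]
print(count_mentions(posts))
```

Grouping such counts by thread month would give the Rust vs Go trend over time, subject to the false negatives noted above.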

By @gajnadsgjoas - 7 months
This should have a "Show HN" tag. Also, you don't have to use Selenium or the HN API: there are DBs with updated HN data.