Insights from over 10,000 comments on "Ask HN: Who Is Hiring" using GPT-4o
The analysis of over 10,000 Hacker News comments using GPT-4o and LangChain revealed job market trends such as the prevalence of remote work, the stability of visa sponsorship, and which skills are in demand. The insights also suggest a potential SaaS product.
The article discusses insights gained from analyzing over 10,000 comments on Hacker News' "Ask HN: Who Is Hiring" threads using GPT-4o and LangChain. The author aimed to understand the current job market and its trends by structuring the comments and classifying the job postings. The process involved scraping the comments, storing them in a database, and using LangChain's features for analysis. Results indicated trends such as the prevalence of remote work opportunities and stability in visa sponsorship, along with breakdowns by experience level, location, database, and JavaScript framework. The author also shared learnings from the project, suggesting improvements to the model's field descriptions and categorization. Additionally, the article hints at the potential for building a SaaS product that matches user-described job preferences with the categorized comments. The analysis provided valuable insight into job market dynamics and trends, showcasing how combining language models with data science methods enables a quick understanding of a topic.
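The article's pipeline isn't reproduced here, but a minimal sketch of the structured-classification step it describes could look like the following. The JobPosting fields, the field descriptions, and the prompt are assumptions for illustration, not the author's schema; it assumes the langchain-openai package and an OpenAI API key.

```python
from pydantic import BaseModel, Field
from langchain_openai import ChatOpenAI

class JobPosting(BaseModel):
    """Structured fields extracted from one 'Who is hiring' comment (illustrative schema)."""
    company: str = Field(description="Company name as written in the comment")
    remote: bool = Field(description="True if remote work is offered")
    visa_sponsorship: bool = Field(description="True if visa sponsorship is mentioned")
    technologies: list[str] = Field(description="Languages, frameworks, and databases mentioned")

# Bind the schema to the model so responses come back as validated JobPosting objects.
llm = ChatOpenAI(model="gpt-4o", temperature=0)
structured_llm = llm.with_structured_output(JobPosting)

comment = "Acme | Senior Backend Engineer | Remote (US) | Python, Postgres | Visa sponsorship available"
posting = structured_llm.invoke(f"Extract the job posting details from this comment:\n\n{comment}")
print(posting)
```

The article's note about improving "model field descriptions" maps to the Field(description=...) strings above: they are part of the prompt the model sees, so wording them carefully changes extraction quality.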
NYC clearly doesn’t have the level of activity in this area that the Bay does, but there’s a scene. LeCun and the NYU crowd and the big FAIR footprint create a certain gravity. There’s stuff going on :)
I can actually see how nice it would be for an LLM to be able to disambiguate 'go' and 'rust'. However, it does seem a bit disappointing that it isn't consolidating node.js and nodejs, or react-native and react native.
I'm curious about the need to use a Selenium script with Google to iterate over the threads; here's my script: https://gist.github.com/bobbywilson0/49e4728e539c726e921c79f.... It just uses the API directly and a regex for matching the title.
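For reference, a guess at the kind of title pattern such a script might match against (the gist link above is truncated, so this regex is an assumption, not its actual contents):

```python
import re

# Assumed pattern for monthly thread titles like "Ask HN: Who is hiring? (June 2024)".
HIRING_TITLE = re.compile(r"^Ask HN: Who is hiring\? \((?P<month>\w+) (?P<year>\d{4})\)$")

m = HIRING_TITLE.match("Ask HN: Who is hiring? (June 2024)")
if m:
    print(m.group("month"), m.group("year"))  # June 2024
```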
Thanks for sharing!
Anyway, I have a tangential question, and this is the first time I'm seeing LangChain, so it may be a stupid one. The point is that the vendor APIs seem to be far less uniform than what I'd expect from a framework like this. I'm wondering: why can't[0] this be done with Ollama? Isn't it ultimately just a system prompt, user input, and a few additional params like temperature that all these APIs require as input? I'm a bit lost in this chain of wrappers around other wrappers, especially when we are talking about services that host many models themselves (like together.xyz), and I don't even fully get the role LangChain plays here. I mean, in the end, all that any of these models does is repeatedly guess the next token, isn't it? So there may be a difference at the very low level, and there may be some difference at a high level (considering the different ways these models have been trained? I have no idea), but at some "mid-level" isn't all of this ultimately the same thing? Why are these wrappers so diverse and so complicated, then?
Is there some more novice-friendly tutorial explaining these concepts?
[0] https://python.langchain.com/v0.2/docs/integrations/chat/
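For what it's worth, a minimal sketch of the common interface the question describes, using LangChain's chat wrappers (package layout as of langchain 0.2; the model names are illustrative):

```python
from langchain_core.messages import HumanMessage, SystemMessage
from langchain_openai import ChatOpenAI
from langchain_community.chat_models import ChatOllama

messages = [
    SystemMessage(content="You extract job-posting details as short bullet points."),
    HumanMessage(content="Acme | Senior Backend Engineer | Remote (US) | Rust, Postgres"),
]

# Hosted model via the OpenAI API...
gpt = ChatOpenAI(model="gpt-4o", temperature=0)
print(gpt.invoke(messages).content)

# ...or a local model served by Ollama, behind the same .invoke() interface.
local = ChatOllama(model="llama3", temperature=0)
print(local.invoke(messages).content)
```

The per-provider wrappers mostly differ in authentication, model naming, and which extra parameters (and features like structured output or tool calling) they expose.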
LLMs aren't reliable for actual number crunching though (at least from what I've seen). Which is why I really appreciate seeing this blend!
As an added detail regarding "remote" vs. "in-person": another interesting statistic, to me, would be how many of the companies hiring in-person are repeats! It could absolutely mean they're growing rapidly, OR it could mean they're having trouble finding candidates. Equally, missing remote postings could mean either they're getting who they need OR they're going out of business.
All interesting plots on the graph!
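A rough sketch of what that repeat check could look like, assuming the postings have already been reduced to (month, company, mode) rows (the row format and data are illustrative):

```python
# Illustrative rows, not real data: (month, company, work mode).
rows = [
    ("2024-05", "Acme", "in-person"),
    ("2024-06", "Acme", "in-person"),
    ("2024-06", "Globex", "remote"),
]

# Collect the months in which each in-person company posted.
months_by_company: dict[str, set[str]] = {}
for month, company, mode in rows:
    if mode == "in-person":
        months_by_company.setdefault(company, set()).add(month)

# Companies posting in more than one month are the "repeats".
repeats = [c for c, months in months_by_company.items() if len(months) > 1]
print(repeats)  # ['Acme']
```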
I’d love to see a similar analysis of “Who Wants to be Hired”. What trends exist among folks struggling to find work? That could help point people toward how to target their career growth.
Been there. Hackers will do what hackers will do.
Someone in NYC give this person a job! ^_^
Also, it seems a bit silly to include the JS runtime itself in a comparison with the JS frameworks. In fact, it is used along with a front-end framework/library almost 100% of the time, at least at the build level. I would not put a ton of trust in that framework-demand data ;)
"Node.js" and "NodeJS" are drawn as separate spheres, is that intentional? Similarly for "Vue.js" and "VueJS". (Maybe the same for "Angular" vs "AngularJS" too, although the former might be referring to Angular TypeScript).
I only used OpenAI to determine location and then some (failed) analysis of candidate’s HN comments. The rest of the tools I wrote didn’t need LLMs (just string matching).
Nit: There appears to be both MongoDB and Mongo in the database graph.
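A small post-processing pass can collapse these aliases before plotting; a sketch (the alias map is illustrative, not the author's):

```python
# Illustrative alias map for collapsing near-duplicate names after extraction.
ALIASES = {
    "node.js": "Node.js",
    "nodejs": "Node.js",
    "vue.js": "Vue.js",
    "vuejs": "Vue.js",
    "react-native": "React Native",
    "react native": "React Native",
    "mongo": "MongoDB",
    "mongodb": "MongoDB",
}

def normalize(name: str) -> str:
    # Fall back to the original spelling when no alias is known.
    return ALIASES.get(name.strip().lower(), name)

print(normalize("NodeJS"), normalize("Mongo"))  # Node.js MongoDB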
Also, could you do a write up on how you actually used GPT-4o to assist you in this exploration? Or share some of the main prompts you used? Thanks!
How can we have a world where this statement exists, and anyone is bullish about LLMs?
The lack of precision is appalling. I can see how GPT-4o can be fun for toy projects, but can you really build meaningful real-world applications with this nonsense?
FYI, you could've just used the Hacker News API and gotten all posts by the user `whoishiring`, which submits all of these who-is-hiring posts, and then kept only the posts whose title starts with "Ask HN: Who is hiring?", since this bot also submits the 'Who wants to be hired?' and 'Freelancer? Seeking Freelancer?' posts.
https://hacker-news.firebaseio.com/v0/user/whoishiring.json?...
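A minimal sketch of that approach against the public HN Firebase API (it fetches items one at a time, so it's slow for long histories; only `requests` is assumed):

```python
import requests

HN_API = "https://hacker-news.firebaseio.com/v0"

def who_is_hiring_threads():
    # The `whoishiring` bot account submits the monthly threads.
    user = requests.get(f"{HN_API}/user/whoishiring.json").json()
    for item_id in user["submitted"]:
        item = requests.get(f"{HN_API}/item/{item_id}.json").json()
        # Keep only the hiring threads, not "Who wants to be hired?" etc.
        if item and item.get("title", "").startswith("Ask HN: Who is hiring?"):
            yield item

for thread in who_is_hiring_threads():
    print(thread["title"], "-", len(thread.get("kids", [])), "top-level comments")
```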
I seem to have noticed that Rust has gotten more common than golang in the job postings, but it's hard to verify without code because there are so many false positives for "go" on any post.
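One heuristic for cutting down those false positives is a word-boundary, case-aware match (a sketch of that idea, not the analysis used in the post):

```python
import re

# Require a word boundary and either "Go"/"Golang" capitalization or the
# "golang" spelling; still a heuristic, e.g. sentence-initial "Go" slips through.
GO_RE = re.compile(r"\b(?:Go(?:lang)?|golang)\b")
RUST_RE = re.compile(r"\bRust\b")

text = "We use Go (golang) on the backend and are rusting out old services."
print(len(GO_RE.findall(text)), len(RUST_RE.findall(text)))  # 2 0
```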