Google rejected me and now I'm building a search engine
The article recounts a rejection from Google during an interview, prompting the individual to create a non-profit, community-driven search engine emphasizing ethical values over profit, welcoming contributions for development.
The article discusses a personal experience of being rejected by Google during an interview, leading the individual to decide to build their own search engine as an alternative to Google. The rejection occurred despite positive feedback from three interviewers, ultimately resulting in the decision not to hire the individual. The rejection prompted the individual to criticize Google's practices, accusing the company of unethical behavior such as misleading advertising, tax evasion, privacy violations, and supporting controversial causes. The individual's new search engine project aims to be non-profit and community-driven, allowing users to contribute to the search engine's development and funding. The project is open source, welcoming contributions from anyone interested, and emphasizes collaboration to create a search engine that prioritizes ethical values over profit. The individual acknowledges the long road ahead to compete with Google but remains optimistic about the potential for a community-driven alternative.
Related
Surfing the (Human-Made) Internet
The internet's evolution prompts a return to its human side, advocating for personal sites, niche content, and self-hosted platforms. Strategies include exploring blogrolls, creating link directories, and using alternative search engines. Embrace decentralized social media and RSS feeds for enriched online experiences.
Perplexity's Grand Theft AI
Perplexity, a search engine rivaling Google, faces criticism for being a middleman that undermines original sources' revenue by summarizing content unethically. The CEO's deceptive practices raise concerns about trust and integrity.
Perplexity's Grand Theft AI
Perplexity, a search engine rivaling Google, faces criticism for bypassing original sources, dodging paywalls, and promoting unethical behavior. The CEO's defense raises concerns about trust and integrity online.
Waves of Writing for Google
The article explores the evolution of writing for Google, highlighting shifts from keyword stuffing to user-focused content and AI's impact on writing jobs. Writers are advised to adapt, focus on personal branding, and embrace technology for relevance.
Google has been lying about their search results [video]
A leak from Google's GitHub shows the search algorithm tracks user clicks and time on pages, raising concerns about search result accuracy, treatment of smaller websites, and SEO strategies.
In my experience the only reason you should say "I don't know" is if you're going to follow it with "but if I had to guess" or similar. Sounds like the interviewer definitely came on strong but being able to ace the psychological part of an interview is often as important or more important than the actual solution.
Rather than this clickbaity "Google rejected me" story about something that happened 15 years ago, here's a link to the actual project:
https://news.ycombinator.com/item?id=40850725
This strikes me as fairly petty: “I didn’t answer wrong, you asked me the wrong questions!” Honestly, it’s the recruiting process working as intended; folks with this type of attitude don’t make good team members, in my experience.
Also:
> At the time “Don’t be evil” still meant something. Now it seems like their mantra is just “Be evil”.
Seems really petty. It’s a shame, because we could use good big-tech alternatives, but building something out of spite without much perspective is unlikely to create a good alternative.
The page it actually links to says the curated rankings are there, like everywhere else:
> To train a learning to rank model. No matter how many queries are manually curated, most user queries will be organic because of the natural diversity of user queries. Curation is still important for these results since it impacts the machine learning model that will be trained on the curated rankings.
so this is not true in the long term.
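For what it's worth, here's a toy sketch of what "training a learning-to-rank model on curated rankings" can mean in practice; the features, numbers, and pairwise objective are all invented for illustration and are not the project's actual pipeline:
```python
import numpy as np

# Toy curated ranking for one query: two invented features per document
# (say, term-match score and click rate), with A curated above B above C.
features = {
    "A": np.array([0.9, 0.7]),
    "B": np.array([0.6, 0.5]),
    "C": np.array([0.2, 0.4]),
}
curated_order = ["A", "B", "C"]

# Pairwise training examples: feature difference f(better) - f(worse)
# for every curated pair; a good model scores these differences positive.
X = np.array([features[hi] - features[lo]
              for i, hi in enumerate(curated_order)
              for lo in curated_order[i + 1:]])

# A few steps of gradient ascent on sum(log sigmoid(X @ w)),
# the classic logistic pairwise-ranking objective.
w = np.zeros(2)
for _ in range(200):
    p = 1.0 / (1.0 + np.exp(-X @ w))
    w += 0.1 * X.T @ (1.0 - p)

score = lambda doc: float(features[doc] @ w)
print(sorted(features, key=score, reverse=True))  # ['A', 'B', 'C']
```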
I remarked that, in the circumstances where I'd need to know, I'd google it and check the documentation to make sure I got it right.
The interviewer (who I later found out was the founder/CEO) absolutely laid into me for that answer.
I tried to argue that I was looking to be employed for my problem-solving skills and experience rather than rote knowledge, but he was really angry. He literally said to me, verbatim, "Let me give you some interview advice: NEVER tell an interviewer you'd google something". He also made a mildly off-colour remark that if he "wanted someone just to google, [he] could hire one of thousands of fresh graduates coming out of India".
It was an experience so bad that it inspired me to create a glassdoor account just to leave negative feedback, something I've never done before or since. The recruiter was absolutely pissed, and still doesn't provide me leads, which is kind of annoying since he's the most active C#/.Net recruiter in my area.
But my point is that some people have absolutely atrocious interview manners. Interviews are a two-way street, and I discovered that there was absolutely no way I'd want to work with them. (Even when I just thought he was a team lead rather than the CEO, it was enough to put me off.)
If he remembers that the max signed int is ~2 billion, then it's easier to divide 4 billion by 2 repeatedly: 2b / 1b / 500m / 250m / 125m / 62.5m. That's 6 halvings, so 32 - 6 = 26.
If you think the max int is irrelevant to the position: it is so relevant I can't even describe it. This number is everywhere, from database design to js-wasm (limited to 32 bits), from deep learning (where some libraries are still limited to 32-bit buffers) to networking (hello, IPv4).
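For anyone who wants to sanity-check that arithmetic, here's the halving trick as a few lines of Python (my own sketch, not from the thread):
```python
def halvings_to_reach(n: int, target: int) -> int:
    """Count how many times n halves before dropping to target or below."""
    count = 0
    while n > target:
        n //= 2
        count += 1
    return count

# 4 billion halves down past ~64 million in 6 steps, so if 4 billion
# is ~2^32, then ~64 million is ~2^(32 - 6) = 2^26.
print(halvings_to_reach(4_000_000_000, 64_000_000))  # 6
```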
Figure that problem out first (something novel and useful), then start marketing yourself.
Right now you just gave us a story we've all lived (academic hazing) without any plan of action -- so 2010.
Sorry this page does not exist =(
Alternative: https://cc.bingj.com/cache.aspx?d=4652446581392&w=-V-8V9bl07...
Kagi is great but more options would be good too.
OP's product is clearly at a very early stage. OP's post is also pretty opinionated.
Hard to say what impact that will have on the product, but as long as we have more options for search engines, this will be one of many.
It's also easy to read this as "interviewer hand-held a candidate through a problem".
For instance, if you search for 'Trump', the top links are:
```
1. http://www.trump.de — found via Mwmbl -- Trump
2. https://itep.org/md/ — found via Mwmbl -- Trump Tax Proposals Would Provide Richest One Percent in Maryland with 69.7 Percent of the State’s Tax Cuts Earlier this year, the Trump administration r…
3. https://is.gd/mUHYTg — found via Mwmbl -- Trump embraces QAnon conspiracy because ‘they like me’ After skirting the issue for weeks, President Donald Trump offered an embrace Wednesday of the fri…
4. http://dict.cn/trump — found via Mwmbl -- trump是什么意思_trump在线翻译_英语_读音_用法_例句_海词词典
```
Surely there are millions of results more relevant to the phrase 'Trump' than trump.de. The other links aren't better. A random article from 2017? Another one from 2020. A Chinese dictionary definition of 'Trump'?
I get that search is hard, but what's going on here? You can try any phrase, and you just get weird results.
Naive/biased statements such as these cause me to lend less credence to the author's other points.
Yeah, but you still reserve the right to not crawl sites (or to remove them from your index), yes? So there's still the opportunity to do evil.
I'm still waiting for a "raw" search spidering provider. One that:
1. runs a web-spidering cluster — one that's only smart enough to know what robots.txt is, to know how to follow links in HTML pages, and to obey response caching-policy headers;
2. captures the spidering process losslessly, as e.g. HAR transcript files;
3. packs those HAR transcript files, a few million at a time, into tar.xz.tar files (i.e. grab a "chunk" of N HAR files; group them into subdirs by request Host header; archive each subdir, and compress those archives independently; then archive all the compressed archives without compression) — and then uploads these semi-random-access archives to a CDN or private BitTorrent tracker (or any other data delivery system that enables clients to only retrieve the blocks/byte-ranges of files they're interested in);
4. generates a TOC for the semi-random-access files, as a stream of tuples (signed archive URL, chunk byte-range, hostname, compressed URL-list); pushes these to a managed reliable message queue on an IaaS, publishing each entry both to an all-hostnames topic and to a per-hostname topic (see the sketch after this list). (I say an IaaS, as this allows consumers to set up their own consumer-groups on these topics within their own IaaS project, and then pay the costs of message retention in these consumer-groups themselves.)
5. also buffers these TOC-entry streams into files (e.g. Parquet files), one archive series per topic, and hosts these alongside the HAR archives; prunes TOC topic stream entries once they are at least N days old AND have been successfully "offlined" into a hosted TOC-stream archive.
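To make steps 3 and 4 concrete, here's a minimal Python sketch of one TOC tuple and a consumer fetching only the byte range it cares about; every name and field here is illustrative, since no such service actually exists:
```python
import io
import tarfile
import urllib.request
from dataclasses import dataclass

@dataclass
class TocEntry:
    """One tuple from the step-4 TOC stream (field names are mine)."""
    archive_url: str              # signed URL of the outer chunk archive
    byte_range: tuple[int, int]   # (offset, length) of one host's tar.xz member
    hostname: str
    url_list: list[str]           # the URL-list, decompressed for illustration

def fetch_host_archive(entry: TocEntry) -> tarfile.TarFile:
    """Fetch only the byte range holding one host's inner archive,
    using an HTTP Range request against the CDN-hosted chunk."""
    start, length = entry.byte_range
    req = urllib.request.Request(
        entry.archive_url,
        headers={"Range": f"bytes={start}-{start + length - 1}"},
    )
    raw = urllib.request.urlopen(req).read()
    # Each member is an independently-compressed tar.xz of HAR files,
    # so the fetched bytes open as a standalone archive.
    return tarfile.open(fileobj=io.BytesIO(raw), mode="r:xz")
```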
---
This "web-spidering-firehose data-lake as-a-Service" architecture, would enable pretty much anyone to build whatever arbitrary search index they want downstream of it, containing as much or as little of the web as they want — where each consumer only needs to do as much work as is required to fetch and parse the HARs of the domains they've decided they care about indexing something under.
This architecture would also be "temporal" (akin to a temporal RDBMS table) — as a consumer of this service, you wouldn't see "the current version" of a scraped URL, but rather all previous attempts to scrape that URL, and what happened each time. (This would mean that no website could ever censor the dataset retroactively by adding a robots.txt "Disallow *" after scrapes have already happened. Their robots.txt config would prevent further scraping, but previous scraping would be retained.)
And in fact, in this architecture, the HTTP interaction to retrieve /robots.txt for a domain, would produce a HAR transcript that would get archived like any other. Domains restricted from crawling by robots.txt, would still get regular HAR transcripts recorded of the result of checking that their /robots.txt still restricts crawling. (Reducing over these /robots.txt HAR transcripts is how a consumer-indexer would determine whether they should currently be showing/hiding a domain in their built index.)
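A consumer-side reduction over those /robots.txt transcripts might look something like this sketch (assuming standard HAR 1.2 JSON; the disallow check is deliberately naive):
```python
import json

def currently_disallowed(robots_har_paths: list[str]) -> bool:
    """Reduce over a domain's /robots.txt HAR transcripts, oldest first,
    and report whether the latest successful fetch blanket-disallows
    crawling. Only checks for a literal 'Disallow: /'."""
    latest_body = None
    for path in robots_har_paths:                 # assumed chronological
        with open(path) as f:
            har = json.load(f)
        for entry in har["log"]["entries"]:       # standard HAR 1.2 layout
            resp = entry["response"]
            if resp["status"] == 200:
                latest_body = resp["content"].get("text", "")
    if latest_body is None:
        return False                              # never fetched successfully
    return any(line.strip().lower() == "disallow: /"
               for line in latest_body.splitlines())
```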