Show HN: Free tool to find RSS feeds, even if not linked on the page
A new tool at lighthouseapp.io helps users find RSS feeds, even if unlinked, by checking meta tags, common suffixes, sitemaps, and third-party feeds, with future enhancements planned.
A new tool has been developed to help users find RSS feeds for websites, available at lighthouseapp.io. The tool aims to identify feeds even if they are not directly linked on the site, addressing the limitations of traditional methods that rely on checking meta tags. While it currently succeeds in over 90% of cases using standard techniques, the tool's goal is to ensure that if it does not find a feed, then none exists. It performs several functions to achieve this, including checking the meta tags of parent pages, looking for common feed suffixes like /rss and /index.xml, examining the sitemap, analyzing all links on the page, and checking third-party feeds such as OpenRSS. Future enhancements may include searching through search engines and crawling entire domains, although the latter may be inefficient. Users are encouraged to test the tool and report any sites where it fails to find feeds.
- A new tool for finding RSS feeds is available at lighthouseapp.io.
- The tool aims to identify feeds even if they are not linked on the website.
- It checks meta tags, common feed suffixes, sitemaps, and third-party feeds.
- Future improvements may include search engine checks and domain crawling.
- Users are invited to provide feedback on sites where the tool does not work.
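The first two checks in that list (reading feed `<link>` tags from a page's head, then guessing common suffixes) can be sketched in a few lines of Python. This is a hedged illustration of the general technique, not the tool's actual code: `feeds_from_html` and `candidate_urls` are made-up names, and a real crawler would also fetch each candidate URL and verify that it parses as RSS, Atom, or JSON Feed before reporting it.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# MIME types that advertise a feed via <link rel="alternate">
FEED_TYPES = {"application/rss+xml", "application/atom+xml", "application/feed+json"}
# Well-known feed locations to probe when no meta tag is present
COMMON_SUFFIXES = ("/rss", "/rss.xml", "/feed", "/atom.xml", "/index.xml")

class _FeedLinkParser(HTMLParser):
    """Collects hrefs of <link rel="alternate"> tags with a feed MIME type."""
    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        a = {name: (value or "") for name, value in attrs}
        href = a.get("href")
        if href and "alternate" in a.get("rel", "").lower() and a.get("type") in FEED_TYPES:
            self.hrefs.append(href)

def feeds_from_html(html, base_url):
    """Return absolute feed URLs advertised in the page's meta tags."""
    parser = _FeedLinkParser()
    parser.feed(html)
    return [urljoin(base_url, href) for href in parser.hrefs]

def candidate_urls(base_url):
    """Root-relative guesses a crawler would fetch and sniff for feed content."""
    return [urljoin(base_url, suffix) for suffix in COMMON_SUFFIXES]
```

For example, `candidate_urls("https://example.com/blog/post")` yields domain-root guesses like `https://example.com/rss` and `https://example.com/feed`, matching the suffix list described above.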
Related
How I scraped 6 years of Reddit posts in JSON
The article covers scraping 6 years of Reddit posts for self-promotion data, highlighting challenges like post limits and cutoffs. Pushshift is suggested for Reddit archives. Extracting URLs and checking website status are explained. Findings reveal 40% of sites inactive. Trends in online startups are discussed.
Two months of feed reader behavior analysis
An analysis of feed reader behavior revealed significant request handling patterns, with some applications like Netvibes and NextCloud-News facing caching issues, while others like Miniflux performed better.
13ft – A site similar to 12ft.io but is self hosted
The 13 Feet Ladder project is a self-hosted server that bypasses paywalls and ads, allowing access to restricted content from sites like Medium and The New York Times.
Full Text, Full Archive RSS Feeds for Any Blog
The blog post highlights limitations of RSS and ATOM feeds in cyber threat intelligence, introducing history4feed software to create historical archives and retrieve full articles for comprehensive data access.
Show HN: I'm making an AI scraper called FetchFox
FetchFox is an AI-powered Chrome extension that allows users to scrape data from websites by describing their needs in plain English, bypassing anti-scraping measures, and exporting results in CSV format.
If you maintain any website with a news feed, go right now and check that you have this in your <head>:
<link rel="alternate" type="application/rss+xml" href="/rss.xml" title="News feed" />
^^^^^^^^ change! ^^^^^^^^^
(Also note whether and where you need to use application/rss+xml, application/atom+xml, or application/json.) I suspect some sites are just running some framework that enables it and don't even realize they have one.
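If you maintain a site and want to verify that the tag is there, a minimal check might look like the following. This is a sketch under assumptions: `advertises_feed` is an illustrative helper, not part of any library, and it does a rough regex scan rather than full HTML parsing (the regex accepts `application/feed+json`, the registered JSON Feed type, alongside the RSS and Atom types).

```python
import re

# MIME types that mark a <link> as a feed (rss, atom, or JSON Feed)
FEED_TYPE = re.compile(r"application/(?:rss\+xml|atom\+xml|feed\+json)", re.I)

def advertises_feed(html: str) -> bool:
    """True if any <link> tag declares rel=alternate with a feed MIME type."""
    for tag in re.findall(r"<link\b[^>]*>", html, re.I):
        if "alternate" in tag.lower() and FEED_TYPE.search(tag):
            return True
    return False
```

Fetch your own homepage (e.g. with `curl`) and run its HTML through this to see whether feed readers can autodiscover your feed.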
I have used this site in the past to find feeds: https://www.rsssearchhub.com/
A while back I was looking for a feed for https://ra.co but could not find one, though I had seen old posts referencing an RSS feed.
I ended up emailing them and, to my delight, they let me know they still have an unsupported RSS feed here:
https://ra.co/xml/rss_news.xml
Just for feedback, this tool doesn't find the feed, though it doesn't look like a standard URL to me.
>Mozilla is working on alternatives such as Pocket or Reader Mode, and on improving WebExtensions which could provide features related to RSS/Atom feeds without the toll on maintenance. (ref: https://www.ghacks.net/2018/07/25/mozilla-plans-to-remove-rs...)
Ok then.
Also, this would make more sense as a browser extension. Especially if it brought back the RSS icon in the address bar to indicate when a feed is available (although maybe you don't want it to do all of the checks until prompted).
TypeError: URL constructor: is not a valid URL. [NextJS] (5603-cb6f1c5a9761f9d0.js:14:5466)
Browser is Firefox 130.0 on Windows.
It would be really nice to see this working well, since I search for RSS feeds a lot for a bunch of different things. Whether the feed itself is any good is always another question.
Your method described above should have found at least two feeds I think.
I always use RSSHub Radar. Your tool supports more websites than RSSHub Radar does.
Detection of /feed could be added; most WordPress-powered sites have this suffix.
Wondering if it's necessary to continue with the other checks if you find a feed in the meta tags?