Deep dive into finding RSS feeds
Lighthouse Feed Finder helps users locate hidden RSS feeds using metadata detection, URL guessing, and third-party services, while encouraging feedback to enhance its functionality and managing access through caching.
Lighthouse has developed a tool called the Lighthouse Feed Finder to assist users in locating RSS feeds for various websites. The tool employs multiple methods to identify RSS feeds, acknowledging that while many websites do not openly advertise their feeds, they often include metadata that can be detected. The RSS icon is a common way to link to feeds, but many sites do not utilize it. The Feed Finder uses RSS autodiscovery, which relies on specific metadata tags in the HTML of a website, to locate feeds automatically. However, if this method fails, the tool employs advanced techniques such as checking parent URLs, blog sections, sitemaps, and visible links on the site. It also attempts to guess common feed URLs based on standard suffixes. In cases where a website does not provide its own RSS feed, the tool can check third-party services that generate feeds. The Feed Finder aims to improve its capabilities over time and encourages user feedback to enhance its functionality.
- Lighthouse Feed Finder helps locate RSS feeds for websites that may not openly display them.
- The tool uses various methods, including metadata detection and URL guessing, to find feeds.
- It prioritizes finding feeds published directly by websites before checking third-party services.
- User feedback is encouraged to improve the tool's effectiveness in finding RSS feeds.
- The tool is designed to avoid overwhelming websites with requests, implementing caching to manage access.
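As a rough illustration of the autodiscovery step described above, the sketch below scans a page's HTML for `<link rel="alternate">` tags whose `type` is an RSS or Atom MIME type and resolves each `href` against the page URL. It is a minimal sketch using only the Python standard library, not the Feed Finder's actual implementation; the names `FeedLinkParser` and `discover_feeds` are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# MIME types conventionally used for feed autodiscovery links.
FEED_TYPES = {"application/rss+xml", "application/atom+xml"}


class FeedLinkParser(HTMLParser):
    """Collects feed URLs from <link rel="alternate" type="..."> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        # Attribute values can be None (e.g. <link rel>), so default to "".
        attr = {name: (value or "") for name, value in attrs}
        rel_tokens = attr.get("rel", "").lower().split()
        mime = attr.get("type", "").strip().lower()
        href = attr.get("href", "").strip()
        if "alternate" in rel_tokens and mime in FEED_TYPES and href:
            # Resolve relative and protocol-relative hrefs against the page URL.
            self.feeds.append(urljoin(self.base_url, href))


def discover_feeds(html, base_url):
    """Return absolute feed URLs advertised in the given HTML document."""
    parser = FeedLinkParser(base_url)
    parser.feed(html)
    return parser.feeds
```

Fetching the page is left out on purpose, so the function can be tried on any HTML string; a caller would pass the downloaded markup and the URL it came from.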
Related
13ft – A site similar to 12ft.io but is self hosted
The 13 Feet Ladder project is a self-hosted server that bypasses paywalls and ads, allowing access to restricted content from sites like Medium and The New York Times.
Full Text, Full Archive RSS Feeds for Any Blog
The blog post highlights limitations of RSS and ATOM feeds in cyber threat intelligence, introducing history4feed software to create historical archives and retrieve full articles for comprehensive data access.
Show HN: Free tool to find RSS feeds, even if not linked on the page
A new tool at lighthouseapp.io helps users find RSS feeds, even if unlinked, by checking meta tags, common suffixes, sitemaps, and third-party feeds, with future enhancements planned.
Hacker News RSS
hnrss.org offers real-time RSS feeds for Hacker News, allowing users to filter content by parameters like points and comments, and subscribe to specific users or keywords in various formats.
What RSS reader do you use?
Users discussed alternatives to Tiny Tiny RSS, favoring NetNewsWire and Reeder for Apple, while Miniflux and FreshRSS were preferred for self-hosting. The conversation highlighted diverse RSS reader needs and preferences.
- Users express nostalgia for the past when RSS feeds were more prominently linked on websites.
- Several commenters share methods for discovering hidden RSS feeds, including using scripts and tools.
- There are suggestions for improving RSS feed discovery tools, such as integrating with existing services or adding more common URL suffixes.
- Some users report bugs and issues with current feed finder tools, indicating a need for further development.
- Discussion includes the potential for RSS to integrate with other content types, enhancing user experience.
* Yes, I know the article talks about the RSS icon, I'm just soapboxing.
I mostly follow RSS on non-technology websites, for instance road cycling. These are run by people who wouldn't care or know about RSS because they are not very techy, yet because they are normies who use WordPress for all their websites, it adds a page with an RSS feed automatically. You have to find it with the developer tools by searching for "RSS", but 99% of the time, if it's WordPress, it has RSS.
Thank you WordPress you bloated piece of shit :)
I've been adding to my feeds.opml since Reddit started dying in ~2015 and now I'm up to around 1700 feeds and mostly independent from aggregators, though I still collect new feeds from HN/IRC/etc. Mostly I just always make a point to look for them whenever I read something cool on the web.
This causes the following error: TypeError: URL constructor: //matthew.science/posts/riscv/ is not a valid URL.
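The error above appears to come from passing a protocol-relative URL (one starting with `//`) to a URL constructor without a base. One possible way to guard against this is to normalize user input before parsing it. The sketch below is an assumption about how such input could be handled, not the tool's actual fix; `normalize_user_url` and its `default_scheme` parameter are hypothetical names.

```python
def normalize_user_url(raw, default_scheme="https"):
    """Turn loosely typed user input into an absolute URL string.

    Assumption: inputs without a scheme are meant to be fetched over
    `default_scheme`. Inputs that already carry a scheme are left alone.
    """
    raw = raw.strip()
    if raw.startswith("//"):
        # Protocol-relative form: only the scheme is missing.
        return f"{default_scheme}:{raw}"
    if "://" not in raw:
        # Bare host or host/path such as "example.com/blog".
        return f"{default_scheme}://{raw}"
    return raw
```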
Note: Hoarder can automatically hoard RSS feeds as part of its 'bookmark everything' functionality. Hoarder uses AI to tag all the content (URLs, feeds, images, notes) so you can then do full text searches on your personal archive of your bookmarks etc.
The browser as we now know it is mostly a static application that has long lost its user-centric mission. Websites might push some stuff but the user must do things manually. Its primary function is to provide a search window to external search. People even stopped using bookmarks and search for everything.
This hypothetical RSS-Browser could become the main organizational tool for the user's web experience, integrating the use of bookmarks.
In fact even more "feeds" could be integrated like email and activitypub or atproto posts. It boils down to the fact that each person has a number of profiles/roles and within each they have a taxonomy of interests and we need a tool that integrates static and dynamic sources of information.
Turns out the feed finder couldn't find the feeds even though I've linked to them using clickable RSS icons.
I didn't know about the autodiscovery feature so I'll add that now.
https://github.com/begriffs/findrss
The combinations came from what I observed in the big list of blogs I follow. The script works pretty well for most sites.
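The idea of trying combinations of common feed locations can be sketched as follows. This is not the linked script; it is a minimal illustration that walks from the page's path up to the site root and appends a few suffixes at each level. The suffix list and the function names `parent_urls` and `candidate_feed_urls` are assumptions chosen for the example.

```python
from urllib.parse import urlsplit, urlunsplit

# Illustrative suffixes only; a real list would come from observed feeds.
COMMON_SUFFIXES = ["feed", "rss", "rss.xml", "atom.xml", "feed.xml", "index.xml"]


def parent_urls(url):
    """Yield the URL's own path and each parent path, ending at the site root."""
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    for depth in range(len(segments), -1, -1):
        path = "/" + "/".join(segments[:depth])
        if not path.endswith("/"):
            path += "/"
        # Query string and fragment are dropped on purpose.
        yield urlunsplit((parts.scheme, parts.netloc, path, "", ""))


def candidate_feed_urls(url):
    """Return guessed feed URLs, most specific path first."""
    return [base + suffix for base in parent_urls(url) for suffix in COMMON_SUFFIXES]
```

Each candidate still has to be fetched and checked for a feed content type or a parseable feed body, ideally with caching and rate limiting so small sites are not flooded with requests.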
The problem with the approach presented here is speed. Most web pages, especially smaller ones, are really slow.
Crawling most web pages is a pain, especially if you use Selenium on a small SBC.
Therefore either the page presents a nice, clean RSS link, or it can get lost.
Most of the good, modern pages give you nice RSS. Even GitHub gives you RSS for commits.
For other pages I try openRSS.
For YouTube I use yt-dlp to obtain the channel ID and build the RSS feed URL from it.
The algorithm is crude, but it gets the job done.
https://github.com/rumca-js/Django-link-archive/blob/main/rs...
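For the YouTube step mentioned in the comment above, once a channel ID is known, the feed URL can be built from YouTube's `feeds/videos.xml` endpoint. The sketch below covers only that last step and assumes the channel ID has already been obtained (for example with yt-dlp, as the commenter does); `youtube_channel_feed` is a hypothetical helper and the ID in the usage note is a placeholder.

```python
from urllib.parse import urlencode


def youtube_channel_feed(channel_id):
    """Build the Atom feed URL for a YouTube channel from its channel ID."""
    query = urlencode({"channel_id": channel_id})
    return f"https://www.youtube.com/feeds/videos.xml?{query}"
```

Calling `youtube_channel_feed("UCxxxxxxxxxxxxxxxxxxxxxx")` returns a URL that can be added to a feed reader like any other feed.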
Or I suppose you could just find all "Content-type: application/rss+xml" in CC.
I know in the past, when I was looking for large lists of RSS feeds, I didn't really find what I was looking for.