Surfer: Centralize all your personal data from online platforms
Surfer centralizes personal data from various online platforms by scraping and exporting it to local storage. It is available for download, with community support through Discord and a roadmap for future enhancements.
Surfer is a project designed to centralize personal data from various online platforms into a single folder, addressing the problem of scattered data. The application works by navigating to a website, checking whether the user is signed in, and scraping data for export to local storage. Users initiate the process by clicking an "Export" button; the app then waits for the target page to load, verifies the sign-in status, and scrapes and exports the data. Surfer can be downloaded from its official website or GitHub releases page, with guidelines available for local setup and contributions. The project roadmap includes short-term goals such as obtaining a code signing certificate and expanding platform support, as well as long-term objectives such as concurrent scraping and integration with advanced AI frameworks. Surfer is distributed under the MIT License, community support is available through a Discord server, and a demo video showcasing the application is available on YouTube.
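As a rough sketch of that export flow (not Surfer's actual code, which is an Electron app), the steps might look like the following with a Playwright-driven browser; the selectors, platform URL, and output path are illustrative assumptions.

```python
# Illustrative sketch of the described flow: open the target platform, confirm
# the user is signed in, scrape items, and export them to local storage.
# Playwright, selectors, and paths are stand-ins, not Surfer's internals.
import json
from pathlib import Path
from playwright.sync_api import sync_playwright

EXPORT_DIR = Path.home() / "surfer_exports"  # hypothetical local folder

def export_platform(url: str, signed_in_selector: str, item_selector: str) -> None:
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=False)
        page = browser.new_page()
        page.goto(url, wait_until="networkidle")  # wait for the page to load

        # Verify the user is signed in before scraping.
        if page.locator(signed_in_selector).count() == 0:
            raise RuntimeError(f"Not signed in to {url}; please log in first.")

        # Scrape the items and export them as JSON to local storage.
        items = page.locator(item_selector).all_inner_texts()
        EXPORT_DIR.mkdir(parents=True, exist_ok=True)
        host = url.split("//")[1].split("/")[0]
        (EXPORT_DIR / f"{host}.json").write_text(json.dumps(items, indent=2))
        browser.close()
```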
- Surfer centralizes personal data from multiple online platforms.
- The application scrapes user data and exports it to local storage.
- It is available for download from its website and GitHub.
- The project aims to expand platform support and enhance functionality.
- Community interaction is facilitated through a Discord server.
Related
Show HN: SaaS Surf – Curated tools for makers that are off the hook
SaaS Surf offers curated tools, resources, and lifetime deals for developers, designers, and entrepreneurs. It features products like Snitcher and Sitechecker for developers, Pixelfree Studio for designers, and discounted lifetime deals. The platform aims to be a comprehensive SaaS solution.
Show HN: G-Scraper, a GUI Web Scraper, Written in Python
The G-Scraper project is a Python GUI web scraper with features like request support, scraping multiple URLs/elements, logins, and data saving. Find details on GitHub for usage and contribution.
- Several commenters point out that Surfer is not the first tool of its kind, with existing alternatives like DogSheep already available.
- There are concerns about the centralization of personal data, with some viewing it as a potential privacy risk.
- Users express a desire for more extensive platform support and customization options, such as CLI tools or integrations with existing systems.
- Feedback from a contributor indicates ongoing development and openness to community input.
- Some users highlight the technical challenges of maintaining scrapers due to frequent changes in platform APIs.
I would prefer a CLI tool with partial gather support. Something that I could easily set up to run on a cheap instance somewhere and have it scrape all my data continuously at set intervals, and then give me the data in the most readable format possible through an easy access path. I've been thinking of making something like that, but with https://github.com/microsoft/graphrag at the center of it. A continuously rebuilt GraphRAG of all your data.
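As a loose illustration of that idea, a minimal interval-based gatherer could look like the sketch below; the `surfer export` commands are hypothetical and the GraphRAG rebuild invocation is a placeholder, not a verified CLI.

```python
# Sketch of a continuously running gatherer: invoke each scraper on a schedule,
# then rebuild the index over the exported data. All commands are placeholders.
import subprocess
import time

SCRAPERS = [
    ["surfer", "export", "--platform", "twitter"],  # hypothetical CLI calls
    ["surfer", "export", "--platform", "gmail"],
]
INTERVAL_SECONDS = 6 * 60 * 60  # every six hours

def gather_once() -> None:
    for cmd in SCRAPERS:
        subprocess.run(cmd, check=False)  # partial gather: keep going on failure
    # Rebuild the GraphRAG index over the exported data. Placeholder command;
    # check the graphrag docs for the actual indexing invocation.
    subprocess.run(["graphrag", "index", "--root", "./data"], check=False)

if __name__ == "__main__":
    while True:
        gather_once()
        time.sleep(INTERVAL_SECONDS)
```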
It is based around SQLite rather than Supabase (Postgres), which I think is a better choice for preservation/archival purposes.
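One way to see the archival appeal: the whole export is a single self-contained file that the Python standard library can open years later, with no server to keep running. The table layout below is purely illustrative.

```python
# Minimal example of archiving scraped records into one SQLite file using only
# the standard library. Schema and sample row are illustrative.
import sqlite3

con = sqlite3.connect("personal_data.db")
con.execute(
    """CREATE TABLE IF NOT EXISTS items (
           platform   TEXT,
           fetched_at TEXT,
           payload    TEXT  -- raw JSON from the scraper
       )"""
)
con.execute(
    "INSERT INTO items VALUES (?, datetime('now'), ?)",
    ("twitter", '{"id": 1, "text": "hello"}'),
)
con.commit()
con.close()
```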
1. Much tougher data privacy regulations (needed per country)
2. A trusted, international nonprofit clearinghouse and privacy grants/permissions repository that centralizes basic personal details and provides a single way to update name, address(es), email, etc., which companies then use on demand only (no storage)
Doing these would simplify things greatly for people: anyone could audit what every company knows about them and is allowed to know, and revoke permissions for companies they don't agree with. One of the worst cases is the US, where personal information is not owned by the individual, can be traded for profit, and there is almost zero control unless it's health-related.
This lets me create dashboards to see usage for certain topics. For example, I have a "Dev Browser" which tracks the latest sites I've visited that are related to development topics [1]. I similarly have a few for all the online reading I do. One for blogs, one for fanfiction, and one for webfiction in general.
I've talked about my first iteration before on here [2].
My second iteration ended up with a userscript which sends the data on the sites I visit to a Vector instance (no affiliation; [3]). Vector is in there because for certain sites (i.e. those behind a draconian Cloudflare configuration), I want to save a local copy of the site. So Vector can pop that field off, save it to a local MinIO instance, and at the same time push the rest of the record to something like Grafana Loki and Postgres, all while being very fast.
I've started looking into a third iteration utilizing mitmproxy. It helps a lot with saving local copies since it happens outside of the browser, so I don't feel the hitch when a page is inordinately heavy for whatever reason. It's also very nice that it'd work with all browsers just by setting a proxy, which means I could set it up for my phone either as a normal proxy or as a WireGuard "transparent" proxy. I'd only need to set up certificates for it to work.
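For context, a minimal mitmproxy addon along these lines might look like the sketch below; the output directory and the HTML-only filter are illustrative choices, not the commenter's actual setup.

```python
# save_pages.py -- minimal mitmproxy addon: save a local copy of every HTML
# page that passes through the proxy. Run with: mitmdump -s save_pages.py
import hashlib
from pathlib import Path

from mitmproxy import http

ARCHIVE = Path.home() / "page_archive"  # illustrative output directory

def response(flow: http.HTTPFlow) -> None:
    ctype = flow.response.headers.get("content-type", "")
    if "text/html" not in ctype or not flow.response.content:
        return
    ARCHIVE.mkdir(parents=True, exist_ok=True)
    # Name the file after a hash of the URL so repeat visits overwrite cleanly.
    name = hashlib.sha256(flow.request.pretty_url.encode()).hexdigest()[:16]
    (ARCHIVE / f"{name}.html").write_bytes(flow.response.content)
```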
---
[1] https://raw.githubusercontent.com/zamu-flowerpot/zamu-flower... [2] https://news.ycombinator.com/item?id=31429221 [3] http://vector.dev
1. Use Mobile App APIs.
2. Generate OpenAPI Arazzo Workflows.
1 ensures breakage is minimal, since mobile apps are slow to upgrade and older versions are expected to keep working. 2 lets you write repeatable recipes in YAML, which makes them quite portable to other systems.
The Arazzo spec is still quite early, but I am hopeful about this approach.
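As a loose sketch of the "repeatable recipes" idea, the snippet below runs HTTP steps declared in YAML; the recipe format is a simplified stand-in rather than the actual Arazzo schema, and the endpoints are hypothetical.

```python
# Illustration of executing a YAML-described recipe of API calls.
# Requires PyYAML and requests; the recipe format is a simplified stand-in.
import requests
import yaml

RECIPE = """
steps:
  - id: list_posts
    method: GET
    url: https://example.com/api/v1/posts
  - id: list_likes
    method: GET
    url: https://example.com/api/v1/likes
"""

def run_recipe(recipe_text: str) -> dict:
    recipe = yaml.safe_load(recipe_text)
    results = {}
    for step in recipe["steps"]:
        resp = requests.request(step["method"], step["url"], timeout=30)
        resp.raise_for_status()
        results[step["id"]] = resp.json()
    return results

if __name__ == "__main__":
    print(run_recipe(RECIPE))
```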