Launch HN: Skyvern (YC S23) – open-source AI agent for browser automations
Skyvern is an open-source tool that automates browser workflows using large language models, reducing manual labor. It supports various applications, features real-time tracking, and offers $5 credits for new users.
Skyvern, developed by Suchintan and Shu, is an open-source tool that automates browser-based workflows using large language models (LLMs). It targets a common pain point: building browser automations usually demands either extensive manual labor or coding expertise. With Skyvern, users write goal-based prompts and agents carry out complex tasks across different websites. The platform has been used for generating insurance quotes, submitting job applications, and automating government permit filings. Key features include a real-time action display, livestreaming of browser instances, integration with authentication services, and the ability to chain multiple workflows. It can also process HTML elements and remember previous interactions for future use. The cost of using Skyvern has fallen significantly, with token costs dropping by 80%. New users are offered $5 in credits to explore the tool's capabilities, and the developers encourage feedback to improve the platform further.
- Skyvern automates browser workflows using LLMs, reducing the need for manual labor or coding.
- The tool supports various applications, including insurance quotes and job applications.
- Key features include real-time action tracking, livestreaming, and workflow chaining.
- Token costs have decreased by 80%, making the tool more accessible.
- New users receive $5 in credits to test the platform.
Related
Show HN: Jobber: OSS browser controlling agent to apply for jobs autonomously
Jobber is an AI agent that automates job applications by managing user resumes and preferences. It requires Python 3.8+, Poetry, and Chrome with remote debugging, built on the Sentient framework.
Launch HN: Arva AI (YC S24) – AI agents for instant global KYB onboarding
Rhim and Oli are developing Arva AI, an automated solution to streamline KYB compliance for banks and fintechs, reducing onboarding times and costs while improving efficiency and customer experience.
We accidentally burned through 200GB of proxy bandwidth in 6 hours
Skyvern's AI agent consumed 200GB of proxy bandwidth in six hours, costing $500, due to repeated downloads of a Google machine learning model. Solutions include local caching and URL blocking.
Show HN: Rocky AI – Chat with any webpage in Chrome using AI
Rocky AI is a Chrome extension that enhances web browsing by enabling AI interactions, offering features like article summarization, information lookup, and LinkedIn outreach while prioritizing user privacy and data security.
Show HN: Llama Workspace – An Open Source ChatGPT Teams Alternative
Llama Workspace is an open-source alternative to ChatGPT Teams, reducing costs by up to 82%. It supports multiple AI models, allows custom applications, and enables easy integration and self-hosting.
- Users express enthusiasm for the open-source nature and potential applications of Skyvern in automating workflows.
- Concerns are raised about security, particularly regarding the handling of sensitive data like login credentials and credit card information.
- Some commenters question the long-term viability of using third-party LLMs and the potential for misuse in automating interactions with websites.
- There is a discussion about the effectiveness of Skyvern compared to existing automation tools like Playwright, particularly in terms of precision and reliability.
- Several users share specific use cases they envision for Skyvern, indicating a strong interest in its practical applications.
I think this use case of automation in a BPA sense is more compelling than using it for test automation, because the latter is much more concerned with the precision and repeatability of the process. For the BPA task, arguably you care only about the outcome and it often doesn't matter if it gets there via some crazy route.
Part of the problem for me is that your example video shows a big wodge of prompt that had to be written to make this work and then a few kb of payload data (parameters) in a plaintext, non-csv format. If the expectation is that this replaces someone just using Playwright with codegen due to that being too technical, I'm not convinced there is a huge group of people who can manage one task but not the other.
Furthermore, you are expecting them to pass over their website login credentials and apparently their credit card details too, in plain text. You had better have a very solid idea of how to handle that sensitive data to avoid serious consequences if your users' skyvern accounts are compromised.
I think the frequency of website redesigns is oversold by people producing these LLM-driven Playwright wrappers, especially when targeting old-fashioned or government sites. As an example, we have had a suite of lengthy Playwright browser automations to interact with a government site for a few years and have had to maintain them only once, when the agency's business process changed. The prompt would also have needed to change had we used Skyvern, as would the payload, because the process was different. The difference with the Playwright automation, though, is that we could use assertions to verify steps had succeeded/failed and data had been recorded correctly, so we would know the process needed updating. I can't see that option in Skyvern which would have me worrying that process changes would be overlooked and we would unknowingly start entering the wrong data or missing steps.
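To make the point about assertions concrete, here is a minimal sketch in plain Python of the kind of post-step check the comment above describes: after a step runs, read the recorded values back and compare them with the payload that was sent. It uses neither Playwright's nor Skyvern's API; `StepResult`, `verify_step`, and the field names are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


class StepVerificationError(AssertionError):
    """Raised when a step did not leave the expected data behind."""


@dataclass
class StepResult:
    name: str       # hypothetical step label, e.g. "submit_permit_form"
    observed: dict  # field values read back from the page after the step


def verify_step(result: StepResult, expected: dict) -> None:
    """Compare values read back after a step with the payload that was sent."""
    # Collect every field whose observed value differs from what was expected.
    mismatches = {
        key: {"expected": want, "observed": result.observed.get(key)}
        for key, want in expected.items()
        if result.observed.get(key) != want
    }
    if mismatches:
        # Failing loudly is the point: a changed process is noticed immediately
        # instead of silently recording wrong data.
        raise StepVerificationError(f"{result.name}: {mismatches}")


# Hypothetical usage: the payload sent and the values read back afterwards.
payload = {"applicant": "Jane Doe", "permit_type": "renovation"}
verify_step(StepResult("submit_permit_form", dict(payload)), payload)
```

If a process change caused a field to be dropped or altered, `verify_step` would raise rather than let the run continue, which is the safety net the commenter says they could not find an equivalent for.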
Anthropic threw their hat in this ring yesterday, and it will very likely be followed by OpenAI and Google soon. Godspeed.
1. Is this working around friction due to a lack of interoperability between tools? For example, is this something that would be more efficient if the owner of the website exposed a REST service? Will the existence of this tool disincentivize companies from exposing services when it makes sense?
2. If there is a good reason for the lack of a service endpoint, perhaps for security reasons, will your automation workflow be used to bypass those security measures? Could your tool be used by malicious actors to disable major services? Are you that malicious actor yourself? Will your tool be used by scalpers to prevent consumers from buying high-demand products?
3. If this is being used to work around deferred maintenance with internal tools and processes, will the existence of these kinds of tools be used by management to justify further deferral of that maintenance? Will your tool become a critical piece of the support staff's workflow?
4. If your tool is being used in good faith to work around anti-patterns in website design, will the owner of the website be incentivized to break your workflow? Is your use case just a step in an arms race?
These are the thoughts that go through my head whenever I hear about software being laid on top of complicated processes, where instead of simplifying the underlying processes, we add another layer of complexity to sweep it under the rug. I'm sure that people will find your project useful, but I wonder what the longer-term effects will be.
I've limited my problem scope to single page interactions / scraping which has been very reliable and useful for my company. But agentic automation does sound fun.
Like this: Could I use this to pull screenshots or PDFs of my grocery receipts from a major grocery chain?
Any plans on bundling a local LLM / supporting local LLMs?
I want to use this to automate approving/declining group members for our Facebook group, which is approaching half a million members, and the FB admin tools are pretty lacking.
What's the use case here exactly? Sorry for being a bit pessimistic, but this sounds like an easy way to automatically send a lot of spam.
There are many back office tasks where people copy data from page 1 into a form of page 2.
Unfortunately the mobile experience is pretty bad - practically unusable. I'd expect any web application made in the last decade to be mobile-first.
Question: if it's computer-vision based, does that mean it can be trivially ported to support desktop automations?
I'm going to be playing with this.