Does anyone integrate with their customers' DB directly?
The author discusses integrating third-party SaaS vendors with customer databases, reflecting on their fintech experience, emphasizing information security, and exploring a tool to standardize this integration while seeking feedback.
The discussion revolves around the integration of third-party SaaS vendors with customers' data stores, such as Supabase and Snowflake. The author shares their experience from a previous startup that developed a chargeback management tool for fintech companies. In that role, they had to query the fintech's database for transaction details using only the transaction ID, as the fintech could not allocate engineering resources to send data to an API endpoint. The author reflects on the practice of directly querying a customer's database, emphasizing the importance of information security. They express curiosity about how common this practice is and suggest that integrating with internal databases could expedite the onboarding process for startups. The author is considering developing a tool to standardize this integration process and seeks insights from others who have experience with direct database integration, asking for use cases and experiences.
- Integrating with customers' databases can streamline onboarding for startups.
- The author reflects on their experience with a chargeback management tool for fintechs.
- Information security is a critical consideration when querying customer databases.
- The author is exploring the idea of creating a tool to facilitate these integrations.
- They seek feedback and stories from others who have integrated directly with customer databases.
Related
Supabase (YC S20) Is Hiring an Auth Engineer (Remote)
Supabase, an open-source Firebase alternative, seeks an Auth Engineer for remote work. Role involves Auth feature development using Go, TypeScript, and Postgres. Benefits include equity ownership and tech allowance. Application process is fully remote.
Ask HN: I built a Yubikey-based domain controller. Is it sellable?
The individual discusses their R&D experience, highlighting a security appliance for remote access, targeting SMBs with varying IT budgets, and seeking marketing strategies amid challenges with legacy systems and client support.
Startup Finance for Founders — Part I, Accounting
Peter Reinhardt, CEO of Charm Industrial, discusses startup finance, emphasizing accounting basics, banking structures, credit management, and contract formalities, while sharing his experiences from Segment's growth and funding journey.
Where Banking APIs Fall Short
Software engineers face challenges integrating with bank APIs due to reliance on outdated file-based systems. The focus should shift to enhancing existing standards like ISO20022 rather than creating new APIs.
Database in the Browser, a Spec
The article proposes a "database in the browser" to simplify web application development, addressing challenges like data consistency, state management, and the need for robust permission systems and complex features.
This is an unfortunate attitude.
From a software architecture point of view, you're hard-coupling your software to that database, which can create brittleness: you're at the mercy of the original database schema, which may not be optimal, carries its own tech debt, may require translation into your own implementation, and may cause mistakes in interpretation in anything of scale. Even worse, you're at the customer's mercy if they want to change anything, which immediately breaks your code (likely without warning) and now you have to fix it (likely under duress).
Letting an outside org get direct access to a database is an unnecessary security risk, even if you are "treating infosec seriously". I'd argue no organization that treats infosec seriously would want to request access to or grant access to anything more than the minimal information they need. https://en.wikipedia.org/wiki/Principle_of_least_privilege
Modern software design addresses those issues by putting up abstractions and interfaces, which is why direct access is not particularly common. In fact, one fintech I worked with did this and ruined a team's holiday weekend by making an unfortunate and unsignaled change late on a Friday.
You might go live quicker. But the integration will break every time your customer makes a change or needs to upgrade. Pray there's documentation. You'll spend unending hours debugging your customers' weird data models. Issues will appear when logic changes but the data model doesn't.
I won't tell you not to do this because you've been so light on details of what you're actually doing (maybe it is easier this way!), but it would not fly on my watch.
I'd hope to never work at a company like that again as an SRE; my life was a nightmare.
This avoids a situation where we get an email out of the blue explaining that there is a database change and we need to dedicate engineering resources to make sure it is compatible by a certain date, or, even worse and more likely, an urgent email explaining that the integration has broken and we need it back running last week.
-> Permission control, making sure that the user of API can not see data they are not supposed to.
-> Auditability. Verify that the API is being used correctly.
-> Performance. Do not overload the endpoint. (Read from a read replica? And maybe you are not running hour-long analytics queries on the database)
-> Consistency. Are you using transactions to read your data? Could you be causing contention?
-> API versioning. How do you upgrade the underlying tables without breaking users?
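The concerns in this list (permission control, auditability, versioning) are exactly what even a very thin API layer buys you over raw DB access. A minimal sketch, assuming a hypothetical `transactions` table and using sqlite3 as a stand-in for the real database: only whitelisted fields ever leave the service, and every read can be audited.

```python
import sqlite3

# Whitelist of fields the integration partner may see (least privilege).
ALLOWED_FIELDS = ("txn_id", "amount", "status")

def get_transaction(conn, txn_id):
    """Thin read endpoint: exposes only whitelisted fields, logs each access."""
    cols = ", ".join(ALLOWED_FIELDS)
    row = conn.execute(
        f"SELECT {cols} FROM transactions WHERE txn_id = ?", (txn_id,)
    ).fetchone()
    if row is None:
        return None
    print(f"audit: partner read txn {txn_id}")  # auditability hook
    return dict(zip(ALLOWED_FIELDS, row))

# Demo with an in-memory database standing in for a read replica.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE transactions "
    "(txn_id TEXT, amount INTEGER, status TEXT, card_pan TEXT)"
)
conn.execute(
    "INSERT INTO transactions VALUES ('t1', 4200, 'disputed', '4111111111111111')"
)
result = get_transaction(conn, "t1")
print(result)  # card_pan never leaves the service
```

Versioning then becomes a property of this function's contract rather than of the table itself: the underlying schema can change as long as `get_transaction` keeps returning the same shape.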
For some of our outgoing files at $employer, there's a notice period of... I think I remember it being 30 days for additional code values and 90 days for layout changes. That sort of planning ahead becomes much harder if every single schema change is immediately visible to outsiders.
I've worked on systems that use a SQL DB as a communication layer. A main advantage is that every language comes with good SQL read/write libraries, so things just work right away.
In terms of Pros:
- It was quick to set up and _seems_ to work, mostly

For Cons:
- We have no guarantees that this query will continue to work
- We have no understanding of the model behind this data (why are these records in this state, and these in another? No idea, and the vendor doesn't seem to understand the question)
- We need to poll this data source. How often? No idea!
Culturally I think it suggests an immature tech organisation in general, so while not a red flag I'd suggest it is an orange flag.
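One way to soften the "no guarantees this query will continue to work" problem is to verify the expected schema on every poll and fail loudly instead of silently misreading drifted data. A hedged sketch (table and column names are invented, sqlite3 standing in for the vendor's database):

```python
import sqlite3

# The columns our integration query relies on.
EXPECTED_COLUMNS = {"id", "state", "updated_at"}

def check_schema(conn, table):
    """Fail fast if the vendor's schema drifted from what we integrated against."""
    cols = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
    missing = EXPECTED_COLUMNS - cols
    if missing:
        raise RuntimeError(f"schema drift in {table}: missing {sorted(missing)}")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, state TEXT, updated_at TEXT)")
check_schema(conn, "orders")  # passes today; raises the day a column is dropped
```

It doesn't make the integration less brittle, but it turns a silent data-quality bug into an immediate, attributable alert.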
Option 1 seems like a lot of work for the customer ... probably not viable.
Option 2: you can securely connect to their data source using OAuth 2.0. In fact, I know MongoDB offers this (https://www.mongodb.com/docs/atlas/security-oidc/) for this exact use case, I believe. I recall using Vercel to integrate with my MongoDB.
MongoDB: https://www.mongodb.com/docs/atlas/security-oidc/ Supabase: https://supabase.com/docs/guides/auth
Though I haven't done it directly, I think integrating with the DB through a secure protocol like OAuth is the way to go. Database services offer this, presumably for this exact use case.
Unfortunately, the longer into the project you are, the more brittle that connection becomes. You’re tightly coupling a process to the state of the database at time of build. When you do that, you create a situation where your customers can kill your code.
So fundamentally, when you integrate directly, you create a situation where your customers’ engineers control how reliable your software can be. They likely won’t fess up when they break your tool so the problems will come back to you.
I know that sounds a lot like typing ‘npm install’ or ‘building an API’ but in this case, you’re handing the keys over to your customers. At best, it will cause reliability problems. At worst, it will add friction to a sales process. As companies scale and sales gains power, that can become a career issue for you. As a rule, it’s better to have difficult technical decisions when you’re employed than difficult financial decisions when you’re not.
So good writing and excellent analysis. But if you choose to go down that path, at some point in the future, I feel like that integration will become someone’s headache.
If you have to do it, it would be worth documenting that the customer in this case won’t provide proper access through an API. That’s a good reason to do something like this because that’s quite unreasonable of the customer. If they’re unwilling to provide API access, I feel like they’re just as unlikely to provide a read replica so aside from the integration’s brittleness, your queries have to be rock solid or you could slow down prod.
It’s all risky and you have a big decision to make. Good luck and have fun.
It works surprisingly well, and is pretty resilient since it essentially acts as a message queue too. Then again, this is all low traffic stuff in a SME / B2B setting, with zero multitenancy involved.
I’ll still take a proper API or message queue any day though.
Edit: I suppose our biggest benefit is that our customer can actually change the interface with us fairly quickly. They have database experts in-house, but no devs who could do APIs. So the collaboration has been mostly smooth on account of that, and that is a huge advantage compared to them having to outsource any API work. Technically speaking I'm not a fan, but the non-technical results have been useful.
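For reference, the table-as-queue pattern described here can be as simple as a status column plus a claim step. A sketch under assumed names (an `outbox` table, sqlite3 standing in for the shared database); note there is no locking here, which is tolerable only in the low-traffic, single-consumer setting the comment describes:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE outbox "
    "(id INTEGER PRIMARY KEY, payload TEXT, status TEXT DEFAULT 'new')"
)
conn.execute("INSERT INTO outbox (payload) VALUES ('hello'), ('world')")

def claim_next(conn):
    """Claim the oldest unprocessed row; acts like a crude message queue."""
    row = conn.execute(
        "SELECT id, payload FROM outbox WHERE status = 'new' ORDER BY id LIMIT 1"
    ).fetchone()
    if row is None:
        return None  # queue is empty
    conn.execute("UPDATE outbox SET status = 'taken' WHERE id = ?", (row[0],))
    return row

first = claim_next(conn)
second = claim_next(conn)
third = claim_next(conn)
print(first, second, third)
```

With multiple consumers, the SELECT and UPDATE would need to be a single atomic claim (e.g. an UPDATE with a RETURNING clause, or row locking), which is exactly where a real message queue starts earning its keep.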
If you're doing writes, this is potentially harmful to the customer, too, not just data exfiltration but also potentially software breaking and that could be lawsuit territory.
Even doing this with your own services' data stores is bad practice. Direct reads/writes to a service's data store without going through its defined interfaces means unexpected, often unmonitored changes happening. I strongly advise against this pattern of "secret APIs," as opposed to overt ones.
Also, please note that this applies to any data store and not just a DB.
- Customer buys a service from a vendor
- Customer creates a VNet peering in Azure between their subscriptions
- Customer queries the vendor's DB's directly
I will not name names for the vendors or customers, but you have heard of all of them.
The OP already said that their customer didn't have the resources to even connect to an API, so what on earth makes you think they're gonna be able to create some highly-abstracted API layer or some other "bEsT PrAcTiCe" way of exposing the data?
And even if they did create a view or API for the database instead of giving direct access, what makes you think they'd keep that view or API up-to-date? It's just as likely to break as a changing database schema is.
There's an extremely real chance that this fintech giving you direct DB access was breaking the law by doing so.
Of course, no one will dedicate time or money to that API until something breaks. Then you pull out all the emails, risk/assumption statements etc to highlight your due-diligence on informing the client. Then fix the problem, pushing the whole issue down the road again until the next time.
But seriously, expect to see this anywhere there are budget and time constraints. As soon as the fintech is big enough to get more worried about risk than growth, they should start taking this kind of thing seriously.
But in reality, any big corp assumes if you have the access, you’ve used it. So it should be pretty common
At Scramjet we've built a Platform+Agent engine and put it to use, and in many cases we do integrate with DBs directly, without APIs, through long-lived integration functions.
Let me address some of the comments below:
- hard coupling of DB is bad - I really don't see how adding an additional middleware lessens this; you're just hard-coupling to something else. It might help if a database structure is changed on a whim, but in my experience most schema changes reflect changed requirements.
- hard coupling is bad in general - is it, though? As a platform engineer I fell into the pitfall of making everything reusable, only to learn that 90% of the stack never changed, and that XML-based protocol projects developed in 2004 are still being developed without any change to the protocol at all. A reactive approach with a separated client/data model is much better in my opinion, and quite frankly it's mostly impossible to write a system these days without one.
- schema changes are unavoidable - but hey, as someone pointed out, there are views and materialised views. RDBs offer a good framework for keeping legacy schemas through the development cycle, and that framework will save you the most money, since it affects the fewest moving parts in your system. It's been there since the 90s at least, so why not use it?
- the security mumbo jumbo - this seems to be the most uninformed opinion - all leading RDBMSs have an extremely robust and granular way to grant access to data. They have been battle-tested for nearly half a century. They are well known and easily tested. And, unlike REST frameworks, HTTP servers, relays, caches and whatnot, they are contained in a single piece of software, limiting the attack vectors.
If you can deploy the client remotely, near the database, then I'd say you're doing this right, and not only that, you're limiting the amount of code you need to deploy, maintain and depend on. If you separate the data, the access code and the logic from each other, you have achieved decoupling. But that does not mean REST APIs are the "one correct way".
Such an approach was the underlying idea of our Scramjet Platform, and we've taken it not only in integrating with DBs, but also with file storage, direct protocols and even GPIOs in some cases. And though we have a trick up our sleeve in being able to split code across distributed environments, even in general terms I'd say: it works, it's maintainable, and it's far less expensive.
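The views-for-legacy-schemas point above can be made concrete. A sketch with a hypothetical schema (sqlite3 standing in for a real RDBMS): the underlying table is restructured, but a view preserves the old shape, so the integrator's queries keep working unchanged.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# v2 of the table splits the old 'name' column the integrator depends on.
conn.execute(
    "CREATE TABLE customers_v2 (id INTEGER, first_name TEXT, last_name TEXT)"
)
conn.execute("INSERT INTO customers_v2 VALUES (1, 'Ada', 'Lovelace')")
# A view reconstructs the v1 schema; external queries are none the wiser.
conn.execute(
    "CREATE VIEW customers AS "
    "SELECT id, first_name || ' ' || last_name AS name FROM customers_v2"
)
legacy_row = conn.execute("SELECT id, name FROM customers").fetchone()
print(legacy_row)
```

In a production RDBMS you would pair this with GRANTs that give the external party access to the view only, never the underlying table.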
The technical debt you'd accrue from this would be MASSIVE, and screw your entire architecture down to its roots. Hard pass from me.