September 2nd, 2024

Does anyone integrate with their customers' DB directly?

The author discusses integrating third-party SaaS vendors directly with customers' databases, reflects on their fintech experience, emphasizes information security, and floats the idea of a tool to standardize such integrations, asking for feedback.

The discussion revolves around third-party SaaS vendors integrating directly with customers' data stores, such as Supabase and Snowflake. The author shares their experience from a previous startup that built a chargeback management tool for fintech companies. There, they had to query the fintech's database for transaction details using only a transaction ID, because the fintech could not spare engineering resources to send data to an API endpoint. Reflecting on the practice of directly querying a customer's database, the author emphasizes the importance of information security, wonders how common the practice is, and suggests that integrating with internal databases could speed up onboarding for startups. They are considering building a tool to standardize this integration and ask others with direct database integration experience to share their use cases and stories.

- Integrating with customers' databases can streamline onboarding for startups.

- The author reflects on their experience with a chargeback management tool for fintechs.

- Information security is a critical consideration when querying customer databases.

- The author is exploring the idea of creating a tool to facilitate these integrations.

- They seek feedback and stories from others who have integrated directly with customer databases.

27 comments
By @drpossum - 5 months
> there's nothing wrong with querying the fintech's DB as long as we're treating infosec seriously

This is an unfortunate attitude.

From a software architecture point of view, you're hard-coupling your software to that database, which can create brittleness: you're at the mercy of the original database schema, which may not be optimal, carries its own tech debt, may require translations to your own implementation, and may cause mistakes in interpretation in anything of scale. Even worse, you're at the customer's mercy if they want to change anything, which immediately breaks your code (likely without warning), and now you have to fix it (likely under duress).

Letting an outside org get direct access to a database is an unnecessary security risk, even if you are "treating infosec seriously". I'd argue no organization that treats infosec seriously would want to request access to or grant access to anything more than the minimal information they need. https://en.wikipedia.org/wiki/Principle_of_least_privilege
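
For illustration only, here's what a least-privilege grant might look like; a minimal sketch assuming Postgres and psycopg2, with the role, database, table, and column names all invented:

```python
import psycopg2

# Hypothetical least-privilege setup: the vendor role can log in, connect to
# one database, and SELECT exactly three columns of one table -- nothing else.
DDL = """
CREATE ROLE vendor_readonly LOGIN PASSWORD 'change-me';
GRANT CONNECT ON DATABASE fintech TO vendor_readonly;
GRANT USAGE ON SCHEMA public TO vendor_readonly;
GRANT SELECT (transaction_id, status, amount_cents)
    ON public.transactions TO vendor_readonly;
"""

with psycopg2.connect("dbname=fintech user=admin") as conn:
    with conn.cursor() as cur:
        cur.execute(DDL)
```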

Modern software design has settled on putting up abstractions and interfaces to address those issues, which is why direct access is not particularly common. In fact, one fintech I worked with did this and ruined a team's holiday weekend by making an unfortunate and unsignaled change late on a Friday.

By @bastawhiz - 5 months
The only reasonable way to do this safely is by querying a read replica. You could take down your customer's systems very, very easily by running queries without proper indexes. Dealing with that is probably trickier than it sounds, because a DB of appreciable size just won't be queryable without them.
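
For what it's worth, a defensive version of that read path might look like this sketch (assuming Postgres and psycopg2; the hostname, credentials, and table are invented): point at the replica, mark the session read-only, and cap query time so an unindexed scan can't pile up load.

```python
import psycopg2

# Connect to the customer's read replica, never the primary (hostname is made up).
conn = psycopg2.connect(
    host="replica.customer.example.com",
    dbname="fintech",
    user="vendor_readonly",
    password="change-me",
)
conn.set_session(readonly=True)  # the driver refuses writes at the session level

txn_id = "txn_123"  # illustrative
with conn.cursor() as cur:
    # Cap runaway queries so a query without a proper index can't hog the replica.
    cur.execute("SET statement_timeout = '5s'")
    cur.execute(
        "SELECT status, amount_cents FROM transactions WHERE transaction_id = %s",
        (txn_id,),
    )
    row = cur.fetchone()
conn.close()
```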

You might go live quicker. But the integration will break every time your customer makes a change or needs to upgrade. Pray there's documentation. You'll spend unending hours debugging your customers' weird data models. Issues will appear when logic changes but the data model doesn't.

I won't tell you not to do this because you've been so light on details of what you're actually doing (maybe it is easier this way!), but it would not fly on my watch.

By @stackskipton - 5 months
I've been at companies that do this. It's an extremely brittle process with significant overhead. Normally, we have to set up a site-to-site (S2S) VPN, with all the security and administrative implications of that. Then we deal with any database schema changes and the like.

I hope never to work at a company like that again; as an SRE, my life was a nightmare.

By @wolpoli - 5 months
I would get the customer to create (and own) the API endpoint that talks to their database, and we talk to that API endpoint instead. Then it is up to them to test and maintain that endpoint.

This avoids the situation where we get an email out of the blue explaining that there is a database change and we need to dedicate engineering resources to make sure we're compatible by a certain date, or, even worse and more likely, an urgent email explaining that the integration has broken and we need it back up and running last week.
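
The consuming side of that arrangement could be as small as the following sketch; the URL, auth scheme, and payload shape are all hypothetical, since the customer owns that contract:

```python
import requests

# The customer owns, tests, and versions this endpoint; we only consume it.
resp = requests.get(
    "https://api.customer.example.com/v1/transactions/txn_123",
    headers={"Authorization": "Bearer <token>"},  # placeholder credential
    timeout=10,
)
resp.raise_for_status()
txn = resp.json()
print(txn["status"])  # field names are part of their published contract
```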

By @Nican - 5 months
Yes. SQL is a form of API, and it carries all the same challenges.

-> Permission control: making sure the user of the API cannot see data they are not supposed to.

-> Auditability: verifying that the API is being used correctly.

-> Performance: not overloading the endpoint. (Read from a read replica? And don't run hour-long analytics queries on the database.)

-> Consistency: are you using transactions to read your data? Could you be causing contention?

-> API versioning: how do you upgrade the underlying tables without breaking users? (See the sketch below.)
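
One way to get that versioning property, sketched under assumptions (Postgres, psycopg2, invented names): consumers only ever see stable views in a versioned schema, so the underlying tables can be reshaped without breaking anyone.

```python
import psycopg2

# Hypothetical "SQL API" surface: a versioned schema of views. The raw table can
# be renamed, split, or re-typed as long as vendor_api_v1.transactions keeps its shape.
DDL = """
CREATE SCHEMA IF NOT EXISTS vendor_api_v1;
CREATE OR REPLACE VIEW vendor_api_v1.transactions AS
    SELECT transaction_id, status, amount_cents, created_at
    FROM internal.transactions;
GRANT USAGE ON SCHEMA vendor_api_v1 TO vendor_readonly;
GRANT SELECT ON vendor_api_v1.transactions TO vendor_readonly;
"""

with psycopg2.connect("dbname=fintech user=admin") as conn:
    with conn.cursor() as cur:
        cur.execute(DDL)
```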

By @testemailfordg2 - 5 months
Something boring but workable is using CSV files dropped to a location on the customer's end through SFTP, containing the transaction IDs you are interested in. If it's MS SQL Server on the other side, then a SQL Server Agent job on that server using some SSIS can read the CSV as input and prepare your required output as a CSV again for you to pick up from their SFTP server.
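
The vendor's side of that drop could be as small as this sketch (using paramiko; the host, path, and credentials are invented):

```python
import csv
import io

import paramiko

# Build a one-column CSV of the transaction IDs we want enriched.
txn_ids = ["txn_123", "txn_456"]
buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(["transaction_id"])
writer.writerows([t] for t in txn_ids)

# Drop it on the customer's SFTP server; their SQL Server Agent / SSIS job
# picks it up and writes the answer CSV back for us to fetch.
transport = paramiko.Transport(("sftp.customer.example.com", 22))
transport.connect(username="vendor", password="change-me")
sftp = paramiko.SFTPClient.from_transport(transport)
with sftp.open("/inbox/requested_transactions.csv", "w") as f:
    f.write(buf.getvalue().encode("utf-8"))
sftp.close()
transport.close()
```
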
By @tbrownaw - 5 months
Congratulations, schema changes are now directly visible outside your organization.

For some of our outgoing files at $employer, there's a notice period of... I think I remember it being 30 days for additional code values and 90 days for layout changes. That sort of planning ahead becomes much harder if every single schema change is immediately visible to outsiders.

By @BurningFrog - 5 months
Seems fine to me, provided some sanity constraints, like having read-only access, and some guarantee that the fields you're interested in will not be removed/changed.

I've worked on systems that use a SQL DB as a communication layer. A main advantage is that every language comes with good SQL read/write libraries, so things just work right away.
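
Absent such a guarantee, a cheap fail-fast check is possible; a minimal sketch assuming Postgres's information_schema, with invented names:

```python
import psycopg2

# Fail fast at startup if the columns we depend on have drifted, instead of
# discovering it mid-request. All names are illustrative.
EXPECTED = {"transaction_id", "status", "amount_cents"}

conn = psycopg2.connect(
    "host=db.customer.example.com dbname=fintech user=vendor_readonly"
)
with conn.cursor() as cur:
    cur.execute(
        "SELECT column_name FROM information_schema.columns "
        "WHERE table_schema = 'public' AND table_name = 'transactions'"
    )
    actual = {row[0] for row in cur.fetchall()}
conn.close()

missing = EXPECTED - actual
if missing:
    raise RuntimeError(f"customer schema drifted; missing columns: {missing}")
```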

By @anitil - 5 months
I've had to do this due to tech not getting a say in vendor choice. We were given an SSH tunnel, a DB user, and a query to run by the vendor, and sent on our way. Of course, periodically that query stops working because 'whoops, we changed our schema', and we have to wait for a new query to run.

In terms of Pros:

- It was quick to set up and _seems_ to work, mostly.

For Cons:

- We have no guarantees that this query will continue to work.

- We have no understanding of the model behind this data. (Why are these records in this state, and these in another? No idea, and the vendor doesn't seem to understand the question.)

- We need to poll this data source. How often? No idea! (See the sketch below.)
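
For concreteness, the polling pattern in question might look like the sketch below, assuming psycopg2, an updated_at watermark, and a completely guessed cadence, which is exactly the commenter's complaint:

```python
import time

import psycopg2

# Poll the vendor's table with an updated_at watermark so each pass only reads
# new rows. Connection string, query, and cadence are all guesses.
last_seen = "1970-01-01T00:00:00"

while True:
    with psycopg2.connect(
        "host=vendor-db.example.com dbname=vendor user=reader"
    ) as conn:
        with conn.cursor() as cur:
            cur.execute(
                "SELECT id, status, updated_at FROM orders "
                "WHERE updated_at > %s ORDER BY updated_at",
                (last_seen,),
            )
            for row in cur.fetchall():
                last_seen = str(row[2])
                # ... handle the row ...
    time.sleep(60)  # arbitrary; nothing tells us what's safe or fresh enough
```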

Culturally I think it suggests an immature tech organisation in general, so while not a red flag I'd suggest it is an orange flag.

By @mr_toad - 5 months
Many DBAs don’t even like internal analysts directly querying production line-of-business databases. Usually they’d set up replicas, or even a full-blown data warehouse.

By @segmondy - 5 months
I've had companies ask to read our DB; the answer is no, and over my dead body. They get API access or a data dump. I've been asked to read external companies' DBs; I always refuse: give me an API or a data dump. I don't even agree to internal teams sharing DBs. It's a bad idea, but I can see the draw for startups and small companies without the skills. BTW, can we chat about your former chargeback-oriented startup? My info is in my profile.
By @airpodsman - 5 months
I was thinking the same thing. I have an idea that I've been thinking about for some time, but it requires access to the customer's data. They'd have to expose that for us to ingest, either through:

1) an API that the customer exposes to you, or

2) a direct connection to the data source in a secure manner.

Option 1 seems like a lot of work for the customer ... probably not viable.

Option 2: you can securely connect to their data source using OAuth 2.0. In fact, I know MongoDB offers this (https://www.mongodb.com/docs/atlas/security-oidc/) for this exact use case, I believe. I recall using Vercel to integrate with my MongoDB.

MongoDB: https://www.mongodb.com/docs/atlas/security-oidc/

Supabase: https://supabase.com/docs/guides/auth

Though I haven't done it directly, I think integrating with the DB through a secure protocol like OAuth is the way to go. DB services offer this, and it must be for this exact use case.

By @hluska - 5 months
First off, this is creative, and you did a great job of writing it up and explaining the benefits of your approach. And I think it is a good idea in the extremely early stages of a software product because, as you mention, it’s a quick way to get something up and running.

Unfortunately, the longer into the project you are, the more brittle that connection becomes. You’re tightly coupling a process to the state of the database at time of build. When you do that, you create a situation where your customers can kill your code.

So fundamentally, when you integrate directly, you create a situation where your customers’ engineers control how reliable your software can be. They likely won’t fess up when they break your tool so the problems will come back to you.

I know that sounds a lot like typing ‘npm install’ or ‘building an API’ but in this case, you’re handing the keys over to your customers. At best, it will cause reliability problems. At worst, it will add friction to a sales process. As companies scale and sales gains power, that can become a career issue for you. As a rule, it’s better to have difficult technical decisions when you’re employed than difficult financial decisions when you’re not.

So good writing and excellent analysis. But if you choose to go down that path, at some point in the future, I feel like that integration will become someone’s headache.

If you have to do it, it would be worth documenting that the customer in this case won’t provide proper access through an API. That’s a good reason to do something like this because that’s quite unreasonable of the customer. If they’re unwilling to provide API access, I feel like they’re just as unlikely to provide a read replica so aside from the integration’s brittleness, your queries have to be rock solid or you could slow down prod.

It’s all risky and you have a big decision to make. Good luck and have fun.

By @superice - 5 months
We’ve done this for an SME customer of ours. We were essentially building a new module on top of their existing in-house systems. We laid out our query patterns in advance so they could index properly, and we only query specific views. Write-out goes only to tables dedicated to our product.

It works surprisingly well, and is pretty resilient, since it essentially acts as a message queue too. Then again, this is all low-traffic stuff in an SME / B2B setting, with zero multitenancy involved.
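
A sketch of that write-out pattern under assumptions (psycopg2; the outbox table and its columns are invented): the vendor only ever inserts into a table dedicated to it, and the customer consumes from there on their own schedule.

```python
import json

import psycopg2

# Vendor side: append-only writes into a dedicated outbox table; the customer's
# own jobs consume it (e.g. with SELECT ... FOR UPDATE SKIP LOCKED). Names invented.
event = {"invoice_id": 42, "status": "approved"}

with psycopg2.connect(
    "host=db.customer.example.com dbname=erp user=vendor_writer"
) as conn:
    with conn.cursor() as cur:
        cur.execute(
            "INSERT INTO vendor_outbox (event_type, payload, created_at) "
            "VALUES (%s, %s, now())",
            ("invoice.approved", json.dumps(event)),
        )
```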

I’ll still take a proper API or message queue any day though.

Edit: I suppose our biggest benefit is that our customer can actually change the interface with us fairly quickly. They have database experts in-house, but not devs who could do APIs. So the collaboration has been mostly smooth on account of that, and that is a huge advantage compared to them having to outsource any API work. Technically speaking I’m not a fan, but the non-technical results have been useful.

By @drewcoo - 5 months
Even if you're only doing reads, you'll be relying on interfaces that can change at any time with no notice. That's a kind of tech debt time bomb waiting to blow up your service and make a big pager party!

If you're doing writes, this is potentially harmful to the customer too: not just data exfiltration, but also potentially breaking their software, and that could be lawsuit territory.

Even doing this with your own services' data stores is bad practice. Direct reads/writes to a service's data store without going through its defined interfaces means unexpected, often unmonitored changes happening. I strongly advise against this pattern of "secret APIs," as opposed to overt ones.

Also, please note that this applies to any data store and not just a DB.

By @tw04 - 5 months
If they don’t have staff to hit an API, tying into their database sounds like a one-way trip to a lawsuit. The second something goes wrong, they’re going to blame you and won’t have the technical expertise to understand your explanation of why it’s not your fault.

By @RajT88 - 5 months
Not me, but have seen this with a number of vendors.

- Customer buys a service from a vendor

- Customer creates a VNet peering in Azure between their subscriptions

- Customer queries the vendor's DB's directly

I will not name names for the vendors or customers, but you have heard of all of them.

By @wackget - 5 months
It's really funny seeing some of the outraged, self-righteous responses here from developers who probably take themselves too seriously.

The OP already said that their customer didn't have the resources to even connect to an API, so what on earth makes you think they're gonna be able to create some highly-abstracted API layer or some other "bEsT PrAcTiCe" way of exposing the data?

And even if they did create a view or API for the database instead of giving direct access, what makes you think they'd keep that view or API up-to-date? It's just as likely to break as a changing database schema is.

By @rco8786 - 5 months
> the fintech not being able to dedicate engineering resources to send data to an API endpoint of ours

There's an extremely real chance that this fintech giving you direct DB access was breaking the law by doing so.

By @rukuu001 - 5 months
Yeah, that's a black-ops, under-the-radar, get-it-done-quick solution that should be flagged like crazy for future spend on getting an API in place.

Of course, no one will dedicate time or money to that API until something breaks. Then you pull out all the emails, risk/assumption statements, etc. to highlight your due diligence in informing the client. Then you fix the problem, pushing the whole issue down the road again until the next time.

But seriously, expect to see this anywhere there are budget and time constraints. As soon as the fintech is big enough to get more worried about risk than growth, they should start taking this kind of thing seriously.

By @breadwinner - 5 months
Power BI does this using an on-premises data gateway: https://learn.microsoft.com/en-us/power-bi/connect-data/serv...

By @486sx33 - 5 months
Trigger my database and you’re fired, software scab!

But in reality, any big corp assumes that if you have the access, you've used it. So it should be pretty common.

By @gfody - 5 months
In telecom, RespOrgs query Somos by directly connecting to their Oracle database. It's pretty nice compared to, say, some REST API, especially if you're integrating it into your own DB as a linked server or remote table.
By @mandeepj - 5 months
How about recording all those attributes (name of seller, email of seller, etc.) when a sale is made? Either by calling an API on your side, FTP file upload, or email. I prefer an API call, though.
By @thehappypm - 5 months
Data visualization products do this regularly.
By @budleigh_s - 5 months
I'm thinking that much of the architecture around building APIs in front of databases stems from the religious approach to SOA over HTTP/REST. I don't see much wrong with this, as to me the database is still a sort of API with a different protocol.

At Scramjet we've built a Platform+Agent engine and put it to use, and in many cases we do integrate into the DBs directly, without APIs, through long-lived integration functions.

Let me address some of the other comments:

- Hard coupling to the DB is bad: I really don't see how adding an additional middleware layer lessens this; you're just hard-coupling to something else. It might help if the database structure is changed on a whim, but from my experience, most schema changes reflect changed requirements.

- Hard coupling is bad in general: is it, though? As a platform engineer I fell into the pitfall of making everything reusable, only to learn that 90% of the stack has never changed, and the XML-based protocol projects developed in 2004 are still being developed without any change to the protocol at all. A reactive approach with a separated client/data model is much better in my opinion, and quite frankly it's mostly impossible to write a system these days without one.

- Schema changes are unavoidable: but hey, as someone pointed out, there are views and materialised views. Relational DBs do offer a good framework for keeping legacy schemas through the development cycle, and this framework will save you the most money, as it affects the fewest moving parts in your system. It's been there since the '90s at least, so why not use it?

- The security mumbo jumbo: this seems to be the most uninformed opinion. All leading RDBMSs have an extremely robust and granular way to grant access to data. They are battle-tested for nearly half a century. They are well known and easily tested. And, unlike the REST frameworks, HTTP servers, relays, caches, and whatnot, they are contained in a single piece of software, limiting the attack vectors.

If you can deploy the client remotely, near the database, then I'd say you're doing this right, and not only that, you're limiting the amount of code you need to deploy, maintain, and depend on. If you separate the data, the access code, and the logic from each other, you have achieved decoupling. But that does not mean that REST APIs are the "one correct way".

Such an approach was the underlying idea of our Scramjet Platform, and we've taken it not only when integrating into DBs, but also file storage, direct protocols, and even GPIOs in some cases. And though we have a trick up our sleeve in being able to split the code across distributed environments, even in general terms I'd say: it works, it's maintainable, and it's way less expensive.

By @white_dragon88 - 5 months
> integrating with internal DBs could allow a startup to onboard and go live with customers much more quickly

The technical debt you'd accrue from this would be MASSIVE, and it would screw your entire architecture down to its roots. Hard pass from me.