Making Database Systems Usable
Database usability is essential for user satisfaction, often more important than technical performance. Despite advancements, challenges persist, including reliance on experts and complexities in querying, necessitating better interfaces and designs.
Read original articleThe discussion on database usability highlights its critical importance for users, emphasizing that usability often outweighs technical performance metrics. C. J. Date's 1983 keynote at Sigmod pointed out that users prioritize ease of operation over complex algorithms. A 2007 paper revisits this issue, illustrating that despite advancements, many usability challenges remain unresolved. The paper discusses the indirect costs associated with database management, where organizations often rely on technical experts to facilitate user interactions, leading to inefficiencies. It uses the example of flight booking systems to illustrate how web interfaces improved user experience by providing transparency and control. However, operationalizing databases remains complex, with users facing difficulties in querying and updating data due to the relational model's inherent challenges, such as the need for joins. The paper identifies various usability pain points, including overwhelming options, unexpected results, and the complexity of database creation. It suggests that while there is no simple solution, improvements could be made through better presentation models, intuitive interactions, and flexible schema designs. The discussion concludes by reflecting on the current state of database usability, noting that while some progress has been made, significant challenges persist, particularly with the integration of distributed systems and emerging technologies like LLMs.
- Database usability is crucial for user satisfaction and efficiency.
- Many organizations rely on technical experts, increasing indirect costs.
- The relational model complicates user interactions with databases.
- Usability challenges include overwhelming options and complex query formulation.
- Future improvements may come from better user interfaces and flexible database designs.
- Many users struggle with complex SQL queries, particularly JOIN operations, and express a desire for more intuitive solutions.
- There is a call for databases to better accommodate human thinking, suggesting that current relational models can lead to unnecessary complexity.
- Some commenters advocate for the use of nested queries over JOINs, citing ease of understanding.
- Users highlight the importance of user-friendly interfaces and tools that simplify database interactions.
- There is a recognition that while SQL is a standard, many users prefer alternatives that are perceived as easier to use, such as document databases.
Hugely important, and I would add "backup-and-restore" to that list. At risk of sounding conceited, ease of use is a primary goal of rqlite[1] -- because in the real world databases must be operated[2]. I never add a feature if it's going to measurably decrease how easy it is to operate the database.
[2] https://docs.google.com/presentation/d/1Q8lQgCaODlecHa2hS-Oe...
Disclaimer: I'm the creator of rqlite.
People endlessly want hacks, workarounds, and un-normalized data structures, for a single reason - they don't want to have to think about JOIN. It's not actually for performance reasons, it's not actually for any reason other than it's easier to imagine a big table with lots of columns.
I'm actually sympathetic to that reticence but what I am not sympathetic about is this: why, in 2024, can't the computer figure out multi-table joins for me?
Unless your schema is really fucked up, there should only be one or two actually sensible ways to join across multiple tables. Like say I have apples in boxes in houses. I have an apple table, a box table, and a house table. apples have a box_id, boxes have a house_id. Now I want to find all the apples in a house. Do two joins, cool. But literally everyone seems to write this by hand, and it means that crud apps end up with thousands of nearly identical queries which are mostly just boringly spelling out the chain of joins that needs to be applied.
When I started using SQLAlchemy, I naively assumed that a trivial functionality of such a sophisticated ORM would be to implement `apple.join(house)`, automatically figuring out that `box` is the necessary intermediate step. SQLAlchemy even has all the additional relationship information to figure out the details of how those joins need to work. But after weeks of reading the documentation I realized that this is not a supported feature.
In the end I wrote a join tool myself, which can automatically find paths between distant tables in the schema, but it seems ludicrous to have to homebrew something like that.
I'm not a trained software engineer but it seems like this must be a very generic problem -- is there a name for the problem? Are there accepted solutions or no-go-theorems on what is possible? I have searched the internet a lot and mostly just find people saying "oh we just type out all combinatorially-many possible queries"... apologies in advance if I am very ignorant here
1. A database that's meant to be understood by programmers, make queries by the application efficient in space and time, and provide strong referential integrity.
2. A database that's meant to be played with by humans, where duplication is fine, referential integrity is a nice-to-have, every column is okay to be nullable, tables should contain complete objects as understood by the user that map 1-1ish to the real world, that eliminate as many opaque ids as feasible, and foreign keys might exist but aren't enforced in any direction.
The latter database is far more ergonomic and you won't run up against a user frustratingly bashing their keyboard because the database just refuses to do what they want. The stakes in #2 style databases are extremely low-- let the user destroy their copy of the database it's fine, we can reload from the last save.
The nice thing is that it seems very possible to go from #1 -> #2 mechanically, and hand that off to the users who want to play with it.
This problem is heightened by the fact that a SQL table typically can't store composite types. Like, you can't make a column of lists-of-strings. Every modelling problem is solved by using more tables. And any nontrivial application ends up with a table explosion. You have to study your database very closely, then write complex, slow, join-heavy queries, just to get the information about simple stuff - like an order.
Solving every problem with tables might seem clever from a technical perspective, but its just not how humans think. There's a reason document databases like mongodb are popular. Even if they are worse technically, they're much easier to use and reason about.
There's no reason SQL databases couldn't support nested data. Postgres sort of does already via JSON fields. But it feels like you have to fight the database to do it.
I note that Julius.ai is making data analytics easier - just upload some data and ask for some charts and it does an ok job.
The problem of going from natural language to SQL is still a far from solved problem. The main benchmark for this is Bird-Bench and accuracy is only in the high 60s - https://bird-bench.github.io/
I have been working on making tables queryable using function calling, where you a fixed number of canned queries based on the primary key and an event-time column. PyData talk on this - https://www.youtube.com/watch?v=dRcjTe5qgwM
It is not super hard to find UI designed for end users, these days.
If you know the basics such as what a relational database is, then here’s is a good UI: https://visualdb.com
Actually this can be done by nested query, without joining. I prefer nested query than joins. Because join is "global" and nested query is "local" (global means interfering with the whole SQL statement). The local one is easier to understand.
This is a SELECT with 5 joins:
Foo.objects.filter(a__b__c__d__e=3)
Types are respected, valid relations exist and invalid relations do not. Clean.SQL is not going to go away. It's relatively easy to learn.
I'm going to disagree. I could call my travel agent and say "I am going to Chicago on <date> I need to be there by <time> call me back with a couple of nonstop options. And I'll need a room at the Hilton and Towers for two nights."
vs. today I can spend my own time navigating travel sites, avoiding their dark patterns, wondering if I'm really getting the best prices, making sure I understand the terms (is this refundable? are any checked bags included?) etc and then do the same for the hotel booking.
If you work in a business with a travel department or assistants who can manage your travel you have an idea, compared to the hassle of doing it all yourself.