The Future of Kdb+
The article examines kdb+'s future in financial services, noting competition from newer technologies and suggesting KX should enhance its product and consider strategic changes to maintain relevance.
The article discusses the future of kdb+, a database technology primarily used in financial services. The author reflects on their experiences and outlines various use cases for kdb+, including historical market data storage, local quantitative analysis, real-time streaming calculations, and distributed computing. However, they note that kdb+ faces significant competition from newer technologies such as ClickHouse, QuestDB, and cloud services like BigQuery and Redshift. The author highlights that many users do not require the speed of kdb+ and that internal bank platforms often do not utilize its full potential.
For local quant analysis, Python-based tools like DuckDB and Polars are gaining traction, making it difficult for kdb+ to compete, especially given its high entry costs. The author expresses uncertainty about kdb+'s prospects in real-time streaming and distributed computing, suggesting that competitors like Kafka are more widely adopted.
Despite acknowledging kdb+ as a powerful technology, the author argues that it has not evolved significantly over the past 15 years, allowing competitors to adopt and improve upon its features. They recommend that KX, the company behind kdb+, should release a free version, enhance the core product, simplify the learning curve, and increase its popularity to avoid decline. The article concludes with a call for KX to focus on its core technology while considering broader financial and strategic changes.
Related
At 50 Years Old, Is SQL Becoming a Niche Skill?
SQL, a foundational technology, faces scrutiny in today's IT world. Evolving roles like data scientists challenge its centrality. Debates persist on SQL's relevance against newer technologies like JSON queries, impacting its future role.
Self-Hosting Kusto
The article delves into self-hosting Kusto for timeseries analytics, detailing setup with the Kusto Emulator in Docker. It covers connecting, database management, data import, querying with KQL, and anomaly detection. Speed and efficiency are highlighted.
Db2 is a story worth telling, even if IBM won't
Db2, IBM's renowned database, with a history since the 1980s, faces uncertainty as IBM remains quiet about its future. Recent updates include AI features and cloud integration, but lack of communication raises concerns about its competitiveness against growing alternatives like PostgreSQL.
Db2 is a story worth telling, even if IBM won't
Db2, IBM's longstanding relational database, faces uncertainty as IBM remains tight-lipped about its future. Recent AI enhancements and a move towards a cloud-first approach contrast with IBM's vague roadmap, sparking speculation.
- Many users express frustration with kdb+'s complexity, high costs, and language design, suggesting it is not user-friendly for broader applications.
- There is a strong sentiment towards open-source alternatives like Python, ClickHouse, and DuckDB, which are seen as more accessible and flexible.
- Some commenters highlight the benefits of kdb+'s vertical integration but acknowledge that its proprietary nature and technical debt hinder its adoption.
- Several users advocate for KX to consider open-sourcing kdb+ to attract a wider developer community and foster innovation.
- Overall, there is a consensus that while kdb+ has its strengths, it faces significant competition and challenges in maintaining relevance.
It's also a column store, with compression. Runs super fast, I've used it in a couple of financial applications. Huge amounts of tick data, all coming down to your application nearly as fast as the hardware will allow.
Good support, the guys on Slack are responsive. No, I don't have shares in it, I just like it.
Regarding kdb, I've used it, but there are significant drawbacks. Costs a bunch of money, that's a big one. And the language... I mean it's nice to nerd out sometimes with a bit of code golf, but at some point you are going to snap out of it and decide that single characters are not as expressive as they seem.
If your thing is ad-hoc quant analysis, then maybe you like kdb. You can sit there and type little strings into the REPL all day in order to find money. But a lot of things are more like cron jobs, you know you need this particular query run on a schedule, so just turn it into something legible that the next guy will understand and maintain.
People could complain about abysmal language design or debugging, but what I found most frustrating was the coding conventions that they had (or did not have), and I think the language and the community play a big role there. The company culture played a part too: I asked why the code was so poorly documented (no comments, single-letter parameters, arcane function names). The answer: "We understand it after some time and this way other teams cannot use our ideas."
Overall, their whole stack was outdated and ofc they could not do very interesting things with a tool such as Q. For example, they plotted graphs by copying data from qStudio to Excel...
The only good thing was they did not buy the docker / k8s bs and were deploying directly on servers. It makes sense that quants should be able to fix things in production very quickly but I think it would also make sense for web app developers not to wait 10 minutes (and that's when you have good infra) to see a fix in production.
I have a theory on why quants actually like kdb: it's a good *weapon*. It serves some purpose but I would not call it a *tool* as building with it is tedious. People like that it just works out of the box. But although you can use a sword to drive nails, it is not its purpose.
Continuing on that theory, LISP (especially Racket) would be the best *tool* available: it is not the most powerful language out of the box, but it lets you build a lot of abstractions, with features to modify the language itself. C++ and Python are just great programming languages, as you can build good software on them, Python being also a fairly good weapon.
Q might give the illusion of being the best language to explore quant data, but that's just because quants do not invest enough time into building good software and using good tools. When you actually master a Python IDE, you are definitely more productive than any Q programmer.
And don't get me started on performance (the link covers it anyway even though the prose is bad).
If your organization has already committed to serving some of these roles with other pieces of software, protocols, or formats, the benefits of vertical integration, both in development workflow and overall performance, are diminished. When kdb+ itself is both proprietary and expensive it is understandably difficult to justify a total commitment to it for new projects. It's a real shame, because the tech itself is a jewel.
Get a free version out there that can be used for many things…
I think this has been the biggest impediment to kdb+ gaining recognition as a great technology/product and growing amongst the developer community.
Having used kdb+ extensively in the finance world for years, I became a convert and a fan. There’s an elegance in its design and simplicity that seems very much rooted in the Unix philosophy. After I left finance, and no longer worked at a company that used kdb+, I often felt the urge to reach for kdb+ to use for little projects here and there. It was frustrating that I couldn’t use it anymore, or even just show colleagues this little-known, niche tool and geek out a little on how simple and efficient it was for doing certain tasks/computations.
I built my own language for time-series analysis because of how much I hated q/kdb+, but Python has been the winner for a bunch of years now.
Agree with all the recommendations, except I think kx should open source the platform. This will attract the breed of developer that will want to contribute back to the ecosystem with improvements and tools.
1. ClickHouse is not a new technology — it has been open-source since 2016 and in development since 2009.
2. ClickHouse can do all three use cases: historical and real-time data, distributed and local processing (check clickhouse-local and chdb).
3. ClickHouse was the first SQL database with ASOF JOIN in the main product (in 2019) - after kdb+, which is not SQL.
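For readers who have not met the term, an ASOF JOIN matches each row on the left with the most recent row on the right at or before its timestamp. The sketch below illustrates the idea in plain Python. It is not how ClickHouse or kdb+ implement it, and the function name `asof_join` and the sample trades and quotes are made up for the example.

```python
from bisect import bisect_right


def asof_join(trades, quotes):
    """Attach to each trade the latest quote at or before the trade time.

    trades: list of (time, symbol, price) tuples
    quotes: list of (time, symbol, bid) tuples, sorted by time
    """
    # Group quote times and bids per symbol so each symbol is searched alone.
    by_symbol = {}
    for q_time, symbol, bid in quotes:
        times, bids = by_symbol.setdefault(symbol, ([], []))
        times.append(q_time)
        bids.append(bid)

    joined = []
    for t_time, symbol, price in trades:
        times, bids = by_symbol.get(symbol, ([], []))
        # bisect_right gives the number of quote times <= t_time.
        idx = bisect_right(times, t_time)
        bid = bids[idx - 1] if idx > 0 else None  # None: no earlier quote
        joined.append((t_time, symbol, price, bid))
    return joined


# Hypothetical sample data (times are plain integers for brevity).
quotes = [(1, "A", 10.0), (3, "A", 10.5), (4, "B", 20.0), (6, "A", 11.0)]
trades = [(2, "A", 10.1), (5, "A", 10.6), (5, "B", 20.1), (0, "B", 19.9)]
result = asof_join(trades, quotes)
```

The last trade has no quote at or before its time, so its bid comes back as `None`. Handling that case is a design choice; a real engine may drop the row or fill a default instead.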
Greenfield development though would use Python.
Here is a link on how you do queries: https://code.kx.com/q/basics/funsql/
TL;DR:
This defines a table: q)t:([] c1:`a`b`a`c`a`b`c; c2:10*1+til 7; c3:1.1*1+til 7)
And this is a select on it: q)?[t; ((>;`c2;35);(in;`c1;enlist[`b`c])); 0b; ()]
Mind that these are the basic queries :)))))
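For comparison, the functional form above passes the where clause as a list of (function; column; value) triples, and as far as I can tell it is equivalent to "select from t where c2>35, c1 in `b`c". A rough Python analogue of that shape might look like the sketch below. The helper names `functional_select` and `is_in` are hypothetical, and the table values are illustrative ones chosen for this sketch.

```python
import operator

# Illustrative table as a list of row dicts (values are assumptions).
t = [
    {"c1": c1, "c2": 10 * i, "c3": 1.1 * i}
    for i, c1 in enumerate("abacabc", start=1)
]


def is_in(item, collection):
    """Membership test with the argument order used by the constraints."""
    return item in collection


def functional_select(table, where):
    """Keep rows satisfying every (function, column, value) constraint."""
    rows = table
    for func, column, value in where:
        # Constraints are applied one after another, each narrowing the rows.
        rows = [row for row in rows if func(row[column], value)]
    return rows


# Same shape as the q constraint list: (>; c2; 35) and (in; c1; b c).
where = [(operator.gt, "c2", 35), (is_in, "c1", {"b", "c"})]
selected = functional_select(t, where)
```

With these sample values, `selected` holds the rows whose `c2` is 40, 60 and 70.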
The future of kdb+ is in the toilet.
We’ve maintained a financial exchange w/ margining for 8 years with it, and I guarantee you that everyone was more than relieved - customers and employees alike, once we were able to lift and shift the whole thing to Java.
The readability and scalability are abysmal as soon as you move on from a quant desk scenario (which, everyone agrees, it is more than amazing at; pandas and Dask frames all feel like kindergarten toys in comparison). The disaster recovery options are basically bound to having distributed storage, which is, by the way, “too slow” for any real KDB application, given that the whole KDB concept marries storage and compute in a single thread. And use cases involving historical data, such as those mentioned in the article, very quickly become awful: one kdb process handles one request at a time, so you end up having to deploy and maintain hundreds of RDBs keeping the last hour in memory, HDBs with the actual historical data, pauses for hourly write-downs of the data, mirroring trees replicating the data using IPC over TCP from the matching engine down to the RDBs/HDBs, and recon jobs to verify the data across all the hosts. Not to mention that such a TCP-IPC distribution tree with single-threaded applications means that any single replica stuck down the line (e.g. on a big query, or too slow to restart) will typically lead to a complete lockup, all the way to the matching engine, so then you need to start writing logic for circuit breakers to trip both the distribution and the querying (nothing out of the box). And then at some point you need to start implementing custom sharding mechanisms for both distribution and querying (nothing out of the box once again!) across the hundreds of processes and dozens of servers (which has implications for the circuit breakers), because replicating the whole KDB dataset across dozens of servers (to scale the requests/sec you can actually serve in a reasonable timeframe) gets absolutely batshit crazy expensive.
And this is the architecture as designed and recommended by the KX consultants that you end up having to hire to “scale” to service nothing but a few billion dollars in daily leveraged trades.
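To give a sense of the circuit-breaker logic mentioned above, here is a minimal, generic sketch in Python. It is not the code from that system. The class names, thresholds, and the `stuck_replica` stand-in are all assumptions made for illustration. The idea is to stop forwarding work to a downstream process after repeated failures, so that one stuck replica cannot block everything upstream.

```python
import time


class CircuitOpenError(RuntimeError):
    """Raised when a call is rejected because the breaker is open."""


class CircuitBreaker:
    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after = reset_after  # cool-down in seconds
        self.clock = clock              # injectable for testing
        self.failures = 0
        self.opened_at = None           # None means the breaker is closed

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                raise CircuitOpenError("downstream marked unhealthy")
            # Cool-down elapsed: allow one trial call (half-open state).
            self.opened_at = None
            self.failures = self.max_failures - 1
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()  # trip the breaker
            raise
        self.failures = 0  # a success resets the failure count
        return result


def stuck_replica(_query):
    """Hypothetical stand-in for a replica that never answers in time."""
    raise TimeoutError("replica did not answer in time")


now = [0.0]  # fake clock so the example does not need to sleep
breaker = CircuitBreaker(max_failures=2, reset_after=10.0, clock=lambda: now[0])
outcomes = []
for _ in range(3):
    try:
        breaker.call(stuck_replica, "select ...")
    except CircuitOpenError:
        outcomes.append("rejected")
    except TimeoutError:
        outcomes.append("timeout")
```

After two timeouts the third call is rejected immediately instead of waiting on the replica. Once the cool-down has passed, a single trial call is let through to check whether the replica has recovered.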
Everything we have is now in Java: all financial/mathematical logic ported over 1:1 with no changes in data schema (neither in house nor for customers). It uses disruptors, convenient chronicle/aeron queues that we can replay anytime (recovery, certifying, troubleshooting, rollback, benchmarks, etc.), and infinitely scalable and sharded s3/trino/scylladb for historical data. Performance is orders of magnitude up (despite the thousands of hours spent micro-optimizing the KDB stack plus the millions in KX consultants, and without any Java optimizations really), incidents became essentially non-existent overnight, and the payroll + infra bills also got divided by a very meaningful factor :]
For benchmarks, I would check out STAC M3... kdb+ holds 17 world records there and that is something we’re proud of. The Clickbench benchmarks cited in the article, however, aren’t designed for time series databases and kdb+ isn’t included (probably for that reason). I don’t think it’s relevant here. We also think that speed – and performance in general – is still important to our customers, as they continue to affirm.
As far as accessibility is concerned, I’d like to address it in multiple parts:
1) We are invested in creating cloud-native features that are more appealing for smaller firms
2) q is the best language out there (in our opinion) but we also offer a path for Python (including Polars) and SQL developers, which is essential to expanding the kdb+ userbase to the maximum extent. Our entire set of Fusion interfaces was built to enable more interoperability. We also don’t mandate language lock-in... there is nothing preventing other languages from being used with kdb+.
3) Pricing—this comes up a lot. We already offer a free edition of kdb+ for non-commercial use that is very popular. We recognize there’s more we can do in this area (an opinion expressed by KX leadership too) so new pricing models are actively being evaluated.
4) Our latest release of kdb+ 4.1 included a renewed focus on ease of installation and use, and a new documentation hub is being launched this year to further enhance the developer experience.
5) Our Community is growing rapidly – with now over 6000 members and 10 courses available in KX Academy. We have more and more developers networking to help others learn kdb+ every day with a month-over-month net new increase of members for the past 30 months. We’ve recently launched a Slack channel and developer advocacy program too.
There’s a lot of criticism about kdb+ (and KX) in this article, but a lot of the things devs love the most about kdb+ have been left out. This includes efficiency/compactness, expressiveness of q, vertical integration, and speedy development workflow. Sure, if you want to combine 3-5 tools to do what kdb+ does you can go that route, but we feel we offer a vastly superior experience with performance at scale. A quality that extends to ALL our products, including Delta & KDB.AI, since they are all built on kdb+.
Note: I reached out to the author to discuss, but he declined to talk to us. We posted a response on his blog too, but he never published the comment. It's been a pretty closed off situation for us, so leaving this here.
Alternatives (which are open source) to KDB+ are split into two categories:
New Database Technologies (tick data store & ASOF JOIN): ClickHouse & QuestDB
Local Quant Analysis: Python – with DuckDB & Polars
Some personal thoughts:
Q is very expressive, and impressive performance can be extracted from kdb+, but the drawbacks are proprietary formats, vendor lock-in, costs, proprietary language and reliance on external consultants to make the system run adequately, which can increase operational costs.
I'm personally excited to see the open-source alternative stack emerging. Open-source time-series databases and tools like DuckDB/Polars for data science are a good combination. Storing everything in open formats like Parquet and leveraging high-performance frameworks like Arrow is probably where things are heading.
Seeing some disruption in this industry specifically is interesting; I think it will be beneficial, particularly for developers.
NB: disclosing that I'm from questdb to put thoughts in perspective