September 19th, 2024

Netflix's Key-Value Data Abstraction Layer

Netflix has launched a Key-Value Data Abstraction Layer to enhance backend infrastructure, improving data access, reliability, and performance across distributed databases while supporting various data models and optimizing operations.

Netflix has introduced a Key-Value (KV) Data Abstraction Layer as part of the backend infrastructure behind its streaming service. The abstraction addresses problems that arose from varied data access patterns across multiple distributed databases, such as Apache Cassandra: developers struggled with consistency and performance, and had to keep adapting to evolving database APIs. The KV abstraction simplifies data access, improves reliability, and supports a wide range of use cases with minimal developer effort. It uses a two-level map architecture that accommodates both simple and complex data models, and it is database-agnostic, presenting a consistent interface regardless of the underlying storage system. Key features include CRUD APIs for data manipulation, idempotency tokens to ensure data integrity, and chunking for efficient handling of large data. The abstraction also applies client-side compression to optimize performance and smarter pagination to keep operation latencies predictable. Overall, the KV Data Abstraction Layer aims to streamline data management and improve the performance of Netflix's global operations.
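
To make the two-level map and idempotency-token ideas concrete, here is a minimal in-memory sketch. It is not Netflix's API: the class names, method names, and token fields (`TwoLevelMap`, `put_items`, `IdempotencyToken`, `generation_time_ms`, `nonce`) are all hypothetical, and a real service would sit in front of a distributed store rather than Python dictionaries.

```python
from dataclasses import dataclass, field
import uuid


@dataclass(frozen=True)
class IdempotencyToken:
    # Hypothetical shape: a client-chosen timestamp plus a random nonce.
    generation_time_ms: int
    nonce: str = field(default_factory=lambda: uuid.uuid4().hex)


class TwoLevelMap:
    """In-memory stand-in for: record id -> (item key -> value)."""

    def __init__(self):
        self._records = {}     # record id (str) -> {item key (bytes): value (bytes)}
        self._applied = set()  # tokens of writes that were already applied

    def put_items(self, record_id, items, token):
        """Apply a write once; return False if this token was already applied."""
        if token in self._applied:
            return False
        self._records.setdefault(record_id, {}).update(items)
        self._applied.add(token)
        return True

    def get_items(self, record_id):
        """Return the record's (key, value) pairs ordered by item key."""
        return sorted(self._records.get(record_id, {}).items())
```

With this sketch, a client that retries a write with the same token after a timeout cannot apply it twice: the second `put_items` call returns `False` and leaves the stored items unchanged.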

- Netflix's KV Data Abstraction Layer improves data access and reliability across its distributed databases.

- The architecture supports both simple and complex data models, enhancing flexibility for developers.

- Key features include idempotency tokens, efficient large data handling, and client-side compression.

- The abstraction is designed to be database-agnostic, providing a consistent interface for various storage systems.

- Smarter pagination strategies help maintain predictable latencies during data retrieval.
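
One way pagination can keep latencies predictable is to cap each page by size rather than by item count. The sketch below assumes that approach; the function name `get_page`, the `max_page_bytes` parameter, and the "last key returned" token format are illustrative assumptions, not details taken from the article.

```python
def get_page(items, page_token=None, max_page_bytes=1024):
    """Return (page, next_token) from key-sorted (key, value) byte pairs.

    page_token is the last key already returned (hypothetical token format);
    next_token is None once the scan is complete.
    """
    page, used = [], 0
    for key, value in items:
        if page_token is not None and key <= page_token:
            continue  # already returned on an earlier page
        size = len(key) + len(value)
        # Always return at least one item so an oversized item cannot stall the scan.
        if page and used + size > max_page_bytes:
            return page, page[-1][0]
        page.append((key, value))
        used += size
    return page, None
```

A caller would loop, passing each returned token back in as `page_token`, until the token comes back as `None`. Because every page carries a bounded number of bytes, the work per request stays roughly constant even when individual items vary widely in size.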

4 comments
By @ericmcer - 7 months
Can anyone explain why Netflix is considered to have such high-tier engineering? Just from a super high-level view, they store and serve ~5000 videos saved at a few different qualities (4?), so let's say a total of 20,000 videos. Those files only change when specific privileged users update them.

Compare that with YouTube, where ~5,000 videos are uploaded and processed into different formats/qualities every minute, and can be added by anyone with an email. It seems like Netflix has a fairly trivial problem when compared with video sharing or content sharing sites.

By @snicker7 - 7 months
This API is very similar to DynamoDB, which is basically a hash table of B-trees.

My experience is that this architecture can lead to very chatty applications if you have a rich data model (e.g., a graph).

By @jerf - 7 months
For anyone looking for a TL;DR, I'd suggest starting at https://netflixtechblog.com/introducing-netflixs-key-value-d... , which HN is truncating so you can't see it but I've directly linked to a later section in the post with a #. Up to that point it's basically "a networked HashMap<String, SortedMap<Bytes, Bytes>>". But the ability to return partial results based on a timeout with a pagination token is somewhat unusual and the next section called "Signaling" is at least worth a look.

By @throwaway984393 - 7 months
Back in the 2000s it was common to have libraries and services which would expose high level database functions to applications rather than give them direct database access. It solved so many problems.