July 22nd, 2024

Maestro: Netflix's Workflow Orchestrator

Netflix introduces Maestro, a versatile workflow orchestrator on GitHub. Maestro handles large-scale workflows efficiently, supporting various use cases with scalability, reusable patterns, and configurable features. It simplifies workflow management and offers flexibility.


Netflix has open-sourced Maestro, its workflow orchestrator, on GitHub. Maestro manages large-scale workflows such as data pipelines and machine learning model training pipelines, and it supports both acyclic and cyclic workflows with reusable patterns. At Netflix, the platform has handled an 87.5% increase in executed jobs and launches thousands of workflow instances daily. It simplifies workflow management by preserving key properties across workflow versions and by supporting configurable retry policies. A Maestro workflow definition consists of properties, versioned workflow metadata, and steps, so users can define complex workflows with parameters and expression language support. To automate data pipelines, the platform offers several run strategies: sequential, strict sequential, first-only, last-only, and parallel with a concurrency limit. Maestro also provides building blocks such as foreach loops, conditional branches, and subworkflows for defining dataflow patterns. Its parameterized workflows strike a balance between static and dynamic workflows, offering flexibility while remaining easy to manage.
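
The run strategies named above are easier to compare side by side. The following is a minimal Python sketch, not Maestro's actual implementation or API: the names `RunStrategy`, `WorkflowState`, and `decide` are hypothetical, and the behaviour of each strategy is an assumption based on the usual reading of these terms (queue in order, block after a failure, keep only the first or the latest run, or run in parallel up to a limit).

```python
from dataclasses import dataclass
from enum import Enum


class RunStrategy(Enum):
    # Hypothetical names; they mirror the strategies listed in the summary.
    SEQUENTIAL = "sequential"
    STRICT_SEQUENTIAL = "strict_sequential"
    FIRST_ONLY = "first_only"
    LAST_ONLY = "last_only"
    PARALLEL = "parallel"


@dataclass
class WorkflowState:
    running: int = 0              # instances currently executing
    has_failed_run: bool = False  # an earlier instance failed and is unresolved


def decide(strategy: RunStrategy, state: WorkflowState, concurrency_limit: int = 1) -> str:
    """Decide what happens to a newly triggered workflow instance.

    Returns one of: "start", "queue", "block", "drop_new", "replace_running".
    The semantics are assumed for illustration only.
    """
    idle = state.running == 0
    if strategy is RunStrategy.SEQUENTIAL:
        # Run one at a time in trigger order, regardless of earlier outcomes.
        return "start" if idle else "queue"
    if strategy is RunStrategy.STRICT_SEQUENTIAL:
        # Like sequential, but an unresolved failure blocks later runs.
        if state.has_failed_run:
            return "block"
        return "start" if idle else "queue"
    if strategy is RunStrategy.FIRST_ONLY:
        # Keep the instance that is already running; discard the new one.
        return "start" if idle else "drop_new"
    if strategy is RunStrategy.LAST_ONLY:
        # Keep only the latest trigger; stop whatever is running.
        return "start" if idle else "replace_running"
    if strategy is RunStrategy.PARALLEL:
        # Run concurrently until the limit is reached, then queue.
        return "start" if state.running < concurrency_limit else "queue"
    raise ValueError(f"unknown strategy: {strategy}")


if __name__ == "__main__":
    busy = WorkflowState(running=1)
    for strategy in RunStrategy:
        print(strategy.value, "->", decide(strategy, busy, concurrency_limit=3))
```

Modelling the choice as a pure function of the strategy and the current state keeps the comparison readable; a real orchestrator would also have to persist queued instances and handle races between triggers.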

AI: What people are saying
The comments on Netflix's introduction of Maestro, a workflow orchestrator, cover various perspectives and comparisons.
  • Comparison with other tools: Several comments compare Maestro to existing tools like Windmill, Temporal, Airflow, and ActiveBatch, highlighting differences in language, scalability, and features.
  • Concerns about maintenance and adoption: Some users express skepticism about adopting new tools from large corporations due to potential lack of long-term support and the liability of maintaining custom code.
  • Interest in open-source contributions: There is appreciation for Netflix's open-source efforts, though some question the practical impact and adoption of yet another orchestrator in a crowded space.
  • Technical details and confusion: Comments discuss technical aspects such as the use of CockroachDB, Java, and comparisons to Netflix's previous tools like Conductor, with some confusion about the project's dependencies and versions.
  • Desire for simplicity and usability: Users express a need for simpler, more user-friendly workflow orchestrators, especially for non-enterprise or single-server deployments.
31 comments
By @slt2021 - 7 months
I used to be impressed with these corporate techblogs and their internal proprietary systems, but not so much anymore. Because code is a liability.

I would rather use off-the-shelf open source stuff with a long history of maintenance and improvement than reinvent the cron/celery/airflow/whatever, because code is a liability. Somebody needs to maintain it, fix bugs, add new features. Unless I get +1 grade promotion and salary/rsu bump, ofc.

People need to realize that code is a liability, anything that is not the business critical stuff that earns/makes $$$ for the company is a distraction and resource sink.

By @hintymad - 7 months
I wonder how many iterations we will need before engineers are happy with a workflow solution. Netflix had multiple solutions before Maestro, such as metaflow. Uber built multiple solutions too. Amazon had at least a dozen internal workflow engines. It's quite curious why engineers are so keen on building their own workflow engines.

Update: I just find it really interesting that many individuals in many companies like to build workflow engines. This is not a deriding comment towards anyone or Netflix in particular. To me, such an observation is worth some friendly chitchat.

By @rubenfiszel - 7 months
Founder of https://windmill.dev here, which shares many similarities with Maestro.

> Maestro is a general-purpose, horizontally scalable workflow orchestrator designed to manage large-scale workflows such as data pipelines and machine learning model training pipelines. It oversees the entire lifecycle of a workflow, from start to finish, including retries, queuing, task distribution to compute engines, etc. Users can package their business logic in various formats such as Docker images, notebooks, bash script, SQL, Python, and more. Unlike traditional workflow orchestrators that only support Directed Acyclic Graphs (DAGs), Maestro supports both acyclic and cyclic workflows and also includes multiple reusable patterns, including foreach loops, subworkflow, and conditional branch, etc.

You could replace Maestro with Windmill here and it would be precisely correct. Their rollup is what we call the openflow state.

Main differences I see:

- Windmill is written in Rust instead of Java.

- Maestro relies on CockroachDB for state, and we use PostgreSQL for everything (state but also queue). I can see why they would use CockroachDB: we had to roll out our own sharding algorithms to make Windmill scale horizontally on our very large customer instances

- Maestro is Apache 2.0 vs Windmill's AGPL, which is less friendly

- It's backed by Netflix, so infinite money, whereas we are profitable but a much smaller company

- Maestro doesn't have extensive docs about self-hosting on k8s or docker-compose and either there is no UI to build stuff, or the UI is not yet well surfaced in their documentation

But overall, pretty cool stuff to open-source, will keep an eye on it and benchmark it asap

By @skissane - 7 months
I'm a bit confused about what is going on here: This project appears to use Netflix/conductor [0]. But you go to that repo, you see it has been archived, with a message saying it is replaced by Netflix's internal non-OSS version, and by unmentioned community forks – by which I assume they mean Orkes Conductor [1]. But this isn't using Orkes Conductor, it looks like it is using the discontinued Netflix version `com.netflix.conductor:conductor-core:2.31.5` [2] – and an outdated version of it too.

[0] https://github.com/Netflix/conductor

[1] https://github.com/conductor-oss/conductor

[2] https://github.com/Netflix/maestro/blob/e8bee3f1625d3f31d84d...

By @saturn8601 - 7 months
Anyone here use ActiveBatch? To me it is the best software that I wish had an equivalent for non-enterprise users. I have tried and tried to use other "competitors", but ActiveBatch's simplicity is hard to beat: attach a simple MS SQL DB, install the Windows GUI and execution agent, and it is just click, click, click and now you have a robust GUI-based automation environment where you don't have to use code...or if you want, go ahead and use code in any language...but you don't have to.

Airflow may be robust but it is hidden behind a complexity fence that prevents most from seeing whatever its true capability may be. The same goes for other "open source" competitors.

Why can't someone just develop a robust, DB-backed, GUI-first system?

I have tried online services as well, they pale in comparison. I guess the cost of maintaining extensions is what kills simpler paid offerings?

It's a complete shame that ActiveBatch is walled off behind a stupid enterprise sales model. This has prevented this wonderful piece of software from being picked up by the wider community. It's like a hidden secret. :/

By @skywhopper - 7 months
Advice: don’t rely on any tool open-sourced by Netflix. They have a long history of dropping support for things after they’ve announced them. Someone got a checkmark on their promotion packet by getting this blog post and code sharing out the door, but don’t build your business on a solution like this.
By @meliora245 - 7 months
Why would one consider this over something more established such as Temporal? Also, I see Maestro is written in Java vs Temporal's Go.
By @pantsforbirds - 7 months
This is a really great-looking project. I know I've considered building (a probably worse) version of exactly this on almost every mixed ML + Data Engineering project I've ever worked on.

I'm looking forward to testing it out.

By @HugoLu88 - 7 months
I'm building something in the space (orchestra) so here's my take:

Folks making stuff open source and building in the open is obviously brilliant, but when it comes to "orchestrators" (as this is, and identifies) there is already so much that has come before (Airflow and so on) that it's quite hard to see how this actually adds anything to the space other than another option nobody is ever going to use in a commercial setting.

Shameless plug: https://getorchestra.io

By @indiv0 - 7 months
Is this meaningfully different from Conductor (which they archived a while back)? Browsing through the code I see quite a few similarities. Plus the use of JSON as the workflow definition language.
By @iamsanteri - 7 months
So will this serve as a stand-in replacement for something like Airflow?
By @dboreham - 7 months
Interesting. My team recently built a thing for managing long running, multi-machine, restartable, cascading batch jobs in an unrelated vehicle. Had no idea it was a category.
By @gtrubetskoy - 7 months
The name Maestro has already been used for a workflow orchestrator which I worked on back in 2016. That maestro is SQL-centric and infers dependencies automatically by simply examining the SQL. It's written in Go and is BigQuery-specific (but could be easily adjusted to use any SQL-based system).

https://github.com/voxmedia/maestro/

By @jekude - 7 months
Seems like they re-engineered Temporal: https://temporal.io/
By @willbeddow - 7 months
I'm sure this is very nice, but the article reads as if written by AI. The first thing I'd want to see is an example workflow (both code and configuration) in a realistic use case. Instead, there's a lot of "powerful and flexible" language, but the example workflow doesn't come until halfway down, and then it's just foobar
By @halamadrid - 7 months
Very nice, Netflix has a reputation for making great OSS products. I wonder where this stands relative to Conductor.
By @tiffanyh - 7 months
Don't see many Java projects being posted on HN.
By @andbberger - 7 months
slightly off topic, but there is dire need for a scientific "workflow manager" built to FAANG engineering standards attuned for the needs of academia (ie primarily designed to facilitate execution of DAGs on clusters). The airflows of the world have complex unnecessary features and require extensive kitbashing to plug into slurm and the academic side of things is a huge mess. Snakemake comes the closest but suffers from massive feature creep, a bizarre specification DSL (superset of python) and blurred resource requirement abstraction boundaries.
By @oneplane - 7 months
Looks a bit like Argo Workflows combined with Argo Events. Makes sense to have so many projects and products converge around the same endstate.
By @antishatter - 7 months
Anyone have a recommendation for a workflow orchestrator for single server deployments? Looking at running a project at home and for certain pieces think it would be easiest to orchestrate with a tool like Maestro or Airflow but they’re basically set up to run in clusters with admins to manage them.
By @mianos - 7 months
Interesting how complete this is. It's almost as comprehensive as prefect.io

This is critical software infrastructure that I have been promoting for years, yet almost everyone thinks they don't need it.

By @kabes - 7 months
It says one of the big differentiators with 'traditional workflow orchestrators' is that it supports cyclic graphs. But BPMN (and the orchestrators using it) also supports loops.
By @petromir - 6 months
So they abandoned https://github.com/Netflix/conductor to create Maestro
By @febed - 7 months
Dagster is a better alternative because of its asset-first philosophy. Task-based workflows are still available if you really need them.
By @Sparkyte - 7 months
What's the difference between this and enqueuing work into a queue, then waiting for a job to pick it up at a scheduled time? Not saying build a Kafka cluster to serve this, but most cloud providers have queuing tools.
By @bjourne - 7 months
What is a workflow in this context?
By @monkychop - 7 months
Hooolaa
By @monkychop - 7 months
Eduardo
By @nikhilsimha - 7 months
great job on open sourcing!