September 3rd, 2024

Apache Zeppelin

Apache Zeppelin is an open-source web-based notebook for interactive data analytics, supporting multiple programming languages. The latest version 0.11.1 features Java 11, JDBC connections, and collaborative tools.

Apache Zeppelin is a web-based notebook designed for interactive data analytics and collaborative documentation, supporting multiple programming languages including SQL, Scala, Python, and R. The latest version, 0.11.1, is built with Java 11 and supports the latest features of Apache Spark and Apache Flink. Zeppelin allows seamless connections to various JDBC data sources such as PostgreSQL, MySQL, and Apache Hive. It features built-in visualizations, dynamic forms, and multi-user support with LDAP for collaborative work. Users can easily create charts and share notebooks in real time, similar to Google Docs. The platform is open-source and encourages community contributions, with a focus on data ingestion, discovery, analytics, and visualization.

- Apache Zeppelin supports multiple programming languages and interpreters.

- The latest version is built with Java 11 and supports Apache Spark and Flink.

- It allows seamless connections to various JDBC data sources.

- Users can create dynamic forms and visualizations easily.

- Zeppelin is open-source and promotes community involvement.

AI: What people are saying
The comments on Apache Zeppelin highlight various perspectives on its functionality and community support.
  • Users appreciate Zeppelin's interactive features compared to Jupyter, but note it lacks traction and community support.
  • Some suggest alternative notebooks like Almond and Polynote for Scala and Spark support.
  • There are concerns about Zeppelin's declining usage, with many preferring Jupyter or Databricks for their convenience and popularity.
  • Users reminisce about their past experiences with Zeppelin and its integration with Spark.
  • Overall, while Zeppelin has unique features, its adoption and development have stalled compared to other tools.
11 comments
By @iconara - 8 months
The big difference between Zeppelin and Jupyter is how you can easily build interactive notebooks with input fields, checkboxes, selects, etc. This is much closer to what I thought notebooks were going to evolve into back when I saw them the first time; Hypercard for the data engineer. Observable has kind of delivered that, but on the frontend. Jupyter seems to me to have gone down the path of code editor with cells, and Zeppelin unfortunately never got any traction.
By @DiskoHexyl - 8 months
Tried deploying this in k8s for data analysts and data engineers to use (mostly with PySpark in mind) as a way to provide a non-developer crowd with a ready-made environment with batteries included, e.g. all of the database and local S3 connections ready, popular libraries installed, a secrets vault integrated, etc.

Didn't work out all that well for a number of reasons.

The most important thing is, users are used to Jupyter. Zeppelin's UI is very different, and most people are not willing to jump on yet another learning adventure just for the sake of it.

Then, it's not as widely adopted and supported as JupyterHub. With JupyterHub you can easily integrate whatever you want to. Want several simultaneous Jupyters for each user? Sure. Want separate quotas, different k8s namespaces for user groups? Easy. A shitton of plugins? Here you go. A selection of different images for each user, depending on the tooling required? Welcome.

The third thing is really unfortunate, but Zeppelin proved to have less than stellar stability and performance, at least in my experience. People are wary of something that's often unreliable.

So I finally decided to just go with JupyterHub, and users couldn't be happier. Everything's fully customized, and things are smooth and familiar to a non-dev crowd.

Another, and in some ways better, solution would be to go with VS Code, but I doubt a typical analyst/DS would prefer VS Code, at least for now.

All in all, I don't see a place for Zeppelin: it can't compete with what's already on the market and yet doesn't bring anything new and worthwhile.

By @hocuspocus - 8 months
If you're looking for more modern notebooks supporting Scala (and Spark):

- https://almond.sh

- https://polynote.org

Toree is mostly dead but might also get a Scala 2.13 release now that Spark 4.0 is approaching.

By @alexott - 8 months
Unfortunately, it didn’t get enough of a community around it, and development has stalled. For some time it was sponsored by Alibaba, but at some point the main maintainer left. Similar story with other people.

P.S. I was a committer there until I changed jobs.

By @KronisLV - 8 months
It's cool to stumble upon Apache projects every now and then.

Not all of them get that much love, but often they have pretty nice functionality.

I still remember that setting up Apache Skywalking was one of the easier ways of getting some APM and tracing in place, compared to the other options out there.

And, of course, the likes of Apache2 and Apache Tomcat are also quite useful in some circumstances.

By @rad_gruchalski - 8 months
Good old Apache Zeppelin. It’s almost a decade since I last worked with Zeppelin and Spark Notebook at Technicolor Virdata. Shout out to Eric from Datalayer.
By @benzible - 8 months
Obligatory mention of Livebook: https://livebook.dev/
By @forgetfulness - 8 months
Mean of me to say, but you're just better off using Jupyter as a local notebook sandbox. For one, the relevant development Docker image bundles Spark[1], making it more convenient to fire up. More importantly, it's used way more than Zeppelin: orgs not using Jupyter are probably using Databricks notebooks instead, so usage is split between those two.

Zeppelin does make it easier to run Scala Spark, I find, but Scala Spark usage has declined rapidly.

1. https://hub.docker.com/r/jupyter/pyspark-notebook

By @Woshiwuja - 8 months
What is its use case? Looks like a Jupyter-ish thing.