November 19th, 2024

A Walk with LuaJIT

The article describes a zero-instrumentation profiler for LuaJIT using eBPF technology, addressing performance profiling challenges, including trace explosion and limitations of existing profiling tools with JIT frames.

The article walks through the implementation of a zero-instrumentation profiler for LuaJIT that uses eBPF to scrape call stack information for performance profiling. The author describes the move from previous profiling methods to the OpenTelemetry eBPF profiler, which captures essential stack information and metadata. LuaJIT, a high-performance Just-In-Time (JIT) compiler for the Lua programming language, is noted as being significantly faster than standard Lua. The article explains LuaJIT's tracing JIT mechanism, which optimizes frequently executed code paths while avoiding unnecessary overhead on less common ones, and it also covers the problem of "trace explosion," where numerous hot loops can lead to excessive memory consumption.

The author then outlines how to profile LuaJIT programs by walking the stack to identify transitions between native and Lua code, with the goal of giving insight into performance optimization in applications that use LuaJIT. The discussion also covers existing profiling tools and the limitations they face with LuaJIT's architecture, particularly when unwinding JIT frames.

- The article details the development of a zero-instrumentation profiler for LuaJIT using eBPF.

- LuaJIT is significantly faster than standard Lua, with a unique tracing JIT mechanism.

- Challenges such as "trace explosion" can complicate memory management in JIT compilation.

- The profiling process involves analyzing stack transitions between native and Lua code.

- Existing profiling tools face limitations in unwinding JIT frames due to LuaJIT's architecture.
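
To make the tracing JIT and "trace explosion" points above a little more concrete, here is a toy Python model. It is only a sketch under stated assumptions: the class name, the threshold value, and the idea of one trace per loop are all hypothetical simplifications and do not reflect LuaJIT's actual internals or tuning parameters.

```python
from collections import Counter

HOT_THRESHOLD = 3  # hypothetical value, not LuaJIT's real tuning parameter


class ToyTracingJit:
    """Toy model: count loop executions and record a trace once a loop is hot."""

    def __init__(self, hot_threshold=HOT_THRESHOLD):
        self.hot_threshold = hot_threshold
        self.counts = Counter()   # executions seen per loop while interpreting
        self.traces = set()       # loops that now have a compiled trace

    def run_loop_iteration(self, loop_id):
        """Return "compiled" if a trace already exists, else "interpreted"."""
        if loop_id in self.traces:
            return "compiled"
        self.counts[loop_id] += 1
        if self.counts[loop_id] >= self.hot_threshold:
            # The loop became hot: later iterations use the recorded trace.
            self.traces.add(loop_id)
        return "interpreted"


jit = ToyTracingJit()
modes = [jit.run_loop_iteration("inner") for _ in range(5)]

# Many distinct hot loops each get their own trace, so memory grows with them.
crowded = ToyTracingJit()
for i in range(10):
    for _ in range(HOT_THRESHOLD):
        crowded.run_loop_iteration(f"loop{i}")
```

In this model a cold loop never pays for compilation, while every loop that crosses the threshold adds a trace that has to be kept in memory, which is the pressure the summary refers to as trace explosion.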

6 comments
By @gnurizen - 6 months
I wrote the code and the blog, happy to answer any questions/comments. Very eager to have folks try it out and give feedback! Like is my meme game strong or very strong? J/K

There are some missing bits around FFI and callbacks (i.e. C calling a function pointer that is a LuaJIT-generated stub back into the interpreter), and I'm curious whether anyone actually uses these things in OpenResty workloads. Deploy and enjoy!

By @brancz - 6 months
Thanks for submitting! We know HN has a sweet spot for LuaJIT, so we figured it would eventually end up here.

Quick summary: this post dives into the gory details of how we implemented an eBPF based profiler for LuaJIT.

Let us know if you have any questions on this, we’ll keep an eye out on comments!

By @alberth - 6 months
I’m tremendously excited about LuaJIT 3.0 development.

https://github.com/LuaJIT/LuaJIT/issues/1092

Q: does anyone know the timeline for the release?

By @benwilber0 - 6 months
LuaJIT.org stopped publishing release tarballs [1] which caused leafo's GH actions builds [2] to suddenly stop working. The workaround was to start testing against OpenResty's distribution of LuaJIT [3] which is incompatible with LuaJIT.org's version.

There is no faster way to make a fork the de facto standard version than to break everyone's CI builds.

[1] https://luajit.org/download.html

[2] https://github.com/leafo/gh-actions-lua/issues/49

[3] https://github.com/openresty/luajit2

By @mraleph - 6 months
At some point in my life (when I briefly worked on LuaJIT for DeepMind) I wrote a stack walker which can stitch together native and Lua frames: for each native stack frame it checks whether that is actually an interpreter frame or a trace frame - if that's the case, it finds the corresponding `lua_State` and unwinds the corresponding Lua stack, then continues with the native stack again.

This way you get a stack trace which contains all Lua and native frames. You can use it when profiling and you can use it to print hybrid stack traces when your binary crashes.
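
As a rough illustration of that stitching idea (not the code described in this comment, which was never released), the loop might look like the Python sketch below. Everything here is an assumption made for the example: the `NativeFrame` fields, the `interp_loop` symbol, the string handle standing in for a `lua_State` pointer, and the simplification that each LuaJIT native frame maps to one whole Lua stack.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class NativeFrame:
    symbol: str
    # Hypothetical marker: "interp" or "trace" for LuaJIT frames, None otherwise.
    lua_kind: Optional[str] = None
    # Hypothetical handle standing in for the lua_State pointer.
    lua_state: Optional[str] = None


def stitch(native_frames, lua_stacks):
    """Merge native frames (innermost first) with the Lua frames they run."""
    merged = []
    for frame in native_frames:
        if frame.lua_kind is None:
            # Ordinary native code: keep the frame as it is.
            merged.append(f"native:{frame.symbol}")
        else:
            # Interpreter or trace frame: replace it with the Lua stack.
            for name in lua_stacks.get(frame.lua_state, []):
                merged.append(f"lua:{name}")
    return merged


native = [
    NativeFrame("write"),
    NativeFrame("interp_loop", lua_kind="interp", lua_state="L1"),
    NativeFrame("lua_pcall"),
    NativeFrame("main"),
]
lua_stacks = {"L1": ["handler", "dispatch"]}  # innermost Lua frame first
hybrid = stitch(native, lua_stacks)
```

With this input, `hybrid` lists the native `write` frame, then the two Lua frames, then the remaining native frames, which is the hybrid shape described above.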

I was considering open-sourcing it, but it requires a bunch of patches in LJ internals so I gave up on that idea.

(There is also some amount of over-engineering involved, e.g. to compute unwinding information for interpreter code I run an abstract interpretation on its implementation and annotate interpreter code range with information on whether it is safe or unsafe to try unwinding at a specific pc inside the interpreter. I could have just done this by hand - but did not want to maintain it between LJ versions)
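
The "safe or unsafe at a specific pc" annotation mentioned above could be consulted with a simple sorted-range lookup. The Python sketch below is only illustrative: the table contents and offsets are made up, and how the ranges were actually computed or stored is not described in the comment.

```python
import bisect

# Hypothetical annotation table: (start_offset, safe_to_unwind), sorted by start.
# Each entry covers offsets from its start up to the next entry's start.
RANGES = [(0x00, False), (0x10, True), (0x80, False), (0x90, True)]
STARTS = [start for start, _ in RANGES]


def safe_to_unwind(pc_offset):
    """Return True if the annotated range containing pc_offset is marked safe."""
    idx = bisect.bisect_right(STARTS, pc_offset) - 1
    if idx < 0:
        # Before the first annotated range: be conservative and refuse.
        return False
    return RANGES[idx][1]
```

A stack walker could call `safe_to_unwind` with the sampled pc's offset into the interpreter code and skip Lua unwinding for that sample whenever it returns `False`.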

By @dreampeppers99 - 6 months
Lua and Nginx are fantastic. Did you know it's possible to add Lua behavior/code to OpenResty dynamically? https://github.com/leandromoreira/lua-resty-dynacode?tab=rea...