December 18th, 2024

Translating 10M lines of Java to Kotlin

Meta is translating its Android codebase from Java to Kotlin to boost developer productivity and ensure null safety, using the Kotlinator tool and collaborating with JetBrains to enhance conversion accuracy.


Meta is undertaking a significant project to translate its extensive Android codebase from Java to Kotlin, aiming to enhance developer productivity and ensure null safety. The transition, which began several years ago, has surpassed the halfway mark, with plans to convert tens of millions of lines of code. While many companies opt to write new code in Kotlin without converting existing Java code, Meta has chosen a comprehensive approach to fully leverage Kotlin's advantages. This includes developing a custom tool called the Kotlinator, which automates the translation process in six phases, addressing challenges such as slow build speeds and the need for compatibility with existing Java code. The Kotlinator incorporates preprocessing and postprocessing steps to ensure that the translated code is functional and idiomatic. Additionally, Meta collaborates with JetBrains to improve the Java-to-Kotlin conversion tool, J2K, and to enhance the accuracy of symbol resolution. The project emphasizes the importance of null safety, as unaddressed nullability can lead to runtime errors. Despite the complexities involved, Meta's commitment to a complete translation reflects its goal of maximizing the benefits of Kotlin while minimizing the drawbacks of a mixed codebase.
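To make the null-safety point concrete, here is a minimal Java sketch of the kind of latent error the summary alludes to. It is not taken from Meta's codebase or from the Kotlinator; the class `NullabilityDemo`, the lookup table, and all method names are hypothetical and exist only for illustration. The unsafe method compiles cleanly yet fails at runtime, while the safe variant handles absence explicitly, which is roughly what a nullable Kotlin return type (`String?`) would require of callers at compile time.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

public class NullabilityDemo {
    // Hypothetical lookup table, used only for illustration.
    private static final Map<String, String> DISPLAY_NAMES = new HashMap<>();

    static {
        DISPLAY_NAMES.put("u1", "Ada");
    }

    // Returns null for unknown ids; nothing in the Java signature says so.
    static String findDisplayName(String userId) {
        return DISPLAY_NAMES.get(userId);
    }

    // Compiles without complaint, but throws NullPointerException for unknown ids.
    static int unsafeNameLength(String userId) {
        return findDisplayName(userId).length();
    }

    // Handles the missing case explicitly and falls back to 0.
    static int safeNameLength(String userId) {
        return Optional.ofNullable(findDisplayName(userId))
                .map(String::length)
                .orElse(0);
    }

    public static void main(String[] args) {
        System.out.println(safeNameLength("u1"));      // known id
        System.out.println(safeNameLength("missing")); // unknown id, handled
        try {
            unsafeNameLength("missing");
        } catch (NullPointerException e) {
            System.out.println("NullPointerException at runtime");
        }
    }
}
```

In Java the difference between the two methods is a matter of convention or of external tooling; in Kotlin the compiler would reject the unsafe version once the lookup's return type is declared nullable.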

- Meta is translating its entire Android codebase from Java to Kotlin.

- The Kotlinator tool automates the translation process in six phases.

- The project aims to enhance developer productivity and ensure null safety.

- Meta collaborates with JetBrains to improve the Java-to-Kotlin conversion tool, J2K.

- The initiative addresses challenges like slow build speeds and compatibility with existing Java code.

23 comments
By @aduffy - 5 months
I’m skeptical of the value in doing this. There is a mountain of tools like NullAway, ErrorProne, and Immutables that make it so much easier to write safe code in Java. New developments in the language like first-class record types improve the DX as well.

I think Kotlin helped push Java in the right direction, but at this point it seems like a weaker choice. Spending years to migrate a massive Java code base like this feels like wasted time.

By @hitekker - 5 months
At my last job, the management greenlighted a full rewrite in Kotlin in order to attract/retain developers bored with Java and Python. The actual business project was boring since all the important design work was already finished by the architect. No language rewrite, no interested devs. So management made a quiet trade with ICs where everyone got what they wanted at the cost of future maintenance.

I learned that social whims (developer fun, preferences, dopamine) are weighted as much as technical rationales (performance, maintenance).

By @valenterry - 5 months
Crazy project.

Personally I find that it's an interesting indicator of the capability of the programming languages. Moving from language A to B can be extremely easy if B is as powerful or more powerful in terms of expressiveness than A. It can be an absolute horror if it is less powerful.

Being not null-safe in fact brings additional expressiveness. I guess most would argue that it's not a good type expressiveness. Nonetheless it is a type of expressiveness that can cause trouble during such a transition.

In general it feels like Java is making faster progress than Kotlin nowadays. I wonder where Kotlin would be if it weren't for Android. I hope those folks don't have to migrate back before they even finished.

By @keyle - 5 months
I have a genuine side question... Why does Meta have 10M lines of Java for their Android code base? What's in it?
By @freeqaz - 5 months
What are the benefits of Kotlin over Java? Something I wish they went into!
By @werdnapk - 5 months
Why does fb.com always erase your history so the back button no longer works once you click on the link to go there?
By @spullara - 5 months
This seems like a huge waste of time unless they expect Google to deprecate Java on Android - which isn't impossible.
By @yearolinuxdsktp - 5 months
I cry for the years of lost git blame history. It sure must suck to delve into an issue in a 10M-line codebase and have all your lines be annotated with useless language conversion commits.
By @whoisthemachine - 5 months
Interesting that they didn't use one of their AI models to assist them...
By @irunmyownemail - 5 months
I view this as great news. It will attract those who would rather code in Kotlin, for whatever reason, to FB, leaving more Java opportunities for the rest of us who like and prefer Java, even after 29 years with it.
By @neocron - 5 months
Ah, here we go again with HHVM and Hack ...

The only reason fb is able to do this, is the billions of $ behind it... For everyone else this is just pure idiocy

Sure, if you like Kotlin, use it for new software, but rewriting millions of LOC for some marginal gains... that's how businesses fail more often than not.

By @mightyham - 5 months
Kotlin's null safety is a huge win, but the language has its own set of flaws such that I try to avoid using it.

The language's "declarative style lambda syntax" makes code overly complex. Every library is basically its own DSL and every lambda context has its own set of specialized keywords. If I write `test { something(value) }`, depending on the context, `something` could be invoking a higher-order function, calling a method on an object that the lambda is extending, or calling a static method. The muddling of data/declarations and procedures makes jumping into new codebases confusing and time-consuming. Rich Hickey has pointed out in numerous talks that syntax itself is complex because rules and order are complex, but in languages we generally trade a little complexity for terseness. Kotlin encourages this trade-off far too much in the direction of complexity, and the language would be almost unusable if not for its IDE support.

Getting to the root of the previous problem is that method extensions in general feel like an anti-pattern. Without introspecting a method call, it's not possible to tell where that functionality is coming from. It's a huge amount of added cognitive strain to have a class's functionality split into various files, possibly across various code bases.

Another problem is coroutines (oh joy, another new DSL to learn). By now, it should be known, from languages like Go and Erlang, that preemptive lightweight threading is almost always going to be better than cooperative. It avoids function coloring, keeps throughput of applications more consistent, and is generally easier to reason about. Java also now has preemptive lightweight threading via virtual threads, but because much of the Kotlin ecosystem is reliant on the coroutine API, it doesn't get any of the ergonomic benefits.

By @latenightcoding - 5 months
> we decided that the only way to leverage the full value of Kotlin was to go all in on conversion

Could someone expand on this please.

By @can3p - 5 months
One of the things that surprised me in the article was their usage of J2K. They’ve been using it as part of IntelliJ, alright, but why did they have to run it headless? They’ve even mentioned that it was open sourced. And later they’ve said that they were not able to make many improvements because it was in maintenance mode at JetBrains.

I mean, with the resources Meta has I’m sure they could have rewritten the tool, made a fork or done any other thing to incorporate their changes (they talk about overrides) or transformed the tool into something better fitting their approach. Maybe it has been done, it's just not clear from the article.

By @roschdal - 5 months
Why? Oh God why?
By @mukunda_johnson - 5 months
Wait a minute, this isn't an AI article...
By @thdhhghgbhy - 5 months
>Developers prefer Kotlin over Java

Why? Bit of substance here would be nice. Otherwise it's just another "we migrated to $coolLanguage" post.

By @zahlman - 5 months
This article absolutely reeks of ChatGPT to me. For example:

>With this in mind, we set out to automate the conversion process and minimize interference with our developers’ daily work. The result was a tool we call the Kotlinator that we built around J2K. It’s now comprised of six phases:

followed by a list of descriptions of the "phases" which only sort of make sense for the name given to them, and are utterly incoherent as actual phases in a process (and grammatically inconsistent). For example, one of the cited "phases" is... "headless J2K". In other words: they have one piece of software that wraps another, and it - gasp - doesn't use the wrapped software's GUI. Aside from being entirely unremarkable, that's neither a phase in a process nor a component of a tool. It's a fact about the component.

LLMs write like this all the time - and it's clear evidence that they do not, in fact, do anything like reasoning, even if they can sometimes be manipulated into generating a second piece of text that resembles an analysis of the first one. The resulting description is so weird that I question whether the authors actually checked the LLM's output for accuracy.

Any human writer who gives a damn about good writing and has any skill, would not allow "it" to refer to "the conversion process" two sentences back when "a tool called the Kotlinator" has been introduced in the interim (or, if that were the intended referent, would notice that tools are not "comprised of phases"). Such a writer would also not come up with abominations like "the conversion process is now comprised of six phases" where "we now use a six-phase conversion process" would be much clearer. Certainly, a six-point bullet list produced by a competent writer would label them in a grammatically consistent way (https://en.wikipedia.org/wiki/Parallelism_(grammar)) - not with an abstract noun describing an action, two participles, two concrete (well, as concrete as software ever is) nouns and a command (who, exactly, is being told to "build" the "error-based fixes" - whatever that means - here?).

I'm starting to feel like Mark Twain.

----

On the other hand, I was cynically expecting some mention of using AI for the actual task, and that doesn't seem to be the case.

(Also, the "reactive" web design is broken. The page overflows horizontally for some range of window widths, without causing a horizontal scrollbar to be added.)

By @earth2mars - 5 months
Wait, no AI is used? Amazon makes big claims about their code conversion from Java 8 to 17 using Q Developer (GitHub Copilot equivalent). Why not use Llama3 models here? Can't they help with this?
By @nutanc - 5 months
I am surprised they did not use LLMs like Claude or maybe even train their own Llama version to do this. In my experience LLMs have been very reliable in translating code.
By @gerdesj - 5 months
"Android development at Meta has been Kotlin-first since 2020, and developers have been saying they prefer Kotlin as a language for even longer."

Not one link to an opinion piece or two regarding: "kotlin vs java". The nearest thing I found was "What makes Kotlin different".

This sounds somewhat like a debate about which Germanic language is best. German, Dutch and English are all "Germanic" but which is "best"?

Obviously: That's the wrong question to ask and so an answer is doomed to failure.

In the end, does your generated machine code implore the CPU and associated hardware to do what you want it to more efficiently in some way that is not an abstraction?

Pissing contests rarely excite me. Why did you do this?