August 15th, 2024

Show HN: COBOL-REKT, a toolkit for analysing and reverse-engineering COBOL

Cobol REKT is a toolkit for reverse engineering legacy Cobol code, offering flowchart generation, Neo4J integration, execution tracing, and static analysis, with planned features for similar code detection and domain knowledge integration.


Cobol REKT (Cobol Reverse Engineering KiT) is a toolkit for reverse engineering legacy Cobol code. It can generate program-level or section-level flowcharts from the Abstract Syntax Tree (AST) in SVG or PNG format, and it can build parse trees and control flow trees that can be exported to JSON. Code comments are carried into the generated graphs, and the AST and control flow data can be injected into Neo4J for more advanced graph analysis. The toolkit also supports execution tracing through the SMOJOL interpreter, uses OpenAI's language models to summarize nodes, builds glossaries of variables, and extracts capability graphs from program paragraphs.

It depends on several libraries, including Eclipse Che4z for the Cobol grammar and Graphviz for flowchart generation. Typical uses are static analysis, visualizing control flow, identifying dead code, and generating capability maps. Planned features include similar code detection and domain knowledge integration. The toolkit is built with Java and Python and requires JDK 21 to build; the repository contains detailed instructions for usage and API access. It is aimed at developers and engineers working with legacy Cobol systems who need to analyze and understand complex codebases.
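To make the dead-code and flowchart ideas above concrete, here is a minimal sketch of how an exported control flow graph could be used: load it from JSON, mark every paragraph not reachable from the entry point as dead, and emit Graphviz DOT with the dead paragraphs drawn dashed. The JSON shape (`entry`, `nodes`, `edges`), the paragraph names, and the helper functions are all hypothetical and are not Cobol REKT's actual export format or API. Real Cobol control flow (PERFORM ranges, GO TO, fall-through between paragraphs, ALTER) is considerably richer than the plain edge list assumed here.

```python
import json
from collections import deque

# HYPOTHETICAL control-flow export; this is NOT the real Cobol REKT JSON schema.
SAMPLE_CFG = """
{
  "entry": "MAIN-PARA",
  "nodes": ["MAIN-PARA", "READ-INPUT", "CALC-TOTAL", "WRITE-REPORT", "OLD-FIXUP"],
  "edges": [
    ["MAIN-PARA", "READ-INPUT"],
    ["READ-INPUT", "CALC-TOTAL"],
    ["CALC-TOTAL", "WRITE-REPORT"],
    ["OLD-FIXUP", "WRITE-REPORT"]
  ]
}
"""


def load_cfg(text):
    """Parse the JSON export into (entry node, successor lists)."""
    data = json.loads(text)
    succ = {node: [] for node in data["nodes"]}
    for src, dst in data["edges"]:
        succ[src].append(dst)
    return data["entry"], succ


def reachable(entry, succ):
    """Breadth-first search: every node reachable from the entry point."""
    seen = {entry}
    queue = deque([entry])
    while queue:
        node = queue.popleft()
        for nxt in succ[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def dead_paragraphs(entry, succ):
    """Paragraphs that no path from the entry point can reach, sorted by name."""
    live = reachable(entry, succ)
    return sorted(node for node in succ if node not in live)


def to_dot(succ, dead):
    """Render the graph as Graphviz DOT, drawing dead paragraphs dashed and gray."""
    lines = ["digraph cfg {"]
    for node in succ:
        style = " [style=dashed, color=gray]" if node in dead else ""
        lines.append(f'  "{node}"{style};')
    for src, dsts in succ.items():
        for dst in dsts:
            lines.append(f'  "{src}" -> "{dst}";')
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    entry, succ = load_cfg(SAMPLE_CFG)
    dead = set(dead_paragraphs(entry, succ))
    print(sorted(dead))
    print(to_dot(succ, dead))
```

On the sample graph, `OLD-FIXUP` is the only paragraph reported as dead: it has an outgoing edge but nothing reaches it. The DOT output can be rendered with Graphviz, for example `dot -Tsvg cfg.dot -o cfg.svg`.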

- Cobol REKT is designed for reverse engineering legacy Cobol code.

- Key features include flowchart generation, Neo4J integration, and execution tracing.

- The toolkit supports static analysis and visualization of control flows.

- It relies on libraries like Eclipse Che4z and Graphviz.

- Planned features include similar code detection and domain knowledge integration.

7 comments
By @le-mark - 5 months
There’s actually a lot of academic work around this from the 1990s: static analysis, reverse engineering, business logic extraction, re-engineering. All leading up to Y2K. There were quite a few commercial applications too. That all fizzled out after January 1, 2000 though.
By @pmarreck - 5 months
Here's a crazy idea (and possibly a job opportunity for someone?)

If someone built a tool to translate the AST generated by this into one of these newer theorem-proving dependently-typed languages (examples: Idris/Idris2 come to mind, but also the Coq/Rocq theorem prover, Agda, Lean), would it be theoretically possible to not only translate this code into a newer language but also suss out bugs and literally prove correctness? (Given how important some of this COBOL code seems to be, such as at Medicare)

I know that one of the risks of changing the language that logic and computation is written in is unexpectedly changing the behavior or introducing new bugs; I'm wondering if this might mitigate or almost entirely prevent that.

By @karmakaze - 5 months
There was a product in the 90s that ran on the PC and did static analysis on COBOL programs. I can't remember the exact name of it, something like renew-something-or-other. It had a query language where you could follow either the possible control flow or data flow from one point to others (or to a point from earlier ones).

The only thing I've used like it recently was OQL (Object Query Language) for querying the Java heap.

I remember IntelliJ had some static dataflow analysis, and I do miss it when working in RubyMine.
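The forward and backward queries described in this comment (following flow from one point to later ones, or back to a point from earlier ones) boil down to reachability over a flow graph, walked along either the successor or the predecessor edges. The sketch below shows that idea on a hypothetical set of paragraph-to-paragraph control-flow edges; it is not the query language of the product mentioned, nor an API of Cobol REKT. A data-flow version would use the same traversal over definition-use edges between variables instead.

```python
from collections import defaultdict, deque

# HYPOTHETICAL control-flow edges between Cobol paragraphs, for illustration only.
EDGES = [
    ("0100-START", "0200-READ"),
    ("0200-READ", "0300-VALIDATE"),
    ("0300-VALIDATE", "0400-POST"),
    ("0300-VALIDATE", "0900-ERROR"),
    ("0400-POST", "0500-REPORT"),
]


def build_index(edges):
    """Index each edge both ways so queries can run forward or backward."""
    forward = defaultdict(list)
    backward = defaultdict(list)
    for src, dst in edges:
        forward[src].append(dst)
        backward[dst].append(src)
    return forward, backward


def closure(start, adjacency):
    """All nodes reachable from start along adjacency, excluding start itself."""
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        # .get() avoids inserting empty entries into the defaultdict.
        for nxt in adjacency.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    seen.discard(start)
    return seen


if __name__ == "__main__":
    forward, backward = build_index(EDGES)
    # Where can control go after validation?
    print(sorted(closure("0300-VALIDATE", forward)))
    # Which paragraphs can lead to posting?
    print(sorted(closure("0400-POST", backward)))
```

Keeping both indexes means each query is a single breadth-first walk, whichever direction is asked for.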

By @childintime - 5 months
How about compiling Cobol to machine code, and then using an LLM to decompile to <your source language of choice>?

This moves the focus from Cobol specific tools to Cobol agnostic tools.

By @stuff4ben - 5 months
Kinda surprised this isn't an IBM tool. I suspect they could make a killing consulting/watsonXing with this.
By @robin_reala - 5 months
So obviously there’s been a lot of legacy COBOL kicking around, but is this still the case? Would a new COBOL project have been started in the last 20 years? I kind of imagined that Java (or at least the JVM) has eaten its lunch.