July 2nd, 2024

Plaintext is not a great format for (system) logs

Using plain text for system logs poses challenges due to rich metadata. Approaches include augmenting logs with metadata, storing it separately, or discarding it. Tools can help manage metadata, making logs more structured than plain text. JSON, though text-based, may not be plain text.

The blog post discusses the limitations of using plain text for storing system logs, regardless of the tool being used. It highlights that log messages often come with rich metadata that can be challenging to handle in plain text format. The post outlines three approaches to dealing with metadata in logs: augmenting log messages with metadata in text format, storing metadata by implication in separate files, or discarding the metadata altogether. It points out that as systems attach more metadata to log messages, storing logs in plain text can lead to clutter, complexity, or loss of metadata. The post suggests that relying on tools to manipulate metadata for readability can result in logs becoming more structured than plain text. Ultimately, it mentions that while technically JSON is text, it may not qualify as plain text due to its structured nature.
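The three approaches can be made concrete with a small sketch. The record below, its field names, and the file layout are all hypothetical and chosen only for illustration; they do not come from the original post or from any particular logging system. A JSON rendering is included for comparison with the summary's last point.

```python
import json

# Hypothetical log record: a message plus the metadata a logging system might attach.
record = {
    "timestamp": "2024-07-02T10:15:00Z",
    "host": "web1",
    "unit": "sshd.service",
    "priority": "info",
    "message": "Accepted publickey for alice",
}


def as_augmented_text(rec):
    """Approach 1: fold the metadata into the text line itself."""
    return (
        f'{rec["timestamp"]} {rec["host"]} {rec["unit"]} '
        f'[{rec["priority"]}] {rec["message"]}'
    )


def as_implied_by_file(rec):
    """Approach 2: the file path implies some metadata, the line carries the rest."""
    path = f'{rec["host"]}/{rec["unit"]}.log'  # assumed layout, one file per host and unit
    line = f'{rec["timestamp"]} {rec["message"]}'
    return path, line


def as_message_only(rec):
    """Approach 3: discard the metadata and keep only the message."""
    return rec["message"]


def as_json_line(rec):
    """For comparison: every field kept, still text, but clearly structured."""
    return json.dumps(rec, sort_keys=True)


print(as_augmented_text(record))
print(as_implied_by_file(record))
print(as_message_only(record))
print(as_json_line(record))
```

Note that in the second approach the priority is silently lost unless it gets its own file or directory, which is the kind of clutter-or-loss trade-off the summary describes.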

Advanced text features and PDF

The post explores complex text features in PDFs, covering Unicode, glyph representation, kerning, and font challenges. It emphasizes tools like Harfbuzz and CapyPDF for accurate text handling in PDFs.

Why not parse `ls` (and what to do instead)

Parsing 'ls' output on Unix systems is discouraged because file names can contain special characters. The article criticizes 'ls' parsing for its limitations and recommends shell globs and for loops for reliable file handling.

Structured logs are the way to start

Structured logs are crucial for system insight, aiding search and aggregation. Despite storage challenges, prioritizing indexing and retention strategies is key. Valuable lessons can be gleaned from email for software processes.

The Eternal Truth of Markdown

Markdown, a simplified markup alternative to HTML, lets plain text be converted into many document formats. Despite lacking standardization, it thrives on its adaptability and simplicity, appealing to writers and programmers alike.

Writing HTML by hand is easier than debugging your static site generator

The blog author discusses the challenges of static site generators versus manual HTML coding, citing setup complexity and advocating simplicity, stability, and control in website management. The post also emphasizes the benefits of static data.

10 comments

By @crest - 10 months

When your "compressed database" is both larger and slower to query than the plaintext, has an unstable on disk format, and isn't even crash safe like journald's database I have to assume it's a sick joke and not a serious upgrade from plaintext logs on a (compressing) file system.

By @7thaccount - 10 months

You can make it all some kind of json object and use plenty of tools to parse.

I think text is really awesome though. You can read it with any editor and there are a zillion easy to learn Linux tools like grep, cut, awk...etc for interactively creating a oneliner in a short amount of time that you can use to get what you want out of the text. I used to have a bunch of those when I was in charge of some applications on a Linux server. Text is very universal and I'm not just talking about log files.

By @likeabatterycar - 10 months

Absolutely nothing wrong with binary logs as long as the system provides a utility to get the data out of that format.

For that matter, journalctl is terrible, cryptic, and difficult to use.

By @jmclnx - 10 months

The title should be "Plaintext is not a great format for (system) logs for people who have no idea about regexp" :)

In my opinion, plain text wins for everything, very portable and people can read with very little knowledge.

By @exabrial - 10 months

plaintext and grep were fine for your grandpappy and they're fine for you.

By @PreInternet01 - 10 months

Au contraire, plain text is just about the only format for logs that works. Having to use some obscure tool to query logs just isn't worth it, especially not in 20 years time, when you still need to analyze said logs but the tool doesn't even run on any contemporary systems anymore and the documentation about the binary format has been irretrievably lost.

And yeah, sure, you want to use structured logging. So, in addition to the greppable log message, include a Magic Separator Character (say, \t) that you treat as 'end-of-line' in human-oriented processing, and have your key-value-pair structured data following that for automated tooling to have its way with. Or, be really creative with [key=value] blocks that are both human- and machine-readable.

Some of my most painful logging experiences were having to extract application logs from binary blobs created by a proprietary Windows tool (no, not the main Windows event log: that's bad, but at least documented). Even the most recent versions of the official viewer just crashed, the vendor was not interested in fixing that (since we were not a customer, just a third party in need of log data, but even an offer of a modest payment was met with indifference...), and the actual format turned out to be byzantine beyond words.

So, yeah, give me plain text all day, every day...
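The tab-separator layout suggested in the comment above could look roughly like the following. This is a minimal sketch, not anyone's actual tooling: the function names are made up, and it assumes that messages never contain a tab and that keys and values contain no whitespace (keys also no "=").

```python
SEPARATOR = "\t"  # the "Magic Separator Character" from the comment


def format_line(message, fields):
    """Human-readable message, then a tab, then space-separated key=value pairs."""
    # Assumption: keys and values contain no whitespace; keys contain no "=".
    pairs = " ".join(f"{key}={value}" for key, value in fields.items())
    return f"{message}{SEPARATOR}{pairs}"


def parse_line(line):
    """Split a line back into the message and a dict of string-valued fields."""
    message, _, tail = line.partition(SEPARATOR)
    fields = dict(pair.split("=", 1) for pair in tail.split())
    return message, fields


line = format_line("disk almost full", {"mount": "/var", "used_pct": 93})
print(line)
print(parse_line(line))
```

Everything before the tab stays greppable as ordinary text, while automated tooling can ignore the message and read only the part after the tab. Values come back as strings, so any typing has to be re-applied by the consumer.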

By @lucianbr - 10 months

On the other hand, multiple competing binary log formats make it harder to process the metadata than plaintext.

By @fareesh - 10 months

text is great for awk, grep, etc

perhaps some llm-powered awk command generator would be useful?

By @JackSlateur - 10 months

tldr: the author confuses plain text (aka, storing in ascii files) with storing in some binary format (whatever it is). Said confusion makes the whole post worthless, in my opinion.
