August 25th, 2024

Language Entropy

The text discusses how abstractness and entropy in language affect information density and communication efficiency, emphasizing the role of reader knowledge and word embeddings in understanding complexity.

Read original article

The text explores the relationship between language, abstraction, and complexity in written communication, particularly in research papers. It introduces key concepts such as abstractness, entropy, and understanding. Abstractness is defined as the distance of a word's definition from tangible objects, while entropy measures randomness and uncertainty. The author posits that more abstract words lead to lower entropy, allowing for denser information in fewer words. This efficiency is likened to a form of compression, where the reader's prior knowledge fills in gaps left by the writer. The complexity of text is influenced by its total abstraction and entropy, with a higher number of abstract words indicating greater complexity. The discussion also touches on word embeddings and their role in quantifying these concepts through mathematical functions. The author emphasizes that complexity is relative to the reader's knowledge base, which also has its own entropy. Ultimately, the piece reflects on the interplay between formalism in language and the realism it seeks to convey.

- The relationship between abstractness and entropy affects the density of information in text.

- More abstract words result in lower entropy, leading to more efficient communication.

- Complexity in text is influenced by the reader's prior knowledge and familiarity with concepts.

- Word embeddings can be used to quantify abstractness and entropy mathematically (a sketch follows this list).

- Understanding of language is a cumulative process, enhancing comprehension over time.
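
To make the embedding and entropy claims concrete, here is a minimal Python sketch of one way they could be operationalized: abstractness as distance from a few tangible "anchor" words in embedding space, and entropy as the standard Shannon quantity over a probability distribution. The toy vectors, anchor words, and the specific abstractness formula are illustrative assumptions, not the article's actual definitions.

```python
import numpy as np

# Toy 3-d "embeddings" for illustration only; real work would use pretrained
# vectors (e.g. GloVe or word2vec). These vectors and the anchor set are
# hypothetical, not taken from the article.
embeddings = {
    "rock":    np.array([0.9, 0.1, 0.0]),
    "chair":   np.array([0.8, 0.2, 0.1]),
    "justice": np.array([0.1, 0.9, 0.3]),
    "entropy": np.array([0.0, 0.8, 0.6]),
}

concrete_anchors = ["rock", "chair"]  # tangible reference words

def cosine(u, v):
    # Cosine similarity between two vectors.
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def abstractness(word):
    # One possible reading of "distance from tangible objects":
    # how far the word sits from its nearest concrete anchor.
    sims = [cosine(embeddings[word], embeddings[a]) for a in concrete_anchors]
    return 1.0 - max(sims)

def shannon_entropy(probs):
    # Standard Shannon entropy H = -sum(p * log2 p), in bits.
    p = np.asarray(probs, dtype=float)
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

print("abstractness(chair)   =", round(abstractness("chair"), 3))    # 0.0
print("abstractness(justice) =", round(abstractness("justice"), 3))  # higher
print(shannon_entropy([0.25, 0.25, 0.25, 0.25]))   # 2.0 bits (uniform)
print(shannon_entropy([0.97, 0.01, 0.01, 0.01]))   # ~0.24 bits (peaked)
```

In this framing, the article's claim that more abstract wording lowers entropy would amount to a testable hypothesis: text scoring higher on abstractness should correspond to a more peaked, lower-entropy distribution over interpretations for a suitably knowledgeable reader.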

3 comments
By @ttpphd - 8 months
Sounds like someone is interested in some classic Shannon.

Prediction and Entropy of Printed English, Shannon 1951 https://www.princeton.edu/~wbialek/rome/refs/shannon_51.pdf
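
For context, the most basic frequency-based (first-order) estimate in that line of work, the entropy of single-letter frequencies in an English sample, might look like the sketch below; Shannon's paper goes much further, using n-gram statistics and human prediction experiments to bound the per-character entropy of English.

```python
from collections import Counter
import math

def letter_entropy(text: str) -> float:
    # First-order estimate: entropy of the single-character frequency
    # distribution (letters and spaces only). Shannon 1951 refines this
    # with longer contexts and human guessing experiments.
    chars = [c for c in text.lower() if c.isalpha() or c == " "]
    counts = Counter(chars)
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

sample = "prediction and entropy of printed english"
print(f"{letter_entropy(sample):.2f} bits per character")
```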

By @randomcarbloke - 8 months
I subscribe to the notion of entropy in language increasing, but in opposition to the author I believe it is the result of reduced complexity: as words are redefined and reappropriated, meaning and nuance are lost, precision is lost, language starts to aggregate, and density declines.

The author conflates "abstract words" with obscurity, implying reduced understanding, but what matters here is knowability: even if a word is abstract, the concept can be discerned (though it may require a dictionary and thesaurus). When we move away from diverse, abstract, or obscure words, each distinct from its closely related synonyms, we make the conveyed and intended meaning of the sentence in which it is used much harder to know, and no external reference could enlighten us.

By @curiousgibbon - 8 months
This seems like a new definition of entropy unrelated to existing notions of entropy relevant to language. I'm not sure I buy "entropy = 1/abstractness"...