June 21st, 2024

Advanced text features and PDF

The post explores complex text features in PDFs, covering Unicode, glyph representation, kerning, and font challenges. It emphasizes tools like HarfBuzz and CapyPDF for accurate text handling in PDFs.

The blog post examines how text is represented in PDFs and where that representation gets complicated. It distinguishes between the source text, its Unicode codepoints, the glyph ids that are actually drawn, and the ActualText metadata that maps drawn glyphs back to the original text. It also covers kerning, glyph substitution, and alternate forms such as ligatures in OpenType fonts. The post highlights challenges such as text selection, glyph lookup, and handling multiple glyphs for the same Unicode codepoint. It mentions the role of libraries like HarfBuzz in shaping text and the limitations of tools like FreeType in mapping glyphs back to codepoints. The post concludes by noting that PDF generator libraries, like CapyPDF, focus on providing the functionality while leaving the interpretation of text sequences and metadata to client applications.
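
As a rough illustration of how glyph ids and ActualText relate, the Python sketch below builds a PDF content stream fragment that draws a single ligature glyph and tags it with the source text it stands for. This is only a sketch under stated assumptions: the function names and the glyph id 0x01A3 are hypothetical, and the fragment assumes it sits inside a BT ... ET text object with a font already selected that uses 2-byte glyph codes.

```python
def pdf_hex_utf16(text: str) -> str:
    """Encode text as a PDF hex string in UTF-16BE with a byte order mark."""
    data = b"\xfe\xff" + text.encode("utf-16-be")
    return "<" + data.hex().upper() + ">"


def actualtext_span(glyph_ids: list[int], source_text: str) -> str:
    """Wrap a run of 2-byte glyph codes in a marked-content span with ActualText."""
    glyphs = "".join(f"{gid:04X}" for gid in glyph_ids)
    return (
        f"/Span << /ActualText {pdf_hex_utf16(source_text)} >> BDC\n"
        f"<{glyphs}> Tj\n"
        "EMC"
    )


# Hypothetical glyph id 0x01A3 standing in for an "ffi" ligature glyph.
print(actualtext_span([0x01A3], "ffi"))
```

A viewer that honors ActualText can then copy or search for "ffi" even though only one glyph was drawn.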

Related

Hypermedia Systems

The book "Hypermedia Systems" by Carson Gross, Adam Stepinski, and Deniz Akşimşek, with a foreword by Mike Amundsen, introduces innovative web development concepts using htmx and Hyperview. It caters to web developers, individuals interested in web basics, and companies transitioning apps to mobile platforms. Available online and on Amazon.

Font as Tetris [video]

The video traces the evolution of fonts from clay tablets to digital type, covering styles, advances in typography, ligatures, and the OTF and TTF formats. It mentions Metafont, hinting techniques, and integration with the HarfBuzz C++ library.

Polytype: A Rosetta Stone for typesetting engines

Polytype is a project like Rosetta Code but for typesetting engines. It compares how different engines handle layout and orthographic features. Contributions are welcome via GitHub for new samples and improvements. Users can build examples locally and test the website.

Synthesizer for Thought

The article delves into synthesizers evolving as tools for music creation through mathematical understanding of sound, enabling new genres. It explores interfaces for music interaction and proposes innovative language models for text analysis and concept representation, aiming to enhance creative processes.

Microfeatures I love in blogs and personal websites

The article explores microfeatures for blogs and websites inspired by programming concepts. It highlights sidenotes, navigation tools, progress indicators, and interactive elements to improve user experience subtly. Examples demonstrate practical implementations.

4 comments
By @BossingAround - 5 months
Do people actually like ligatures? I tried it with the IntelliJ font and I could just not get used to it. Maybe because I'm slightly dyslexic, but boy, it almost felt like the code I was reading was in a different language.

Ligatures might look beautiful, but my brain just says "nope, I don't know this symbol" and refuses to process it in a meaningful way.

By @tln - 5 months
> Unfortunately the way things are set up means that you can only specify horizontal kerning when laying out horizontal text and vertical kerning for vertical text. If your script requires both, you are not going to have a good time.

The Ts operator (sets text rise, i.e. shifts the baseline) could be useful here. Text selection in PDF readers may even treat text with different text rise as being on the same line.

> We could specify kerning manually with a custom translation matrix that translates the rendering location by the amount needed. There are two main downsides to this. First of all it would mean that instead of having a stream of glyphs to render, you'd need to define 9 floating point numbers (actually 6 due to reasons) between every pair of glyphs.

Or use Td... 2 numbers.
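
To make the Td suggestion concrete, here is a minimal Python sketch that emits one Tj per glyph followed by a Td carrying the two offsets. It rests on several assumptions: the function name and glyph data are hypothetical, advances and kerning are given in 1/1000 em, and character spacing, word spacing, and horizontal scaling are left at their defaults. Since Td is relative to the start of the current line rather than to the pen position, each offset has to include the previous glyph's advance.

```python
def show_with_td(glyphs, font_size):
    """Emit Tj/Td pairs for a run of glyphs.

    glyphs: list of (hex_code, advance, dx, dy), all metrics in 1/1000 em.
    dx and dy are the kerning adjustments applied after the glyph.
    """
    ops = []
    for code, advance, dx, dy in glyphs:
        ops.append(f"<{code}> Tj")
        # Td moves relative to the current line start, which is where
        # this glyph was drawn, so the advance must be added explicitly.
        tx = (advance + dx) * font_size / 1000
        ty = dy * font_size / 1000
        ops.append(f"{tx:g} {ty:g} Td")
    return "\n".join(ops)


# Hypothetical glyph codes and metrics: an 80-unit kern after the first glyph.
print(show_with_td([("0024", 600, -80, 0), ("0039", 650, 0, 0)], 12))
```

Note that a nonzero dy shifts the baseline for every glyph that follows until another Td undoes it.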

By @mistrial9 - 5 months
Congrats on the journey into print technology! As others have said, ligatures divide the audience into "no extras please, I want clear character codes for my tech work" and "produce visually pleasing type content for my excellent reading audiences".

Digital information is relatively new compared to print, which is relatively new compared to human history. So there is some overlap in communication, language, style and accuracy challenges. Print is not dead! But it is not so much the focus on an online forum.

PDF, the document definition, is sort of a mess really, but here it is. Obviously Adobe Systems is not going to save everyone. Great to see this writeup here today.

By @TRiG_Ireland - 5 months
PDF was originally designed for printers, wasn't it? So making it accessible to screen readers is tricky.