Diff-pdf: tool to visually compare two PDFs
The GitHub repository offers "diff-pdf," a tool for visually comparing PDF files. Users can highlight differences in an enhanced PDF or use a GUI. Precompiled versions are available for various systems, with installation instructions.
The GitHub repository contains "diff-pdf," a tool designed for visually comparing two PDF files. Users can input two PDF files to highlight the differences in a visually enhanced PDF or use a simple GUI for comparison. Precompiled versions are available for Windows, Mac (via Homebrew or MacPorts), Fedora, CentOS, and openSUSE, with installation commands for Chocolatey, Homebrew, and MacPorts included. Detailed instructions for compiling from source on Unix-like systems and on Windows using MSYS + MinGW are provided. Whether obtaining the precompiled binaries or compiling from source, users have multiple options for using this tool to compare PDF files.
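Because diff-pdf is a command-line tool, it is easy to script. A minimal Python wrapper sketch follows, assuming the `--output-diff=FILE` option and the exit-status convention as we understand them from the project's README (0 when the files look the same, 1 when they differ). The helper names are hypothetical, and any other exit code is simply treated as an error here.

```python
import subprocess
from typing import List, Optional


def build_diff_pdf_command(a: str, b: str, output_diff: Optional[str] = None) -> List[str]:
    """Build a diff-pdf command line; --output-diff writes the highlighted diff PDF."""
    cmd = ["diff-pdf"]
    if output_diff is not None:
        cmd.append(f"--output-diff={output_diff}")
    cmd += [a, b]
    return cmd


def interpret_exit_code(code: int) -> bool:
    """Return True if diff-pdf reported the files as visually identical.

    Assumed convention: 0 = same, 1 = different; anything else is an error.
    """
    if code == 0:
        return True
    if code == 1:
        return False
    raise RuntimeError(f"diff-pdf failed with exit code {code}")


def pdfs_look_identical(a: str, b: str, output_diff: Optional[str] = None) -> bool:
    """Run diff-pdf (must be on PATH) and report whether the PDFs match."""
    result = subprocess.run(build_diff_pdf_command(a, b, output_diff))
    return interpret_exit_code(result.returncode)
```

For example, `pdfs_look_identical("old.pdf", "new.pdf", output_diff="diff.pdf")` would return False and leave the highlighted comparison in diff.pdf when the two files differ.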
Related
Advanced text features and PDF
The post explores complex text features in PDFs, covering Unicode, glyph representation, kerning, and font challenges. It emphasizes tools like Harfbuzz and CapyPDF for accurate text handling in PDFs.
Show HN: Pdfscale
The GitHub repository hosts "pdfScale," a Bash script using ghostscript for PDF scaling and resizing via the command line. It supports various modes, paper sizes, and installation methods. Find more details on the repository.
That Editor
The GitHub repository hosts a DOS-like editor created for video production, not ideal for general use. It reflects historical hardware and software limitations, tailored for specific vintage computing requirements.
Chr – terminal editor inspired by Turbo Pascal editor from 1997
A terminal-based text editor "chr" on GitHub mimics desktop editors' shortcuts, blending modern GUI with retro text mode. Developed with Tui Widget, it welcomes contributions. Installation involves cloning, building, and compiling.
Graham Essays: Full Collection of PG Essays in ePub, PDF and Markdown
The GitHub repository offers 200+ essays by Paul Graham in EPUB and Markdown formats. Regularly updated, users can download the complete set and explore the current list. Instructions for downloading and contributing are provided.
Two prompts:
Build a tool where I can drag and drop on two PDF files and
it uses PDF.js to turn each of their pages into canvas
elements and then displays those pages side by side with a
third image that highlights any differences between them, if
any differences exist
rewrite that code to not use React at all
Here's the result: https://tools.simonwillison.net/compare-pdfs
It actually works quite well! Screenshot here: https://gist.github.com/simonw/9d7cbe02d448812f48070e7de13a5...
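The idea in that prompt (render both pages, compare them pixel by pixel, and produce a third image marking the differences) can be sketched in Python over plain RGB pixel lists. This illustrates the technique only; it is not the code behind the linked tool, and the pixel representation and the `threshold` parameter are assumptions.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]
Image = List[List[Pixel]]  # rows of (r, g, b) values in 0-255


def highlight_differences(a: Image, b: Image, threshold: int = 0) -> Tuple[Image, int]:
    """Return a diff image plus the number of pixels that differ.

    Differing pixels are painted red; unchanged pixels are copied from `a`
    but faded halfway toward white so the red marks stand out.
    """
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("pages must be rendered at the same size")
    out: Image = []
    changed = 0
    for row_a, row_b in zip(a, b):
        out_row: List[Pixel] = []
        for pa, pb in zip(row_a, row_b):
            # The largest per-channel difference decides whether the pixel changed.
            delta = max(abs(ca - cb) for ca, cb in zip(pa, pb))
            if delta > threshold:
                out_row.append((255, 0, 0))
                changed += 1
            else:
                out_row.append(tuple((c + 255) // 2 for c in pa))
        out.append(out_row)
    return out, changed
```

A small nonzero `threshold` can help ignore anti-aliasing noise when the two PDFs were produced by different renderers.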
I remember evaluating this diff-pdf tool and finding that it fell short in some way, although it's been so long that I don't recall the specifics. Most of the tools I tried either failed to identify changes or reported false positives. I also remember being disappointed, since this one was open source and could easily be scripted.
ImageMagick can do a visual PDF compare:
magick compare -density "$DENSITY" -background white "$1[0]" "$2[0]" "$TMP"
(density = 100, $1 and $2 are the filenames to compare, $TMP the output file)
You need to do some work to support multiple pages, so I use this script:
https://gist.github.com/mbafford/7e6f3bef20fc220f68e467589bb...
This also uses `imgcat` to show the difference directly in the terminal.
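Extending the single-page command to every page mostly means looping over zero-based page indices. Here is a minimal Python sketch, assuming poppler's `pdfinfo` is available for the page count and `magick compare` behaves as in the command above. The output file names and helper names are hypothetical, and this is not the linked gist.

```python
import re
import subprocess
from typing import List


def parse_page_count(pdfinfo_output: str) -> int:
    """Extract the 'Pages:' value from pdfinfo's text output."""
    match = re.search(r"^Pages:\s+(\d+)\s*$", pdfinfo_output, re.MULTILINE)
    if match is None:
        raise ValueError("no 'Pages:' line in pdfinfo output")
    return int(match.group(1))


def page_count(pdf: str) -> int:
    out = subprocess.run(["pdfinfo", pdf], capture_output=True, text=True, check=True).stdout
    return parse_page_count(out)


def compare_commands(a: str, b: str, pages: int, density: int = 100,
                     out_prefix: str = "diff") -> List[List[str]]:
    """One `magick compare` call per page; ImageMagick page indices start at 0."""
    return [
        ["magick", "compare", "-density", str(density), "-background", "white",
         f"{a}[{i}]", f"{b}[{i}]", f"{out_prefix}-{i + 1:03d}.png"]
        for i in range(pages)
    ]


def compare_all_pages(a: str, b: str, density: int = 100) -> List[int]:
    """Return the 1-based numbers of pages that differ.

    Relies on compare's documented exit status: 0 similar, 1 dissimilar, 2 error.
    Pages that exist in only one of the files are not compared.
    """
    pages = min(page_count(a), page_count(b))
    differing = []
    for page, cmd in enumerate(compare_commands(a, b, pages, density), start=1):
        code = subprocess.run(cmd).returncode
        if code == 2:
            raise RuntimeError(f"magick compare failed on page {page}")
        if code == 1:
            differing.append(page)
    return differing
```

Rendering PDFs this way requires ImageMagick's Ghostscript delegate. A mismatch in page counts should probably be reported separately before relying on the per-page list.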
You can also use ImageMagick to get a perceptual hash difference using something like:
convert -metric phash "$1" null: "$2" -compose Difference -layers composite -format '%[fx:mean]\n' info:
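For intuition about how a perceptual-hash comparison works, here is a Python sketch of the simpler average hash: shrink the page to a small grayscale grid, set one bit per cell that is brighter than the grid's mean, and count the differing bits. This is a swapped-in technique. ImageMagick's `phash` metric is, as far as we know, based on image moments and is not reproduced here, and the 8x8 grid size is an assumption.

```python
from typing import List, Sequence

Gray = Sequence[Sequence[float]]  # rows of grayscale values


def downsample(img: Gray, size: int = 8) -> List[List[float]]:
    """Average-pool a grayscale image into a size x size grid."""
    h, w = len(img), len(img[0])
    if h < size or w < size:
        raise ValueError("image is smaller than the hash grid")
    grid = []
    for gy in range(size):
        y0, y1 = gy * h // size, (gy + 1) * h // size
        row = []
        for gx in range(size):
            x0, x1 = gx * w // size, (gx + 1) * w // size
            cells = [img[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            row.append(sum(cells) / len(cells))
        grid.append(row)
    return grid


def average_hash(img: Gray, size: int = 8) -> int:
    """Pack one bit per grid cell: 1 if the cell is brighter than the grid mean."""
    cells = [v for row in downsample(img, size) for v in row]
    mean = sum(cells) / len(cells)
    bits = 0
    for v in cells:
        bits = (bits << 1) | (1 if v > mean else 0)
    return bits


def hamming_distance(h1: int, h2: int) -> int:
    """Count differing bits; 0 means the pages look alike at this coarse scale."""
    return bin(h1 ^ h2).count("1")
```

Because the grid is so coarse, small edits such as a changed word often produce the same hash. Hashes like this are better suited to flagging "same page, roughly" than to precise diffing.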
I take advantage of the fact that git can be configured to use custom diff tools, with the following in my .gitconfig:
[diff "pdf"]
    command = ~/bin/git-diff-pdf
And in my .gitattributes I enable the above with:
*.pdf binary diff=pdf
~/bin/git-diff-pdf diffs the output of `pdftotext -layout` (from poppler) and also runs pdf-compare-phash. To use this custom diff with `git show`, you need to add an extra argument (`git show --ext-diff`), but `git diff` uses it automatically.
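Git calls a `diff.<driver>.command` program with the same seven arguments as GIT_EXTERNAL_DIFF: path, old-file, old-hex, old-mode, new-file, new-hex, new-mode. When a file is added or deleted, one side is passed as /dev/null. Below is a minimal, text-only Python sketch of such a script, assuming `pdftotext` is on the PATH. It omits the phash step and is not the commenter's script.

```python
import difflib
import subprocess
import sys
from typing import List


def pdf_text(path: str) -> List[str]:
    """Extract layout-preserving text; an added or deleted file arrives as /dev/null."""
    if path == "/dev/null":
        return []
    out = subprocess.run(["pdftotext", "-layout", path, "-"],
                         capture_output=True, text=True, check=True).stdout
    return out.splitlines()


def text_diff(name: str, old: List[str], new: List[str]) -> str:
    """Unified diff of the extracted texts, labelled with git-style a/ and b/ paths."""
    lines = list(difflib.unified_diff(old, new, fromfile=f"a/{name}",
                                      tofile=f"b/{name}", lineterm=""))
    return "\n".join(lines) + "\n" if lines else ""


def main(argv: List[str]) -> int:
    """Entry point; argv is the script name followed by git's seven arguments."""
    if len(argv) != 8:
        print("usage: git-diff-pdf path old-file old-hex old-mode new-file new-hex new-mode",
              file=sys.stderr)
        return 2
    path, old_file, new_file = argv[1], argv[2], argv[5]
    sys.stdout.write(text_diff(path, pdf_text(old_file), pdf_text(new_file)))
    return 0  # git treats a nonzero exit from the driver as a failure
```

To install it as ~/bin/git-diff-pdf, add a `#!/usr/bin/env python3` line at the top, finish the file with `sys.exit(main(sys.argv))` under an `if __name__ == "__main__":` guard, and make it executable.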
https://gist.github.com/thbar/d1ce2afef68bf6089aeae8d9ddc05d...
The code contains git-stored reference PDFs, and the test suite regenerates them and asserts that nothing has changed.
Helped a lot to audit visual changes, or PDF library upgrades!
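Comparing regenerated PDFs byte for byte tends to fail because many generators embed a creation timestamp and a random file /ID. One heuristic way to stabilise such a test is to blank those fields before comparing. This assumes the fields are written uncompressed as literal strings, so metadata inside compressed object streams or XMP packets is not handled. The regexes and helper names below are a hypothetical sketch, not the approach in the linked gist.

```python
import re

# Literal-string dates such as /CreationDate (D:20240101120000Z)
_DATE = re.compile(rb"/(CreationDate|ModDate)\s*\([^)]*\)")
# The trailer's file identifier, e.g. /ID [<AB12><CD34>]
_ID = re.compile(rb"/ID\s*\[\s*<[0-9A-Fa-f]*>\s*<[0-9A-Fa-f]*>\s*\]")


def normalize_pdf_bytes(data: bytes) -> bytes:
    """Blank out timestamps and the file ID so otherwise-identical PDFs compare equal."""
    data = _DATE.sub(rb"/\1 ()", data)
    return _ID.sub(b"/ID [<><>]", data)


def assert_matches_reference(generated: bytes, reference: bytes) -> None:
    """Golden-file check: fail if anything besides the volatile fields changed."""
    if normalize_pdf_bytes(generated) != normalize_pdf_bytes(reference):
        raise AssertionError("generated PDF differs from the stored reference")
```

If the generator's output is not byte-stable even after this, falling back to a rendered-image or extracted-text comparison is usually more robust.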
The whole article is worth reading, but if you want the relevant bits, search for “I wrote a Dart script that would take a PDF of the book”.
That said, on the next project we want to try something more integrated with EDA tools. If anyone else has followed this path, we'd love to know.
It shows the differences in the GUI side by side instead of overlaid.
I used vbindiff instead.
The diff-pdf project was my inspiration but I wanted to create a version that was distributable to non-programmers.
I rely heavily on PDF comparison via PDF-XChange Editor, which is accurate for text, but often has trouble highlighting visual changes correctly.
Good to see post-cyberresilience alternatives :)
PDF diffs are really great for versioning and comparing PCB designs (the only real use case I had 15 years ago).
- Gemini at first only diff'd the text, and then when pushed it identified the items in the images and then hallucinated the differences between the versions. It could not produce an image output.
- Claude only diff'd the text and refused to believe that there were images in the PDFs.
- ChatGPT attempted to write and execute python code for this, which errored out.