July 2nd, 2024

Diff-pdf: tool to visually compare two PDFs

The GitHub repository offers "diff-pdf," a tool for visually comparing PDF files. Users can highlight differences in an enhanced PDF or use a GUI. Precompiled versions are available for various systems, with installation instructions.


The GitHub repository contains "diff-pdf," a tool for visually comparing two PDF files. Given two PDFs, it can write an output PDF with the differences highlighted or show them in a simple GUI. Precompiled versions are available for Windows, Mac (via Homebrew or MacPorts), Fedora, CentOS, and openSUSE, and the repository lists installation commands for Chocolatey, Homebrew, and MacPorts. It also provides detailed instructions for compiling from source on Unix-like systems and on Windows using MSYS + MinGW.

29 comments
By @simonw - 5 months
This inspired me to have Claude 3.5 Sonnet knock out a quick web page prototype for me, using PDF.js to load and render the PDFs to canvas elements and then display visual diffs between their pages.

Two prompts:

    Build a tool where I can drag and drop on two PDF files and
    it uses PDF.js to turn each of their pages into canvas
    elements and then displays those pages side by side with a
    third image that highlights any differences between them, if
    any differences exist

    rewrite that code to not use React at all
Here's the result: https://tools.simonwillison.net/compare-pdfs

It actually works quite well! Screenshot here: https://gist.github.com/simonw/9d7cbe02d448812f48070e7de13a5...
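
For readers wondering what the "third image" step amounts to, here is a minimal sketch in Python rather than the JavaScript the linked tool uses. It assumes both pages have already been rasterized to equal-size grids of RGB tuples; `diff_pages`, `tolerance`, and the red highlight color are illustrative choices, not taken from the tool itself.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]
Image = List[List[Pixel]]  # rows of RGB pixels (hypothetical in-memory format)

HIGHLIGHT: Pixel = (255, 0, 0)  # assumed highlight color for changed pixels


def diff_pages(a: Image, b: Image, tolerance: int = 0) -> Tuple[Image, int]:
    """Return (highlight image, number of changed pixels) for two rendered pages."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("pages must have the same dimensions")
    highlight: Image = []
    changed = 0
    for row_a, row_b in zip(a, b):
        out_row = []
        for pa, pb in zip(row_a, row_b):
            # A pixel counts as changed if any channel differs by more than tolerance.
            if max(abs(ca - cb) for ca, cb in zip(pa, pb)) > tolerance:
                out_row.append(HIGHLIGHT)
                changed += 1
            else:
                out_row.append(pa)  # unchanged: keep the original pixel
        highlight.append(out_row)
    return highlight, changed
```

A nonzero `tolerance` is one way to ignore tiny anti-aliasing differences between two renders of the same content.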

By @tomwheeler - 5 months
In a previous job, I had to validate the output of an unreliable production publishing system, so I tested dozens of PDF comparison tools available at the time. The best I found was called Delta Walker. It was proprietary commercial Mac-only software, but reasonably inexpensive, accurate, and could handle long PDFs with lots of graphics well.

I remember evaluating this diff-pdf tool and finding that it fell short in some way, although it's been so long that I don't recall the specifics. Most of them failed to identify changes or reported false positives. I also remember being disappointed since this one was open source and could easily be scripted.

By @ydant - 5 months
Related - this might be helpful to someone.

ImageMagick can do a visual PDF compare:

    magick compare -density "$DENSITY" -background white "$1[0]" "$2[0]" "$TMP"
(density = 100, $1 and $2 are the filenames to compare, $TMP the output file)

You need to do some work to support multiple pages, so I use this script:

https://gist.github.com/mbafford/7e6f3bef20fc220f68e467589bb...

This also uses `imgcat` to show the difference directly in the terminal.
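
The multi-page work mentioned above could look roughly like this. It is a sketch, not the linked script: it only builds one `magick compare` invocation per page, reusing the flags from the command above. The page count is assumed to be known already, and `compare_commands` and `out_prefix` are made-up names.

```python
from typing import List


def compare_commands(old_pdf: str, new_pdf: str, pages: int,
                     density: int = 100,
                     out_prefix: str = "diff-page") -> List[List[str]]:
    """Build one `magick compare` command per page (pages are 0-indexed)."""
    commands = []
    for page in range(pages):
        commands.append([
            "magick", "compare",
            "-density", str(density),
            "-background", "white",
            f"{old_pdf}[{page}]",            # "[n]" selects a single page
            f"{new_pdf}[{page}]",
            f"{out_prefix}-{page:03d}.png",  # hypothetical output naming
        ])
    return commands
```

Each command list can then be passed to `subprocess.run`. Keeping the command construction separate from execution makes it easy to inspect what would be run first.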

You can also use ImageMagick to get a perceptual hash difference using something like:

    convert -metric phash "$1" null: "$2" -compose Difference -layers composite -format '%[fx:mean]\n' info:
I use the fact you can configure git to use custom diff tools and take advantage of this with the following in my .gitconfig:

    [diff "pdf"]
        command = ~/bin/git-diff-pdf
And in my .gitattributes I enable the above with:

    *.pdf binary diff=pdf
~/bin/git-diff-pdf does a diff of the output of `pdftotext -layout` (from poppler) and also runs pdf-compare-phash.

To use this custom diff with `git show`, you need to add an extra argument (`git show --ext-diff`), but it uses it automatically if running `git diff`.
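
For anyone who wants to write their own driver, here is a minimal sketch of the text half of such a script in Python. It is not the author's `~/bin/git-diff-pdf`. It relies on git calling a `diff.<name>.command` program with seven arguments (path, old-file, old-hex, old-mode, new-file, new-hex, new-mode) and on poppler's `pdftotext` being on the PATH; the function names are illustrative.

```python
import difflib
import subprocess
from typing import List


def pdf_text(path: str) -> List[str]:
    """Extract layout-preserving text with pdftotext ('-' writes to stdout)."""
    if path == "/dev/null":  # assumed: git passes this for added/deleted files
        return []
    result = subprocess.run(["pdftotext", "-layout", path, "-"],
                            capture_output=True, text=True, check=True)
    return result.stdout.splitlines()


def text_diff(old: List[str], new: List[str], name: str) -> List[str]:
    """Unified diff of two lists of text lines, labeled with the PDF's path."""
    return list(difflib.unified_diff(old, new,
                                     fromfile=f"a/{name}",
                                     tofile=f"b/{name}",
                                     lineterm=""))


def main(argv: List[str]) -> int:
    # argv: script, path, old-file, old-hex, old-mode, new-file, new-hex, new-mode
    if len(argv) < 8:
        print("expected the 7 arguments git passes to an external diff command")
        return 2
    path, old_file, new_file = argv[1], argv[2], argv[5]
    for line in text_diff(pdf_text(old_file), pdf_text(new_file), path):
        print(line)
    return 0
```

To use it as the driver, end the script with `sys.exit(main(sys.argv))` (after `import sys`) and point `command` at it.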

By @thibaut_barrere - 5 months
I have been using this in a CI pipeline to maintain a business-critical PDF generation (healthcare) app (started circa 2010 I think). Here are the RSpec helpers I'm using:

https://gist.github.com/thbar/d1ce2afef68bf6089aeae8d9ddc05d...

The code contains git-stored reference PDFs, and the test suite regenerates them and asserts that nothing has changed.

Helped a lot to audit visual changes, or PDF library upgrades!
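
The same reference-PDF check can be sketched outside RSpec. This Python version is not the linked helper: it builds on diff-pdf's documented behavior of exiting with 0 when the PDFs match and 1 when they differ, and on its `--output-diff` option. Treating any other exit code as an error is an assumption, and `pdfs_match` and the injectable `run` parameter are illustrative.

```python
import subprocess
from typing import Callable, List


def _run(cmd: List[str]) -> int:
    """Default runner: execute the command and return its exit code."""
    return subprocess.run(cmd).returncode


def pdfs_match(reference: str, generated: str, diff_out: str,
               run: Callable[[List[str]], int] = _run) -> bool:
    """True if diff-pdf reports no differences; writes a visual diff to diff_out."""
    cmd = ["diff-pdf", f"--output-diff={diff_out}", reference, generated]
    code = run(cmd)
    if code not in (0, 1):
        # Assumption: anything other than 0/1 means diff-pdf itself failed.
        raise RuntimeError(f"diff-pdf failed with exit code {code}")
    return code == 0
```

In a test suite this would be called as `assert pdfs_match("spec/reference/invoice.pdf", "tmp/invoice.pdf", "tmp/invoice-diff.pdf")` (hypothetical paths), with the diff PDF kept as a CI artifact for auditing failures.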

By @poidos - 5 months
Reminds me of the tool Bob Nystrom wrote to help himself out when working on the physical edition of Crafting Interpreters: https://journal.stuffwithstuff.com/2020/04/05/crafting-craft...

Whole article is worth reading, but if you want the relevant bits search for “ I wrote a Dart script that would take a PDF of the book”.

By @jaustin - 5 months
We've been using this in the Micro:bit Educational Foundation (microbit.org) to fill a gap in hardware design tooling, and get visual diffs of our schematics and gerbers during PCB design iterations. It's kinda wild that's what we ended up doing, but if you want to be sure your radio layout didn't change at all when you're making a minor revision to a different part of the board, visual diffs are perfect.

That said, next project we want to try something more integrated with EDA tools. If anyone else has followed this path, we'd love to know.

By @mikeyinternews - 5 months
You can do this with Beyond Compare (it's not free, but not very expensive either) https://www.scootersoftware.com/
By @smartmic - 5 months
I like this tool better: https://www.qtrac.eu/diffpdf.html

It shows the differences in the GUI side-by-side instead of overlaid.

By @rawbert - 5 months
We use this tool in our team regularly to compare PDFs we obtain from third-party services that might have changed after code changes on our side. Big thanks to the author <3
By @canistel - 5 months
Interestingly, GitHub thinks the project is 46% shell, due to the fairly huge wxwin.m4.
By @deckar01 - 5 months
I wrote a pixel-based visual diffing algorithm long ago that was intended for a CI tool that finds all of the UI changes in a PR. I broke the layout of a page I didn’t even know existed as an intern at Inkling and have had this idea in my head ever since.

https://github.com/deckar01/narcis

By @crocal - 5 months
I will just chime in to mention Draftable (https://www.draftable.com/compare). It really works well. It’s not so easy to have a visually comfortable diff of two PDFs.
By @ck_one - 5 months
Can anyone recommend a method to deduplicate PDFs? The hash is often different but the content and metadata are 99.99% the same.
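
One simple approach, sketched here under clear assumptions: hash the extracted text instead of the file bytes. This only catches files whose visible text is identical after whitespace and case normalization, so it helps when the differences are in metadata or encoding, not when the text itself differs slightly. The text is assumed to have been extracted already (for example with `pdftotext`); `content_key` and `group_duplicates` are made-up names.

```python
import hashlib
from collections import defaultdict
from typing import Dict, List


def content_key(text: str) -> str:
    """Hash of the text with whitespace collapsed and case folded."""
    normalized = " ".join(text.split()).lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def group_duplicates(texts: Dict[str, str]) -> List[List[str]]:
    """Map {filename: extracted text} to groups of files sharing a content key."""
    groups = defaultdict(list)
    for name, text in texts.items():
        groups[content_key(text)].append(name)
    # Only groups with more than one file are duplicates.
    return [sorted(names) for names in groups.values() if len(names) > 1]
```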
By @strangus - 5 months
https://10052.ai has a tool that will visually compare documents (PDFs, docs, images, etc.) and cluster them together. It works amazingly well.
By @sva_ - 5 months
Coincidentally, I downloaded and tried using this just a while ago. I was trying to see if it can identify an Elsevier fingerprint between two PDFs. It can't; it only compares visible content.

I used vbindiff instead.

By @akasakahakada - 5 months
Use this to compare university textbook edition 8 and 9 before buying.
By @redman25 - 5 months
I created a similar in-browser version a while back with Mozilla's PDF.js. The diff rendering all runs client side.

https://www.parepdf.com

The diff-pdf project was my inspiration but I wanted to create a version that was distributable to non-programmers.

By @TacticalCoder - 5 months
This reminds me of a book author who posted here IIRC. He had a little tool allowing him to quickly compare two revisions of his book, for example to make sure typo fixes didn't wreak havoc. I remember his tool would show in red what had changed on page thumbnails.
By @atum47 - 5 months
Back when I was writing my final paper I faced a similar issue: I needed to de-duplicate a bunch of PDFs, so I came up with a simple solution:

https://github.com/victorqribeiro/dtf

By @fwn - 5 months
I really like the overlay view and that it is not cloud based. Will try to test it at work.

I rely heavily on PDF comparison via PDF-XChange Editor, which is accurate for text, but often has trouble highlighting visual changes correctly.

By @riedel - 5 months
I always used DiffPDF, only to read on their website: "in the view of the EU’s Cyber Resilience Act and an abundance of caution, we have withdrawn all our free software" [1]

Good to see post-cyberresilience alternatives :)

PDF diffs are really great for versioning/comparing PCB designs. (The only real use case I had 15 yrs back)

[1] http://www.qtrac.eu/diffpdf-foss.html

By @mycall - 5 months
Of course, Adobe Compare does this too.

https://www.adobe.com/acrobat/features/compare-pdfs.html

By @npack - 5 months
https://onlinetextcompare.com/pdf lets you compare text between two pdf files locally within the browser
By @jgalt212 - 5 months
Thanks. I'll give this a shot to see if any counterparties try to sneak in any last second changes to the executable version of the doc.
By @asah - 5 months
Crazy, I'd have thought that modern multi-modal LLMs can do this, but when I tried Gemini, ChatGPT-4o and Claude they all pooped out:

- Gemini at first only diff'd the text, and then when pushed it identified the items in the images and then hallucinated the differences between the versions. It could not produce an image output.

- Claude only diff'd the text and refused to believe that there were images in the PDFs.

- ChatGPT attempted to write and execute Python code for this, which errored out.

By @downboots - 5 months
Maybe this could be used to generate PDFs using LaTeX and use the diff as a distance metric to optimize.
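
As a rough illustration of what such a distance could be, here is a sketch that scores two rendered pages by their mean absolute pixel difference. It assumes the pages are already rasterized to equal-size grayscale grids with values 0 to 255; `page_distance` is a made-up name, and this is only one of many possible metrics.

```python
from typing import List

Gray = List[List[int]]  # rows of grayscale values, 0..255 (assumed format)


def page_distance(a: Gray, b: Gray) -> float:
    """Mean absolute pixel difference, scaled to 0.0 (identical) .. 1.0."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("pages must have the same dimensions")
    total = 0
    count = 0
    for row_a, row_b in zip(a, b):
        for pa, pb in zip(row_a, row_b):
            total += abs(pa - pb)
            count += 1
    if count == 0:
        return 0.0  # two empty pages are treated as identical
    return total / (count * 255)
```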
By @Levitating - 5 months
No screenshots?