|Ned Batchelder : Blog | Code | Text | Site|
Extracting JPGs from PDFs
» Home : Blog : December 2007
I was trying to diagnose a problem with a PDF file we generated yesterday, and suspected that the images were corrupted. To see, I wrote this quick script to extract JPGs from PDF files. It is quick and dirty, with the absolute minimum understanding of PDF files, which can be quite opaque.
This script works for my PDF files. Maybe it doesn't work for all, I don't know. PDF files are complex beasts. Your mileage may vary.
What I'd really like is a tool for exploring inside PDF files, so that I could see exactly what's going on in there. pyPdf is a start, but only scratches the surface of the kind of stuff I'd like to see...