Ned Batchelder
Extracting JPGs from PDFs
December 2007
I was trying to diagnose a problem with a PDF file we generated yesterday, and suspected that the images were corrupted. To find out, I wrote this quick script to extract JPGs from PDF files. It is quick and dirty, written with the absolute minimum understanding of PDF files, which can be quite opaque.
# Extract jpg's from pdf's. Quick and dirty.
This script works for my PDF files. It may not work for all of them; I don't know. PDF files are complex beasts. Your mileage may vary.
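For anyone who wants to try the same idea, here is a minimal sketch of one way to do this kind of extraction in Python 3. It is not necessarily the script described above. It assumes each JPG sits uncompressed inside a PDF `stream` ... `endstream` pair, begins with the JPEG start-of-image marker (`FF D8`) within a few bytes of the `stream` keyword, and ends with the last end-of-image marker (`FF D9`) before `endstream`. The function name `extract_jpegs` and the 20-byte search window are choices made for this illustration; the sketch does no real PDF parsing, so it can miss images or be fooled by image data that happens to contain the bytes `endstream`.

```python
SOI = b"\xff\xd8"  # JPEG start-of-image marker
EOI = b"\xff\xd9"  # JPEG end-of-image marker


def extract_jpegs(pdf_bytes):
    """Return a list of byte strings, one per JPEG-looking stream found.

    Hypothetical helper: a byte-level scan, not a PDF parser.
    """
    jpegs = []
    pos = 0
    while True:
        stream_at = pdf_bytes.find(b"stream", pos)
        if stream_at < 0:
            break
        # Assumption: image data starts right after the keyword and newline,
        # so only look a short distance ahead for the start marker.
        start = pdf_bytes.find(SOI, stream_at, stream_at + 20)
        if start < 0:
            pos = stream_at + len(b"stream")
            continue
        end_stream = pdf_bytes.find(b"endstream", start)
        if end_stream < 0:
            break  # no closing keyword; the file is probably truncated
        # Take the last end-of-image marker before "endstream".
        eoi = pdf_bytes.rfind(EOI, start, end_stream)
        if eoi >= 0:
            jpegs.append(pdf_bytes[start:eoi + len(EOI)])
        pos = end_stream + len(b"endstream")
    return jpegs
```

To use it, read the PDF in binary mode (`open(path, "rb").read()`), pass the bytes to `extract_jpegs`, and write each returned item to its own file opened with `"wb"`.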
What I'd really like is a tool for exploring inside PDF files, so that I could see exactly what's going on in there. pyPdf is a start, but only scratches the surface of the kind of stuff I'd like to see...