With the fog creeping in, cleaning exploits from PDFs becomes more important, and as necessity is the mother of invention, solutions appear:
What is PDFCleaner?
PDF files are dangerous. We regularly see new Adobe Acrobat PDF vulnerabilities being exploited in the wild. Adobe usually takes a while to patch these flaws, and during that time, all Acrobat users are vulnerable. PDFCleaner is designed to remove unknown exploits from PDF files. After the exploit has been removed, opening the file in an unpatched PDF reader should be safe. Note that PDFCleaner is experimental. It is probably possible to design an exploit that would survive PDFCleaner’s removal process, so please don’t rely on it for absolute security.
How Does it Work?
PDFCleaner converts your PDF file to PostScript format, and then converts it back into a PDF file. The process of interpreting the PDF file, converting it to a different format, and converting that back into PDF ensures that any PDF-specific exploits are not transferred to the new PDF file. PostScript is a file format that can do everything that PDF can do, so in most cases, the resulting PDF file will look exactly the same.
The problem with this online tool is that not everyone can inspect its code, and even with access to the code, hidden code can be hard to find in my experience. And if it was clean yesterday, is it still clean today? Will it clean a file rather than make it dirty? For many this is a trust issue; for others, it means a lot of checking work.
Apart from that, the approach does work, and it is a known one. Even locked PDFs become partly accessible when converted to PostScript and back.
pdf2ps [options] input.pdf [output.ps]
If you don’t specify the output file name, the name from the input file is used.
The pdf2ps man page is minimalistic, but noteworthy is the option
pdf2ps -h also gives this info.
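The tool is based on Ghostscript, so you can also use other options that “gs” accepts. Fonts are converted to bitmap fonts (at a pretty high resolution by default, but configurable with the -r option, for example -r300 to set the resolution to 300 dpi; write the option and its value as one word, because pdf2ps treats the first argument that does not start with a dash as the input file).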
ps2pdf [options] input.[e]ps [output.pdf]
ps2pdf is a small command script that invokes Ghostscript, selecting a special “output device” called pdfwrite. Read more here.
The PDF-to-PostScript conversion can take quite some time, and the resulting file can be huge, which puts a burden on the subsequent conversion back to PDF. In some cases pdf2ps also crops aggressively to the bounding boxes.
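The round trip described above (PDF to PostScript with pdf2ps, then back to PDF with ps2pdf) could be scripted roughly like this. It is a minimal sketch, not PDFCleaner's actual code: the function names and the choice to put the intermediate .ps file next to the output are my own assumptions, and it expects Ghostscript's pdf2ps and ps2pdf wrappers to be on the PATH.

```python
import shutil
import subprocess
from pathlib import Path


def build_roundtrip_commands(src, dst, resolution=None):
    """Return the two argv lists for a PDF -> PostScript -> PDF round trip."""
    src, dst = Path(src), Path(dst)
    # Intermediate PostScript file next to the output (an illustrative choice).
    ps_file = dst.with_suffix(".ps")
    to_ps = ["pdf2ps"]
    if resolution is not None:
        # Ghostscript documents the resolution switch as one token, e.g. -r300.
        to_ps.append(f"-r{resolution}")
    to_ps += [str(src), str(ps_file)]
    to_pdf = ["ps2pdf", str(ps_file), str(dst)]
    return [to_ps, to_pdf]


def clean_pdf(src, dst, resolution=None):
    """Run the round trip; needs pdf2ps and ps2pdf (Ghostscript) on PATH."""
    for tool in ("pdf2ps", "ps2pdf"):
        if shutil.which(tool) is None:
            raise RuntimeError(f"{tool} not found on PATH")
    commands = build_roundtrip_commands(src, dst, resolution)
    for argv in commands:
        # check=True raises CalledProcessError if a conversion step fails.
        subprocess.run(argv, check=True)
    # Remove the (possibly huge) intermediate PostScript file.
    Path(commands[0][-1]).unlink()
    return Path(dst)
```

Calling clean_pdf("suspect.pdf", "cleaned.pdf") would run both steps in order. The same caveat applies as for PDFCleaner itself: this lowers the risk, it does not guarantee a safe file.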
pdftops [options] input.pdf [output.ps]
The pdftops man page explains a lot of options, and pdftops -h gives a short version.
pdftops provides additional options like -eps to generate an .eps file, -f and -l to limit the page range to convert, and options to control/change the page size.
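To make the -eps and page-range options a bit more concrete, here is a small sketch that only assembles a pdftops command line. The helper name and its parameters are hypothetical; -f (first page), -l (last page) and -eps are real pdftops options. Because an EPS file holds a single page, the helper conservatively insists on a one-page range whenever -eps is requested.

```python
def build_pdftops_command(src, dst, first_page=None, last_page=None, eps=False):
    """Return an argv list for poppler's pdftops (hypothetical helper)."""
    if eps and (first_page is None or first_page != last_page):
        # An EPS file contains one page, so the range must select exactly one.
        raise ValueError("EPS output needs first_page == last_page")
    argv = ["pdftops"]
    if eps:
        argv.append("-eps")
    if first_page is not None:
        argv += ["-f", str(first_page)]
    if last_page is not None:
        argv += ["-l", str(last_page)]
    argv += [str(src), str(dst)]
    return argv
```

The resulting list can be handed to subprocess.run(argv, check=True) on a machine where poppler-utils is installed.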
The poppler-utils package provides some more interesting tools: pdfinfo (pdf document information extractor), pdfimages (pdf image extractor), pdftohtml (pdf to html converter), pdftotext (pdf to text converter), and pdffonts (pdf font analyzer).
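One way pdfinfo might be put to use here is as a quick sanity check: compare, say, the page count of the original and the cleaned file. The sketch below parses pdfinfo's "Key: value" output into a dictionary. The function names are my own, and it assumes the usual one-field-per-line output format.

```python
import shutil
import subprocess


def parse_pdfinfo(text):
    """Turn pdfinfo's "Key:   value" lines into a dict of strings."""
    info = {}
    for line in text.splitlines():
        # Split on the first colon only, so values such as dates stay intact.
        key, sep, value = line.partition(":")
        if sep:
            info[key.strip()] = value.strip()
    return info


def pdf_info(path):
    """Run pdfinfo on a file; needs poppler-utils installed."""
    if shutil.which("pdfinfo") is None:
        raise RuntimeError("pdfinfo not found on PATH")
    result = subprocess.run(
        ["pdfinfo", str(path)], capture_output=True, text=True, check=True
    )
    return parse_pdfinfo(result.stdout)
```

With that in place, comparing pdf_info("suspect.pdf").get("Pages") to pdf_info("cleaned.pdf").get("Pages") gives a cheap check that the round trip did not drop pages.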