Scrubbing pdf metadata with pdftk and sed

http://lilithlela.cyberguerrilla.org/?p=568

This post aims to augment USP no. 3: The danger of metadata [digital footprints] (original here) for *nix distros:

There is a freeware program by the name of PDF Info (http://www.bureausoft.com/pdfinfo.exe) which lets you edit not only the aforementioned Title/Author/Subject/Keywords fields, but also the PDF Producer and Creator Application fields. It doesn’t, however, let you change the file creation and modification dates and times.

The PDF Toolkit (pdftk) claims to be that all-in-one solution: the closest thing to Adobe Acrobat for Linux.

You can download pdftk as source or as a Debian or RPM package, FreeBSD port, or Gentoo ebuild. Binaries are available for Windows and Mac OS X too. If you decide to compile pdftk, check the build notes before you begin to find out about any dependencies for your Linux distro or platform.

I am lazy, so I chose to install the package. Installing the pdftk package on Ubuntu and BackTrack requires having the "universe" repository enabled. Then:

$ sudo apt-get install pdftk

 

The following extra packages will be installed: gcj-4.4-base gcj-4.4-jre-lib libbcmail-java libbcmail-java-gcj libbcprov-java libgcj-bc libgcj-common libgcj10 libgnuinet-java libgnujaf-java libgnumail-java libitext-java libitext-java-gcj

After this operation, 66.5MB of additional disk space will be used.

Looking at the metadata

To look at the metadata that Adobe Reader does not show by default (replace 034045 with your pdf filename):

$ pdftk 034045.pdf dump_data
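The dump mixes Info records with page counts, IDs and other lines. As a rough sketch, a small filter can print just the key/value pairs side by side. The function name info_pairs is made up for this example, and it assumes the dump uses "InfoKey:" and "InfoValue:" lines, which may vary between pdftk versions:

```shell
# Hypothetical helper: read dump_data text on stdin and print "key = value"
# for each Info record. Lines that are not InfoKey/InfoValue are ignored.
info_pairs() {
    awk '
        /^InfoKey:/   { sub(/^InfoKey: */, "");   key = $0; next }
        /^InfoValue:/ { sub(/^InfoValue: */, ""); print key " = " $0 }
    '
}
```

Used as: pdftk 034045.pdf dump_data | info_pairs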

 

Altering the metadata

To alter the metadata, first dump it to a file (replace 034045 with your pdf filename):

$ pdftk 034045.pdf dump_data output pdf-metadata

Open the pdf-metadata file in a text editor and remove the data you wish scrubbed.

Save the pdf-metadata file. Now you can use that data to scrub the metadata from your file (replace 034045 with your pdf filename):

$ pdftk 034045.pdf update_info pdf-metadata output 034045-no-metadata.pdf
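If you would rather not hand-edit the file, the editing step can be scripted. The sketch below is one possible approach, not part of pdftk: blank_info_values is a hypothetical helper that empties the InfoValue line for the keys you name and leaves everything else alone. It assumes your pdftk build treats an empty InfoValue as "drop this key", which is worth confirming with dump_data afterwards:

```shell
# Hypothetical helper: blank the InfoValue of the named keys.
# Usage: blank_info_values KEY... < pdf-metadata > pdf-metadata-scrubbed
blank_info_values() {
    keys=" $* "    # space-padded list so whole key names are matched
    awk -v keys="$keys" '
        /^InfoKey:/ { key = $0; sub(/^InfoKey: */, "", key) }
        # Assumption: an empty value tells update_info to remove the key.
        /^InfoValue:/ && index(keys, " " key " ") { print "InfoValue:"; next }
        { print }
    '
}
```

For example, blank_info_values Author Creator < pdf-metadata > pdf-metadata-scrubbed, and then pass pdf-metadata-scrubbed to update_info instead of the hand-edited file.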

And check the result with:

$ pdftk 034045-no-metadata.pdf dump_data
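Reading the dump by eye works, but a quick filter makes leftovers harder to miss. This is only a sketch: remaining_info is an invented name, and it again assumes the "InfoKey:" / "InfoValue:" layout of dump_data output:

```shell
# Hypothetical helper: print every Info key that still has a non-empty value.
# Exit status is 1 if any such key was found, 0 otherwise.
remaining_info() {
    awk '
        /^InfoKey:/   { key = $0; sub(/^InfoKey: */, "", key) }
        /^InfoValue:/ {
            val = $0; sub(/^InfoValue: */, "", val)
            if (val != "") { print key; found = 1 }
        }
        END { exit (found ? 1 : 0) }
    '
}
```

Used as: pdftk 034045-no-metadata.pdf dump_data | remaining_info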

 

And now the somewhat less elegant scrubbing on *nix

The creation date is gone, but a new modification date has appeared, and the iText string gives away the use of pdftk. These info values can be removed with sed:

$ sed -i 's/iText\ 2\.1\.7\ by\ 1T3XT//;s/D:20120409144213+02'\''00'\''//' 034045-no-metadata.pdf

 

Each '\'' breaks out of the single-quoted string, inserts an escaped single quote, and then reopens the single-quoted string.
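To see what the shell actually hands to sed, you can print the expression instead of running it. The variable names here are just for illustration, and the double-quoted form is shown only as an equivalent spelling for this particular expression, which contains nothing the shell expands inside double quotes:

```shell
# '\'' = close the quoted string, add a literal ', open a new quoted string.
single='s/D:20120409144213+02'\''00'\''//'

# The same expression written inside double quotes needs no escaping here.
double="s/D:20120409144213+02'00'//"

printf '%s\n' "$single"
[ "$single" = "$double" ] && echo "same expression"
```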

PdfID0 and PdfID1 are file identifiers. Each is an MD5 hash of various information about the file, giving the document a unique identifying string that does not depend on the filename. If you want to scrub those too, use sed as above. There are no geeky tricks needed for cleaning those two from the metadata with sed; it's pretty straightforward.
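One possible variation, sketched below, overwrites an ID with zeros of the same length instead of deleting it. PDF cross-reference tables record byte offsets, so keeping the file size unchanged avoids shifting anything that follows. The helper name zero_pdf_id is invented, it relies on GNU sed's -i option, and it assumes the ID appears in the file as the same hex string that dump_data prints (same case, no line breaks inside it), which you should check for your own file:

```shell
# Hypothetical helper: overwrite every occurrence of a hex ID with zeros.
# Usage: zero_pdf_id FILE HEXID   (HEXID as shown by dump_data; assumption)
zero_pdf_id() {
    file=$1
    id=$2
    # Build a string of zeros exactly as long as the ID.
    zeros=$(printf '%s\n' "$id" | sed 's/./0/g')
    # LC_ALL=C makes sed treat the PDF as plain bytes.
    LC_ALL=C sed -i "s/$id/$zeros/g" "$file"
}
```

Call it once with the PdfID0 value and once with the PdfID1 value from dump_data, then run dump_data again on the result to see what pdftk reports.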