There is a freeware program by the name of PDF Info (http://www.bureausoft.com/pdfinfo.exe) which lets you edit not only the aforementioned Title/Author/Subject/Keywords fields, but also the PDF Producer and Creator Application fields. It doesn’t, however, let you change the file creation and modification dates and times.
Scrubbing pdf metadata with pdftk and sed
by Anonymiss Express
Apr 12, 2012
http://lilithlela.cyberguerrilla.org/?p=568
This blog intends to augment USP no. 3: The danger of metadata [digital footprints] (original here) for *nix distros:
The PDF Toolkit (pdftk) claims to be that all-in-one solution: the closest thing to Adobe Acrobat for Linux.
You can download pdftk as source or as a Debian or RPM package, FreeBSD port, or Gentoo Ebuild. Binaries are available for Windows and Mac OS X too. If you decide to compile pdftk, check the build notes before you begin to find out about any dependencies for your Linux distro or platform.
I am lazy, so I chose to install the package. Installing the pdftk package on Ubuntu and Backtrack requires having "universe" enabled. Then:
$ sudo apt-get install pdftk
The following extra packages will be installed: gcj-4.4-base gcj-4.4-jre-lib libbcmail-java libbcmail-java-gcj libbcprov-java libgcj-bc libgcj-common libgcj10 libgnuinet-java libgnujaf-java libgnumail-java libitext-java libitext-java-gcj
After this operation, 66.5MB of additional disk space will be used.
Looking at the metadata
To look at the metadata that Adobe Reader does not show by default (replace 034045 with your pdf filename):
$ pdftk 034045.pdf dump_data
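The dump can be long, and each Info entry is spread over a pair of lines. As a rough sketch, assuming the dump uses the usual InfoKey: / InfoValue: line pairs, a small awk filter can print one key=value per line. The helper name list_info is made up for this example and is not part of pdftk:

```shell
# list_info: hypothetical helper, not part of pdftk.
# Reads dump_data text on stdin and prints each Info entry as key=value.
list_info() {
  awk '
    /^InfoKey: /   { key = substr($0, 10) }           # text after "InfoKey: "
    /^InfoValue: / { print key "=" substr($0, 12) }   # text after "InfoValue: "
  '
}

# Usage (needs pdftk and a real file):
#   pdftk 034045.pdf dump_data | list_info
```

Lines that are not part of an Info pair (page counts, the PdfID lines, bookmarks) are simply skipped, so this only gives a quick overview of what there is to scrub.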
Altering the metadata
To alter the metadata first put the metadata in a file (replace 034045 with your pdf filename):
$ pdftk 034045.pdf dump_data output pdf-metadata
Open the pdf-metadata file in a text editor and remove the data you wish scrubbed.
Save the pdf-metadata file. Now you can use that data to scrub the metadata from your file (replace 034045 with your pdf filename):
$ pdftk 034045.pdf update_info pdf-metadata output 034045-no-metadata.pdf
And check the result with:
$ pdftk 034045-no-metadata.pdf dump_data
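The manual editing step above can also be scripted. The following is only a sketch: the function name blank_info_values is hypothetical, and it assumes that leaving an InfoValue empty is how pdftk is told to drop a key. Check that assumption against your pdftk version by looking at the dump_data output afterwards.

```shell
# blank_info_values: hypothetical helper, not part of pdftk.
# $1 is a space-separated list of Info keys; dump_data text is read on stdin.
# ASSUMPTION: an empty InfoValue makes update_info remove that key.
blank_info_values() {
  awk -v keys="$1" '
    BEGIN { n = split(keys, k, " "); for (i = 1; i <= n; i++) want[k[i]] = 1 }
    /^InfoKey: /            { key = substr($0, 10); blank = (key in want) }
    /^InfoValue: / && blank { print "InfoValue:"; blank = 0; next }
    { print }   # everything else passes through unchanged
  '
}

# Usage (needs pdftk and a real file):
#   pdftk 034045.pdf dump_data | blank_info_values "Author Creator" > pdf-metadata
#   pdftk 034045.pdf update_info pdf-metadata output 034045-no-metadata.pdf
```

Keys that are not listed keep their values, so the same function can be reused with a different key list per document.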
And now the somewhat less elegant scrubbing on *nix
The creation date is gone, but a new modification date has appeared, and the iText string gives away the use of pdftk. These values can be removed with sed:
$ sed -i 's/iText\ 2\.1\.7\ by\ 1T3XT//;s/D:20120409144213+02'\''00'\''//' 034045-no-metadata.pdf
Each '\'' breaks out of the single-quoted string, adds an escaped single quote, and then reopens the string.
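To see the quoting at work without touching a PDF, the second sed expression can be tried on a single line of text. The wrapper name strip_moddate and the sample line are made up for this example:

```shell
# strip_moddate: hypothetical wrapper around the second sed expression from the post.
# The shell joins three pieces: '...+02' then \' then '00' then \' then '//'
strip_moddate() {
  sed 's/D:20120409144213+02'\''00'\''//'
}

# The sample line is invented for the demo; it only mimics how a date can look.
printf '%s\n' "/ModDate (D:20120409144213+02'00')" | strip_moddate
# prints: /ModDate ()
```

sed itself never sees the backslashes: by the time it runs, the expression is the plain string s/D:20120409144213+02'00'//.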
PdfID0 and PdfID1 are file identifiers. They are an MD5 hash of various information about the file, giving the document a unique string that identifies it without relying on the filename. If you want to scrub those too, use sed as above. There are no geeky tricks needed for cleaning those two from the metadata with sed; it's pretty straightforward.
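A possible variation for the identifiers, sketched under a few assumptions: the function name zero_id is hypothetical, the ID is assumed to appear in the file as the same hex string that dump_data reports (the case may differ in practice), and overwriting it with zeros instead of deleting it is a choice made here, not the method from the post. Zero-filling keeps the byte length unchanged, so nothing after the ID shifts position in the file.

```shell
# zero_id: hypothetical helper; reads text on stdin and overwrites every
# occurrence of the hex ID given as $1 with the same number of zeros.
zero_id() {
  id=$1
  zeros=$(printf '%s' "$id" | sed 's/[0-9A-Fa-f]/0/g')   # same length, all zeros
  sed "s/$id/$zeros/g"
}

# The sample line and ID are invented for the demo.
printf '%s\n' '/ID [<ab12cd><ab12cd>]' | zero_id ab12cd
# prints: /ID [<000000><000000>]
```

To apply the same substitution in place on a file, the sed expression inside the function could be used with sed -i as in the command above.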