Splitting a PDF

Tuesday, January 20th, 2009

A friend just called me up and asked if I knew of any way to split a multi-page pdf file into several single-page pdf files. One requirement was that it had to work under Windows as well. My immediate answer was no, as I did not know of such a tool.

My instincts told me that there would probably be a web service that could do what he asked for. The problem with online services, however, is trust. How can I know what they do with “my” files after letting them operate on them? More specifically, what if the files contain sensitive information that they store without my consent or knowledge?

So web services were probably out of the question too (although I have no idea how sensitive his document actually was). In any case, I hit Google with the keywords “free”, “pdf”, “authoring” and “software”. The top search result was a Wikipedia page. Scanning it for the functionality I wanted, I quickly zeroed in on Pdftk (the PDF Toolkit), listed as able to merge, split, en-/decrypt, watermark/stamp and manipulate PDF files.

The Wikipedia page, which redirected to iText, didn’t amount to much. But now that I knew that at least a library seemed to exist, I could hit Google again: “pdftk”, “split”, “file”.

Opening a host of tabs from the search results, I stopped dead in my tracks upon finding AngusJ. To quote the site: “PDFTK Builder & other PDF Resources for Windows”. I smiled for a bit, giving myself a mental pat on the back for my awesome Google-Fu, and then relayed the search terms to my friend (still on the phone, mind you) and directed him to the search result.

For some reason or another, he couldn’t download the software; the download cut out halfway through. I didn’t try it myself. Instead, my mind went into “Plan B” mode: investigating whether I could split the pdf for him using the command-line pdftk. For some reason, that was already installed on my machine, a fact apt-get promptly informed me of when I tried to install it.

Just then, it seems, he got a call from his boss, and splitting the pdf was no longer an issue. But as I had already started thinking about it, I simply continued my thought process and started experimenting.

Et voilà:

pdftk [input_file] [action [arguments_for_action]] output [output_file]

or, more concretely:

$ pdftk a_file.pdf cat 1 output page1.pdf
$ pdftk a_file.pdf cat 2 output page2.pdf
$ pdftk a_file.pdf cat 3-4 output pages3and4.pdf

In short… awesome!
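Since my friend wanted every page as its own file, typing one cat command per page gets tedious for longer documents. Below is a minimal sketch, assuming a POSIX shell and pdftk on the PATH, that reads the page count from pdftk’s dump_data report and runs the same cat command once per page. The function name split_pdf and the pageN.pdf naming are my own choices for illustration, not part of pdftk.

```shell
# Hypothetical helper (not part of pdftk): write each page of a PDF to its
# own file, page1.pdf, page2.pdf, ... in the current directory.
# Assumes a POSIX shell and pdftk on the PATH.
split_pdf() {
  in="$1"
  # dump_data prints the document's metadata, including a line
  # of the form "NumberOfPages: N"; awk extracts N.
  pages=$(pdftk "$in" dump_data | awk '/^NumberOfPages:/ { print $2 }')
  if [ -z "$pages" ]; then
    echo "split_pdf: could not read page count of $in" >&2
    return 1
  fi
  i=1
  while [ "$i" -le "$pages" ]; do
    # The same single-page cat command as above, one page at a time.
    pdftk "$in" cat "$i" output "page$i.pdf" || return 1
    i=$((i + 1))
  done
}
```

Usage would be split_pdf a_file.pdf. For what it’s worth, pdftk also has a built-in burst operation that does the splitting in one step, e.g. pdftk a_file.pdf burst output page_%02d.pdf; it additionally writes a doc_data.txt report with the document’s metadata.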

Of course, there are more uses for this toolkit, as stated on the Wikipedia page and again on the official page, but since my friend only needed splitting, that is all I cover here.

Also, for the record, when I later tried to download AngusJ’s PDFTK Builder, the download worked like a charm.