How to make scanned PDFs searchable (OCR) using pdfocr
What pdfocr is for
Suppose you have a PDF document that was made using a scanner, or that otherwise consists of image data with no text data. Such a PDF can't be searched by PDF readers or indexed by desktop search applications. pdfocr is a simple utility I made that takes a PDF file and generates a new one with a text layer added, so it is searchable in your PDF reader and can be indexed by your desktop search application, but still looks identical when printed.
What pdfocr is not for
This is only of use if your PDF was made from a scanned source. If you exported your PDF from OpenOffice or the like, it already has a text layer, so running pdfocr is unnecessary.
This guide will work on Ubuntu Karmic (9.10) or Lucid (10.04).
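If you are not sure whether a PDF already has a text layer, one rough check is to try extracting its text and see whether anything comes out. The sketch below assumes the pdftotext tool from the poppler-utils package is installed (it is a separate tool, not part of pdfocr), and has_text_layer is a hypothetical helper name used only for illustration.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the PDF yields any non-whitespace text.
# Assumes pdftotext (poppler-utils) is installed; it is not part of pdfocr.
has_text_layer() {
    # "-" tells pdftotext to write the extracted text to stdout.
    # grep -q succeeds only if some line contains a non-whitespace character.
    pdftotext "$1" - 2>/dev/null | grep -q '[^[:space:]]'
}

# Usage:
#   if has_text_layer scan.pdf; then
#       echo "already searchable, no need for pdfocr"
#   else
#       echo "no text found, pdfocr should help"
#   fi
```

This is only a heuristic: a scanned PDF with a poor or partial text layer will still pass the check.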
Install pdfocr in Ubuntu Karmic (9.10) and Lucid (10.04)
Open a terminal and run the following commands:
sudo add-apt-repository ppa:gezakovacs/pdfocr
sudo apt-get update
sudo apt-get install pdfocr
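To confirm that the install worked, you can check that the pdfocr command is now on your PATH. This is a minimal sketch using the standard command -v shell builtin; have_command is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the named command can be found on PATH.
have_command() {
    command -v "$1" >/dev/null 2>&1
}

# Usage:
#   if have_command pdfocr; then
#       echo "pdfocr is installed"
#   else
#       echo "pdfocr not found; re-check the PPA and install steps" >&2
#   fi
```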
Using pdfocr to add a text layer to your scanned PDF file
Open a terminal, go to the directory that contains the PDF file you want to convert, and enter the following command, replacing input.pdf with the name of your input file and output.pdf with the name you want for the output file:
pdfocr -i input.pdf -o output.pdf
Now wait while OCR is performed on the PDF file page by page and the output file is generated. This should take a few seconds per page, depending on the resolution of your PDF file (higher-resolution scans give better accuracy but take longer). Once it finishes, you should have a searchable PDF at output.pdf.
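If you have a whole directory of scans, the same command can be run in a loop. The following is a sketch rather than a feature of pdfocr itself: ocr_all_pdfs is a hypothetical helper, and the "-ocr.pdf" output naming is an assumption made for this example. It changes into the directory first, as described above, and uses only the -i and -o options shown earlier.

```shell
#!/bin/sh
# Hypothetical helper: run pdfocr on every PDF in a directory.
# The <name>-ocr.pdf output naming is an assumption of this sketch.
ocr_all_pdfs() (
    # The body runs in a subshell, so the cd does not affect the caller.
    cd "${1:-.}" || exit 1
    for input in *.pdf; do
        # Skip the unexpanded pattern when no PDFs are present.
        [ -e "$input" ] || continue
        # Skip files produced by an earlier run of this helper.
        case "$input" in *-ocr.pdf) continue ;; esac
        output="${input%.pdf}-ocr.pdf"
        # Skip inputs that already have a converted copy.
        if [ -e "$output" ]; then continue; fi
        echo "OCR: $input -> $output"
        pdfocr -i "$input" -o "$output" || echo "failed: $input" >&2
    done
)

# Usage:
#   ocr_all_pdfs ~/scans
```

Each original file is left untouched, so you can compare the results and delete the originals later if you are happy with the output.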