How to make scanned PDFs searchable (OCR) using pdfocr
What pdfocr is for
Suppose you have a PDF document that was made using a scanner, or otherwise consists of image data but doesn't have text data. Such a PDF can't be searched by PDF readers or desktop search applications. pdfocr is a simple utility I made that takes a PDF file, then generates a new one that has the text layer added, so it's searchable by your PDF reader and can be indexed by your desktop search application, but is still identical when printed.
What pdfocr is not for
This is only of use if your PDF was made from a scanned source. If you exported your PDF from OpenOffice or the like, it already has a text layer, so running pdfocr on it is unnecessary.
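If you're not sure whether a PDF already has a text layer, something like the following can tell you. This is only a sketch: it assumes pdftotext from the poppler-utils package is installed (it is not part of pdfocr), and has_text_layer is just a name picked for illustration.

```shell
# Sketch: report whether a PDF already contains extractable text.
# Assumption: pdftotext (from poppler-utils) is installed.
# has_text_layer is a hypothetical helper name, not part of pdfocr.
has_text_layer() {
    # "-" sends the extracted text to standard output instead of a file.
    # grep -q succeeds if at least one letter or digit was extracted.
    pdftotext "$1" - 2>/dev/null | grep -q '[[:alnum:]]'
}

# Example use:
# if has_text_layer input.pdf; then
#     echo "already searchable"
# else
#     echo "no text found, OCR may help"
# fi
```

A PDF whose text layer contains no letters or digits at all would be reported as having no text, so treat the result as a hint rather than a guarantee.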
Compatibility
This guide works on Ubuntu Karmic (9.10) and Lucid (10.04).
Install pdfocr in Ubuntu Karmic (9.10) and Lucid (10.04)
Open a terminal and run the following commands:
sudo add-apt-repository ppa:gezakovacs/pdfocr
sudo apt-get update
sudo apt-get install pdfocr
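To confirm the install worked, a quick check along these lines may help. It is a minimal sketch; have_command is a hypothetical helper name, and the messages are made up for illustration.

```shell
# Sketch: confirm that the pdfocr command is on the PATH after installation.
# have_command is a hypothetical helper name, not part of pdfocr.
have_command() {
    command -v "$1" >/dev/null 2>&1
}

if have_command pdfocr; then
    echo "pdfocr is installed at: $(command -v pdfocr)"
else
    echo "pdfocr was not found; try the install commands again" >&2
fi
```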
Using pdfocr to add a text layer to your scanned PDF file
Open a terminal, change to the directory that contains the PDF file you want to convert, and enter the following command (replacing input.pdf with the name of your input PDF file, and output.pdf with the name you want for the output PDF file):
pdfocr -i input.pdf -o output.pdf
Now wait while OCR is performed on the PDF file page by page and the output file is generated. This should take a few seconds per page, depending on the resolution of your PDF file (high-resolution PDF files give better accuracy, but take longer). Once it finishes, you should have a searchable PDF at output.pdf.
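If you have a whole folder of scans, the same command can be wrapped in a loop. The following is a rough sketch rather than anything built into pdfocr: ocr_dir is a hypothetical helper name, and the "-ocr" suffix for output files is an arbitrary choice. It uses only the -i and -o options shown above.

```shell
# Sketch: run pdfocr on every PDF in a directory.
# ocr_dir and the "-ocr" output suffix are illustrative assumptions.
ocr_dir() {
    dir=${1:-.}
    for input in "$dir"/*.pdf; do
        # Skip the literal pattern if the directory has no PDF files.
        [ -e "$input" ] || continue
        # Skip files that a previous run of this loop already produced.
        case $input in
            *-ocr.pdf) continue ;;
        esac
        output="${input%.pdf}-ocr.pdf"
        echo "OCR: $input -> $output"
        pdfocr -i "$input" -o "$output" || echo "pdfocr failed on $input" >&2
    done
}

# Example use:
# ocr_dir ~/scans
```

Each original file is left untouched, and the searchable copy is written next to it, so you can compare the two before deleting anything.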
Credit goes here
Do you happen to have any information about where this software is from, specifically the source code?
@Tal
http://ubuntuforums.org/showthread.php?p=9136558
Software from PPAs is always free, as stated here: https://help.launchpad.net/PPATermsofUse
This OCR Software is impressive!
Free, and efficient, as compared to commercial software.
It lacks a GUI and easier customization options, but really, it’s amazing!
This guide has lost me: how do you open a terminal, go to the directory that has the PDF file you want to convert, and enter (substituting input.pdf with the input PDF file, and output.pdf with the output PDF file) – after all, a directory cannot be opened in a terminal? Or am I missing something here?
I added the repository and everything is installed. However, when I use this on my scanned notes (the words are in capital letters), the result has no searchable text. Any ideas?