pdfsandwich (add OCR text layer to scanned PDF files) pdfsandwich generates "sandwich" OCR PDF files, i.e. PDF files which contain only images (no text) will be processed by optical character recognition (OCR) and the text will be added to each page invisibly "behind" the images. This makes it possible to search for text in the PDF, and copy text from the PDF. Notes: -- The man page explains this, but I'll mention it here: the "sandwich" PDF (the output) for filename.pdf will be written to filename_ocr.pdf. -- According to its man page, pdfsandwich can optionally use hocr2pdf. However, the version of hocr2pdf on SlackBuilds.org (in the exact-image package) doesn't seem to work correctly with pdfsandwich. This isn't a real problem, just don't use the -enforcehocr2pdf option with pdfsandwich. -- The PDFs created by pdfsandwich are not quite to spec. In mupdf, you may see "warning: broken xref subsection, proceeding anyway". This seems to be harmless. If you discover that it isn't harmless, you can use ghostscript to fix it, thus: $ gs -dSAFER -dNOPAUSE -sDEVICE=pdfwrite \ -sOutputFile=pdf_fixed.pdf pdf_ocr.pdf