This is a command line tool that converts scanned PDF documents into Word 2007 formatted documents.
VeryPDF Scanned PDF to DOCX OCR Converter is a command line application that uses Optical Character Recognition (OCR) to convert scanned PDF documents into editable DOCX files. Adobe Acrobat does not need to be installed. The image formats the application can handle include TIFF, BMP, PNG, JPG, PCX and TGA, among others. You can convert the whole document, a range of pages or a single page. The tool supports the English character set by default, and you can optionally choose up to 10 other languages whose character sets are variations of the English one. You can also integrate the command line tool into an application you are developing.
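As a rough illustration of that kind of integration, here is a minimal Python sketch that runs an external command line converter as a child process and reports failures. The helper name `run_converter` and the timeout are assumptions made for illustration. The converter's executable name and its options are not given in this description, so the caller passes in the full argument list taken from the tool's own documentation.

```python
import subprocess
from typing import Sequence


def run_converter(command: Sequence[str], timeout_s: float = 600.0) -> str:
    """Run an external command line converter and return its standard output.

    `command` is the complete argument list: the executable followed by
    whatever options the tool's documentation specifies. Nothing about the
    real tool's flags is assumed here.
    """
    result = subprocess.run(
        list(command),
        capture_output=True,  # collect stdout and stderr instead of printing them
        text=True,            # decode output as text
        timeout=timeout_s,    # raises subprocess.TimeoutExpired on a hung run
    )
    if result.returncode != 0:
        # Surface the tool's own error message to the calling application.
        raise RuntimeError(
            f"converter exited with code {result.returncode}: "
            f"{result.stderr.strip()}"
        )
    return result.stdout
```

Passing the arguments as a list rather than a single shell string avoids quoting problems with file names that contain spaces, which are common for scanned documents.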
The layout of the source document is maintained, so no extensive editing is needed after the conversion. The quality of the OCR conversion depends largely on the quality of the scanned image and the clarity of its characters; the font used also affects recognition. The tool has no features that let you clean up the input pages, and even a simple one like deskew would have been immensely helpful. You should consider these issues when building an application that includes this tool. Depending on the accuracy, some characters will have to be edited by hand, and the amount of editing goes up as recognition accuracy goes down. For a given level of inaccuracy, edit time also grows with the size of the document.
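To get a feel for how document size and recognition accuracy combine, here is a back-of-the-envelope sketch in Python. It assumes a simple model in which every misrecognized character needs one manual correction. The function name and the example figures are illustrative assumptions, not measurements of this tool.

```python
def estimated_corrections(pages: int, chars_per_page: int, accuracy: float) -> int:
    """Estimate how many characters need manual correction after OCR.

    Assumed model: corrections = total characters * (1 - accuracy),
    where `accuracy` is the fraction of characters recognized correctly.
    """
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must be between 0.0 and 1.0")
    total_chars = pages * chars_per_page
    return round(total_chars * (1.0 - accuracy))


# Hypothetical figures: 2000 characters per page at 99% character accuracy.
short_doc = estimated_corrections(pages=10, chars_per_page=2000, accuracy=0.99)
long_doc = estimated_corrections(pages=100, chars_per_page=2000, accuracy=0.99)
```

Under this model the 100-page document needs ten times as many corrections as the 10-page one at the same accuracy, which is the multiplying effect described above.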