OCRmyPDF

About OCRmyPDF

OCRmyPDF adds an OCR text layer to scanned PDF files so that their content can be searched easily.

Usage:

ocrmypdf                      # it's a scriptable command line program
   -l eng+fra                 # it supports multiple languages
   --rotate-pages             # it can fix pages that are misrotated
   --deskew                   # it can deskew crooked PDFs!
   --title "My PDF"           # it can change output metadata
   --jobs 4                   # it uses multiple cores by default
   --output-type pdfa         # it produces PDF/A by default
   input_scanned.pdf          # takes PDF input (or images)
   output_searchable.pdf      # produces validated PDF output
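
The annotated listing above spreads one command over several lines with a comment on each, so it cannot be pasted into a shell as it stands. As a minimal sketch, the same invocation can be assembled and run from Python. The helper name `build_ocrmypdf_command` is hypothetical and not part of OCRmyPDF, and the sketch assumes the `ocrmypdf` executable is installed and on the `PATH`.

```python
import os
import shlex
import shutil
import subprocess


def build_ocrmypdf_command(input_pdf, output_pdf,
                           languages=("eng", "fra"), jobs=4, title="My PDF"):
    """Assemble the argument list for the ocrmypdf call shown above.

    This helper is illustrative only; it simply mirrors the flags
    from the usage listing.
    """
    return [
        "ocrmypdf",
        "-l", "+".join(languages),   # multiple languages are joined with "+"
        "--rotate-pages",            # fix misrotated pages
        "--deskew",                  # straighten crooked scans
        "--title", title,            # set the output metadata title
        "--jobs", str(jobs),         # number of parallel jobs
        "--output-type", "pdfa",     # produce PDF/A
        input_pdf,
        output_pdf,
    ]


if __name__ == "__main__":
    cmd = build_ocrmypdf_command("input_scanned.pdf", "output_searchable.pdf")
    # Print the single-line shell form, safely quoted.
    print(shlex.join(cmd))
    # Run only when the tool and the input file are actually present.
    if shutil.which("ocrmypdf") and os.path.exists("input_scanned.pdf"):
        subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems with values such as the title, and `check=True` raises an exception if the command exits with a non-zero status.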

Main features:

Generates a searchable PDF/A file from a regular PDF
Places OCR text accurately below the image to ease copy / paste
Keeps the exact resolution of the original embedded images
When possible, inserts OCR information as a “lossless” operation without rendering vector information
Keeps file size about the same
If requested, deskews and/or cleans the image before performing OCR
Validates input and output files
Provides debug mode to enable easy verification of the OCR results
Processes pages in parallel when more than one CPU core is available
Uses Tesseract OCR engine
Supports more than 100 languages recognized by Tesseract
Battle-tested on thousands of PDFs, a test suite and continuous integration

OCRmyPDF official website

https://github.com/jbarlow83/OCRmyPDF
