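How do I split PDF pages down the middle and recombine them?
I don't currently have enough reputation to answer a question I found, so I'm posting my solution here: how to use Python to split PDF pages in half and recombine them for further processing.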
#!/usr/bin/env python
'''
Chops each page in half, e.g. if a source were
created in booklet form, you could extract individual
pages, and re-combines them.
'''
# Uses the legacy PyPDF2 names (PdfFileReader, PdfFileWriter), which need PyPDF2 < 3.0.
from PyPDF2 import PdfFileWriter, PdfFileReader
# split left: crop every page to its left half
with open("docu.pdf", "rb") as in_f:
    input1 = PdfFileReader(in_f)
    output = PdfFileWriter()
    numPages = input1.getNumPages()
    for i in range(numPages):
        page = input1.getPage(i)
        page.cropBox.lowerLeft = (60, 50)
        page.cropBox.upperRight = (305, 700)
        output.addPage(page)
    # write while the source file is still open; pages are read lazily
    with open("left.pdf", "wb") as out_f:
        output.write(out_f)
# split right: crop every page to its right half
with open("docu.pdf", "rb") as in_f:
    input1 = PdfFileReader(in_f)
    output = PdfFileWriter()
    numPages = input1.getNumPages()
    for i in range(numPages):
        page = input1.getPage(i)
        page.cropBox.lowerLeft = (300, 50)
        page.cropBox.upperRight = (540, 700)
        output.addPage(page)
    # write while the source file is still open; pages are read lazily
    with open("right.pdf", "wb") as out_f:
        output.write(out_f)
# combine the split files, alternating left and right halves
with open("left.pdf", "rb") as left_f, open("right.pdf", "rb") as right_f:
    input1 = PdfFileReader(left_f)
    input2 = PdfFileReader(right_f)
    output = PdfFileWriter()
    numPages = input1.getNumPages()
    for i in range(numPages):
        l = input1.getPage(i)
        output.addPage(l)
        r = input2.getPage(i)
        output.addPage(r)
    # write while both inputs are still open
    with open("out.pdf", "wb") as out_f:
        output.write(out_f)
Note: the crop parameters are specific to your PDF, so check them before running the script.
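Since the crop values depend on the page size, it may help to derive them instead of hard-coding them. The following is only a rough sketch using the standard library: `half_boxes`, `Box`, and the `margin` parameter are hypothetical names introduced here, not part of PyPDF2, and it assumes the two columns meet exactly at the horizontal midpoint of the page, which may not hold for your document.

```python
from typing import NamedTuple, Tuple


class Box(NamedTuple):
    """A crop rectangle in PDF points (hypothetical helper type)."""
    lower_left: Tuple[float, float]
    upper_right: Tuple[float, float]


def half_boxes(llx, lly, urx, ury, margin=0.0):
    """Split a page rectangle into left and right crop boxes.

    (llx, lly) and (urx, ury) are the lower-left and upper-right corners
    of the page. `margin` is trimmed from the outer edges only; the cut
    itself is assumed to be at the horizontal midpoint.
    """
    if urx <= llx or ury <= lly:
        raise ValueError("empty or inverted page rectangle")
    mid_x = (llx + urx) / 2.0
    left = Box((llx + margin, lly + margin), (mid_x, ury - margin))
    right = Box((mid_x, lly + margin), (urx - margin, ury - margin))
    return left, right


# Example: a US Letter page (612 x 792 points) with a 36-point margin.
left, right = half_boxes(0, 0, 612, 792, margin=36)
print("left: ", left.lower_left, left.upper_right)
print("right:", right.lower_left, right.upper_right)
```

The resulting corners could then be assigned to `page.cropBox.lowerLeft` and `page.cropBox.upperRight` in the script above. The page rectangle itself would have to come from your PDF (for example from the page's media box), so treat the numbers here as placeholders.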
Further: you can now easily extract text from this document without the columns being merged, which previously made the extraction a mess.