Python ElementTree, Word documents, and tables

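I'm using Python and ElementTree to process reports created in Word. I need to extract the information found in the body of the report and in its various tables.

For the most part everything works well, except for one small problem.

When I reach a table, I add HTML tags for the table, its rows, and its cells, and return the result as a single string. That works perfectly for me.

However, after the table has been processed, the code goes on to pick up the text of everything inside the table a second time. I don't want that; I have already taken what I need from the table.

How can I tell ElementTree to resume processing after the end of the table instead of walking through the table's contents again?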
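The duplicated text is consistent with how `Element.iter()` behaves: it yields the element itself and then every descendant in document order, so a paragraph nested in a table cell is yielded again after the table. A minimal illustration, using a hypothetical namespace-free stand-in for the document body rather than real Word XML:

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for a Word document body (no namespaces).
root = ET.fromstring(
    "<body>"
    "<p>Table A</p>"
    "<tbl><tr><tc><p>CELL IN TABLE A</p></tc></tr></tbl>"
    "</body>"
)

# iter() walks the whole subtree, so the <p> inside the table shows up too.
visited = [elem.tag for elem in root.iter()]

# Iterating over the element itself yields only its direct children.
top_level = [child.tag for child in root]

print(visited)
print(top_level)
```

Iterating over an element directly, rather than calling `iter()`, stops at its direct children and never descends into the table.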

Example of the problem:

This is what I get:

"Table A"
"<table><tr><td>CELL IN TABLE A</td></tr></table>"
"CELL IN TABLE A"  <-- THIS IS THE PROBLEM
"TABLE B"
"<table><tr><td>CELL IN TABLE B</td></tr></table>"
"CELL IN TABLE B"  <-- THIS IS THE PROBLEM

This is what I want:

"Table A"
"<table><tr><td>CELL IN TABLE A</td></tr></table>"
"TABLE B"
"<table><tr><td>CELL IN TABLE B</td></tr></table>"

Code:

def __get_content(self,docx,file):
    content = []
    tree = ET.XML(docx.read(file))

    for elem in tree.iter():
        if elem.tag == self.__PARA:
            result = self.__process_paragraph(elem)
            if result:
                content.append(result)
        elif elem.tag == self.__TABLE:
            result = self.__process_table(elem)
            if result:
                content.append(result)
    return content

def __process_paragraph(self,tree,join_text=''):
    paragraphs = []
    for paragraph in tree.iter(self.__PARA):
        texts = [node.text
                 for node in paragraph.iter(self.__TEXT)
                 if node.text]
        if texts:                
            paragraphs.append(''.join(texts))
    return join_text.join(paragraphs)

def __process_table(self,tree):
    content = []
    for table in tree.iter(self.__TABLE):
        content.append("<table>")
        for row in table.iter(self.__ROW):
            content.append("<tr>")
            for cell in row.iter(self.__CELL):
                content.append("<td>")
                cell_content = self.__process_paragraph(cell,'<br/>')
                if cell_content.strip():
                    content.append(cell_content)
                content.append("</td>")
            content.append("</tr>")
        content.append("</table>")
    return ''.join(content)
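One possible way to get the desired output is to stop using `tree.iter()` at the top level and loop over the direct children of the body element instead, so that a table is handled once and its cells are never visited again. The sketch below is not the original class: `paragraph_text`, `table_html`, and `get_content` are hypothetical standalone functions, and it assumes the XML is `word/document.xml` with paragraphs and tables as direct children of `w:body`. Paragraphs from a nested table are flattened into the enclosing cell.

```python
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by word/document.xml
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"
BODY, PARA, TEXT = W + "body", W + "p", W + "t"
TABLE, ROW, CELL = W + "tbl", W + "tr", W + "tc"


def paragraph_text(para):
    """Join the text of every w:t node inside one paragraph."""
    return "".join(node.text for node in para.iter(TEXT) if node.text)


def table_html(table):
    """Render one w:tbl as an HTML string, walking direct rows and cells."""
    parts = ["<table>"]
    for row in table.findall(ROW):        # direct child rows only
        parts.append("<tr>")
        for cell in row.findall(CELL):    # direct child cells only
            texts = [paragraph_text(p) for p in cell.iter(PARA)]
            parts.append("<td>" + "<br/>".join(t for t in texts if t) + "</td>")
        parts.append("</tr>")
    parts.append("</table>")
    return "".join(parts)


def get_content(xml_data):
    """Return paragraph strings and table HTML in document order."""
    body = ET.fromstring(xml_data).find(BODY)
    content = []
    for child in body:                    # direct children, unlike iter()
        if child.tag == PARA:
            result = paragraph_text(child)
        elif child.tag == TABLE:
            result = table_html(child)
        else:
            continue                      # e.g. w:sectPr, nothing to extract
        if result:
            content.append(result)
    return content
```

With a `zipfile.ZipFile` opened on the report, this could be called as `get_content(docx.read('word/document.xml'))`.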

Thanks in advance!
