How to move a tag and its contents within XML using BeautifulSoup?
I have the following XML:
</Page>
<Page ID="Page2" PHYSICAL_IMG_NR="2" HEIGHT="3300" WIDTH="2550">
<TopMargin HEIGHT="151" WIDTH="2550" VPOS="0" HPOS="0">
</TopMargin>
<LeftMargin HEIGHT="2771" WIDTH="143" VPOS="151" HPOS="0">
</LeftMargin>
<RightMargin HEIGHT="2771" WIDTH="143" VPOS="151" HPOS="2407">
</RightMargin>
<BottomMargin HEIGHT="378" WIDTH="2550" VPOS="2922" HPOS="0">
<TextBlock>
</TextBlock>
</BottomMargin>
<PrintSpace>
</PrintSpace>
</Page>
I want to move the BottomMargin tag and its contents to the bottom of every Page tag (in this example, after the PrintSpace tag).
How can I do this?
I am reading the XML like this:
with open(xml, "r") as file:
    content = file.readlines()
    content = "".join(content)
soup = bs.BeautifulSoup(content, 'lxml')
Solution
You can use extract() and insert_after():
from bs4 import BeautifulSoup
xml = """</Page>
<Page ID="Page2" PHYSICAL_IMG_NR="2" HEIGHT="3300" WIDTH="2550">
<TopMargin HEIGHT="151" WIDTH="2550" VPOS="0" HPOS="0">
</TopMargin>
<LeftMargin HEIGHT="2771" WIDTH="143" VPOS="151" HPOS="0">
</LeftMargin>
<RightMargin HEIGHT="2771" WIDTH="143" VPOS="151" HPOS="2407">
</RightMargin>
<BottomMargin HEIGHT="378" WIDTH="2550" VPOS="2922" HPOS="0">
<TextBlock>
</TextBlock>
</BottomMargin>
<PrintSpace>
</PrintSpace>
</Page>"""
soup = BeautifulSoup(xml, "html.parser")
b_margin = soup.select_one("BottomMargin")
# Remove BottomMargin from the xml
for tag in soup.select("BottomMargin"):
    tag.extract()
# Add the `BottomMargin` back to the `soup`, right after `PrintSpace`
soup.printspace.insert_after(b_margin)
print(soup.prettify())
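Since extract() returns the tag it removes, the remove-then-reinsert step can also be written in two lines. The following is only a minimal sketch on a shortened, hypothetical version of the XML above (attributes trimmed for brevity); it assumes a single Page and the html.parser backend, which lowercases tag names:

```python
from bs4 import BeautifulSoup

# Shortened, hypothetical sample: one Page with BottomMargin before PrintSpace.
xml = """<Page ID="Page2">
<BottomMargin HEIGHT="378"><TextBlock></TextBlock></BottomMargin>
<PrintSpace></PrintSpace>
</Page>"""

soup = BeautifulSoup(xml, "html.parser")

# extract() detaches the tag from the tree and returns it.
b_margin = soup.find("bottommargin").extract()
# Re-attach the detached tag immediately after <printspace>.
soup.find("printspace").insert_after(b_margin)

# Names of the element children of <page>, in document order.
order = [child.name for child in soup.find("page").find_all(recursive=False)]
print(order)
```

The TextBlock child travels with its parent, because extract() detaches the whole subtree.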
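Alternatively, assuming your XML document contains many <Page> tags: there is no need for extract(). Just call .insert_after() with your tag, because inserting a tag that is already in the tree moves it to the new position: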
from bs4 import BeautifulSoup
txt = '''
<Page ID="Page1" PHYSICAL_IMG_NR="2" HEIGHT="3300" WIDTH="2550">
<TopMargin HEIGHT="151" WIDTH="2550" VPOS="0" HPOS="0">
</TopMargin>
<LeftMargin HEIGHT="2771" WIDTH="143" VPOS="151" HPOS="0">
</LeftMargin>
<RightMargin HEIGHT="2771" WIDTH="143" VPOS="151" HPOS="2407">
</RightMargin>
<BottomMargin HEIGHT="378" WIDTH="2550" VPOS="2922" HPOS="0">
<TextBlock>
</TextBlock>
</BottomMargin>
<PrintSpace>
</PrintSpace>
</Page>
<Page ID="Page2" PHYSICAL_IMG_NR="2" HEIGHT="3300" WIDTH="2550">
<TopMargin HEIGHT="151" WIDTH="2550" VPOS="0" HPOS="0">
</TopMargin>
<LeftMargin HEIGHT="2771" WIDTH="143" VPOS="151" HPOS="0">
</LeftMargin>
<RightMargin HEIGHT="2771" WIDTH="143" VPOS="151" HPOS="2407">
</RightMargin>
<BottomMargin HEIGHT="378" WIDTH="2550" VPOS="2922" HPOS="0">
<TextBlock>
</TextBlock>
</BottomMargin>
<PrintSpace>
</PrintSpace>
</Page>'''
soup = BeautifulSoup(txt, 'html.parser')
for bottom_margin in soup.select('page bottommargin'):
    # Move each BottomMargin to just after the PrintSpace of its own Page
    bottom_margin.find_parent('page').printspace.insert_after(bottom_margin)
print(soup.prettify())
Prints:
<page height="3300" id="Page1" physical_img_nr="2" width="2550">
<topmargin height="151" hpos="0" vpos="0" width="2550">
</topmargin>
<leftmargin height="2771" hpos="0" vpos="151" width="143">
</leftmargin>
<rightmargin height="2771" hpos="2407" vpos="151" width="143">
</rightmargin>
<printspace>
</printspace>
<bottommargin height="378" hpos="0" vpos="2922" width="2550">
<textblock>
</textblock>
</bottommargin>
</page>
<page height="3300" id="Page2" physical_img_nr="2" width="2550">
<topmargin height="151" hpos="0" vpos="0" width="2550">
</topmargin>
<leftmargin height="2771" hpos="0" vpos="151" width="143">
</leftmargin>
</leftmargin>
<rightmargin height="2771" hpos="2407" vpos="151" width="143">
</rightmargin>
<printspace>
</printspace>
<bottommargin height="378" hpos="0" vpos="2922" width="2550">
<textblock>
</textblock>
</bottommargin>
</page>
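Note that the printed tag and attribute names are all lowercase, because html.parser treats the input as HTML. If the original case matters, one possible alternative (not part of the answers above) is the standard library's xml.etree.ElementTree, which keeps names as written. The sketch below is only an illustration under stated assumptions: the input is well-formed XML, the Page elements sit directly under a single root element (the Document wrapper here is hypothetical), and the sample is shortened.

```python
import xml.etree.ElementTree as ET

# Shortened sample; <Document> is a hypothetical root element added
# because a well-formed XML document needs exactly one root.
txt = """<Document>
<Page ID="Page1">
<TopMargin/>
<BottomMargin HEIGHT="378"><TextBlock/></BottomMargin>
<PrintSpace/>
</Page>
<Page ID="Page2">
<BottomMargin HEIGHT="378"><TextBlock/></BottomMargin>
<PrintSpace/>
</Page>
</Document>"""

root = ET.fromstring(txt)

# findall() returns a list, so the tree can be modified safely in the loop.
for page in root.findall("Page"):
    bottom = page.find("BottomMargin")
    print_space = page.find("PrintSpace")
    if bottom is None or print_space is None:
        continue  # nothing to move on this page
    page.remove(bottom)
    # Re-insert BottomMargin directly after PrintSpace.
    position = list(page).index(print_space) + 1
    page.insert(position, bottom)

# Child tag names per page, in document order, with their case preserved.
orders = [[child.tag for child in page] for page in root.findall("Page")]
print(orders)
```

Unlike html.parser, ElementTree rejects input that is not well-formed, so a fragment with a stray closing tag would have to be cleaned up first.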