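How to get the correct result from complex XML
I want to get the correct userID, itemID, and associated balance for each bill, and export the result.
Each "user" can have many "items", and each item has a balance. The user ID repeats for every item.
With the following code I get duplicated itemID/userID rows: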
userid = node.findtext('./userID')
itemids = node.findall('./bill/item/itemID')
bills = node.findall(".//bill/balance")
for item in itemids:
    for bill in bills:
        print(userid, item.text, bill.text)
Here is a sample of the XML:
<user>
<userID>10269</userID>
<name>
<displayName>SAFIYA NASSER ABDULLAH AL SIYABI</displayName>
<firstName>SAFIYA</firstName>
<middleName>NASSER ABDULLAH</middleName>
<lastName>AL SIYABI</lastName>
</name>
<library>MAIN</library>
<numberOfBills>3</numberOfBills>
<bill>
<item>
<callNumber>BP173.4 .B57 2003</callNumber>
<copyNumber>1</copyNumber>
<itemID>423999</itemID>
<library>MAIN</library>
<dateCreated>2009-02-15</dateCreated>
<isPermanent>true</isPermanent>
</item>
<amount currency="OR">1.20</amount>
<reason>OVERDUE</reason>
<balance currency="OR">1.20</balance>
<library>MAIN</library>
</bill>
<bill>
<item>
<callNumber>BP173.3 .G423 2004</callNumber>
<copyNumber>2</copyNumber>
<itemID>429053</itemID>
<library>MAIN</library>
<dateCreated>2009-02-15</dateCreated>
<isPermanent>true</isPermanent>
</item>
<amount currency="OR">1.20</amount>
<reason>OVERDUE</reason>
<balance currency="OR">1.20</balance>
<library>MAIN</library>
</bill>
<bill>
<item>
<callNumber>BP173.3 .N34 2003</callNumber>
<copyNumber>1</copyNumber>
<itemID>423991</itemID>
<library>MAIN</library>
<dateCreated>2009-02-15</dateCreated>
<isPermanent>true</isPermanent>
</item>
<amount currency="OR">24.00</amount>
<reason>OVERDUE</reason>
<balance currency="OR">24.00</balance>
<library>MAIN</library>
</bill>
</user>
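Thanks in advance.
Solution
You are iterating over every <itemID>, and then, for each of them, iterating over every <balance> from the start. In effect, the number of elements returned by node.findall('./bill/item/itemID') determines how many times you loop over all the balances, so every item is paired with every balance. That is not what you want.
Instead, iterate over each bill, and in a nested for loop iterate over the items found under that particular bill, not over every item in the document.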
for bill in node.findall('bill'):
    balance = bill.findtext('balance')
    for item in bill.findall('item'):
        item_id = item.findtext('itemID')
        print(userid, item_id, balance)  # userid as read in the question's code
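To make the per-bill loop concrete, here is a minimal self-contained sketch. It assumes a shortened copy of the sample record (only the elements needed here), and `bill_rows` is a helper name made up for illustration, not something from the question.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the sample <user> record from the question.
SAMPLE = """
<user>
  <userID>10269</userID>
  <bill>
    <item><itemID>423999</itemID></item>
    <balance currency="OR">1.20</balance>
  </bill>
  <bill>
    <item><itemID>429053</itemID></item>
    <balance currency="OR">1.20</balance>
  </bill>
  <bill>
    <item><itemID>423991</itemID></item>
    <balance currency="OR">24.00</balance>
  </bill>
</user>
"""

def bill_rows(user):
    """Return one (userID, itemID, balance) tuple per item of each bill."""
    user_id = user.findtext('userID')
    rows = []
    for bill in user.findall('bill'):
        balance = bill.findtext('balance')
        # Only look at the items inside this bill, not the whole document.
        for item in bill.findall('item'):
            rows.append((user_id, item.findtext('itemID'), balance))
    return rows

rows = bill_rows(ET.fromstring(SAMPLE))
for row in rows:
    print(*row)
```

Each item is now paired only with the balance of the bill that contains it, so the sample yields three rows instead of nine.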
Consider using a list/dictionary comprehension to extract the selected XML data:
import xml.etree.ElementTree as et
doc = et.parse("Input.xml")  # the root element is <user>
user_bill_list_of_dict = [
    {'userID': doc.findtext('userID'),
     'itemID': b.find('item').findtext('itemID'),
     'balance': b.findtext('balance')}
    for b in doc.findall('bill')
]
print(user_bill_list_of_dict)
# [{'userID': '10269', 'itemID': '423999', 'balance': '1.20'},
#  {'userID': '10269', 'itemID': '429053', 'balance': '1.20'},
#  {'userID': '10269', 'itemID': '423991', 'balance': '24.00'}]
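The question also asks to export the result. One possible approach, sketched here with the standard library `csv` module, is to write the list of dictionaries as CSV. The `to_csv` helper and the choice of CSV as the export format are assumptions for illustration; the question does not name a format.

```python
import csv
import io

# Rows in the shape produced by the list comprehension above.
records = [
    {'userID': '10269', 'itemID': '423999', 'balance': '1.20'},
    {'userID': '10269', 'itemID': '429053', 'balance': '1.20'},
    {'userID': '10269', 'itemID': '423991', 'balance': '24.00'},
]

def to_csv(rows, fieldnames=('userID', 'itemID', 'balance')):
    """Serialize a list of dicts to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator='\n')
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()

csv_text = to_csv(records)
print(csv_text)
```

To write to disk instead, open a file with `newline=''` and pass the file object to `csv.DictWriter` in place of the `StringIO` buffer.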
You can even use dictionary merging (available in Python 3.5+) to expand all of the XML data:
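data = [
    {**{'userID': doc.findtext('userID')},
     **{n.tag: n.text for n in doc.findall('./name/*')},
     **{i.tag: i.text for i in bill.findall('item/*')},
     **{b.tag: b.text for b in bill.findall('*') if b.tag != 'item'}}
    for bill in doc.findall('bill')
]
print(data)
# [{'userID': '10269', 'displayName': 'SAFIYA NASSER ABDULLAH AL SIYABI',
#   'firstName': 'SAFIYA', 'middleName': 'NASSER ABDULLAH', 'lastName': 'AL SIYABI',
#   'callNumber': 'BP173.4 .B57 2003', 'copyNumber': '1', 'itemID': '423999',
#   'library': 'MAIN', 'dateCreated': '2009-02-15', 'isPermanent': 'true',
#   'amount': '1.20', 'reason': 'OVERDUE', 'balance': '1.20'},
#  {'userID': '10269', 'displayName': 'SAFIYA NASSER ABDULLAH AL SIYABI',
#   'firstName': 'SAFIYA', 'middleName': 'NASSER ABDULLAH', 'lastName': 'AL SIYABI',
#   'callNumber': 'BP173.3 .G423 2004', 'copyNumber': '2', 'itemID': '429053',
#   'library': 'MAIN', 'dateCreated': '2009-02-15', 'isPermanent': 'true',
#   'amount': '1.20', 'reason': 'OVERDUE', 'balance': '1.20'},
#  {'userID': '10269', 'displayName': 'SAFIYA NASSER ABDULLAH AL SIYABI',
#   'firstName': 'SAFIYA', 'middleName': 'NASSER ABDULLAH', 'lastName': 'AL SIYABI',
#   'callNumber': 'BP173.3 .N34 2003', 'copyNumber': '1', 'itemID': '423991',
#   'library': 'MAIN', 'dateCreated': '2009-02-15', 'isPermanent': 'true',
#   'amount': '24.00', 'reason': 'OVERDUE', 'balance': '24.00'}]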
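One thing to watch with the merged dictionary: `library` appears both under `<item>` and directly under `<bill>`, and when dictionaries are merged the later value replaces the earlier one. In the sample both are MAIN, so nothing is visibly lost. If the two could differ, a sketch like the following keeps both by prefixing the item-level keys. The `item_` prefix and the BRANCH value are made up for illustration and are not in the original data.

```python
import xml.etree.ElementTree as ET

# One bill with differing library values (hypothetical data, for illustration).
BILL = """
<bill>
  <item>
    <itemID>423999</itemID>
    <library>BRANCH</library>
  </item>
  <balance currency="OR">1.20</balance>
  <library>MAIN</library>
</bill>
"""

bill = ET.fromstring(BILL)

# Plain merge: the bill-level <library> overwrites the item-level one.
merged = {
    **{i.tag: i.text for i in bill.findall('item/*')},
    **{b.tag: b.text for b in bill.findall('*') if b.tag != 'item'},
}

# Prefixed merge: item-level keys get an "item_" prefix so both values survive.
prefixed = {
    **{'item_' + i.tag: i.text for i in bill.findall('item/*')},
    **{b.tag: b.text for b in bill.findall('*') if b.tag != 'item'},
}

print(merged['library'])
print(prefixed['item_library'], prefixed['library'])
```

The trade-off is that every item-level column is renamed (for example `item_itemID`), so downstream code has to use the prefixed names.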
What's more, the data above can be loaded into a pandas DataFrame:
import pandas as pd
...
df = pd.DataFrame(data)
# userID displayName firstName middleName lastName callNumber copyNumber itemID library dateCreated isPermanent amount reason balance
# 0 10269 SAFIYA NASSER ABDULLAH AL SIYABI SAFIYA NASSER ABDULLAH AL SIYABI BP173.4 .B57 2003 1 423999 MAIN 2009-02-15 true 1.20 OVERDUE 1.20
# 1 10269 SAFIYA NASSER ABDULLAH AL SIYABI SAFIYA NASSER ABDULLAH AL SIYABI BP173.3 .G423 2004 2 429053 MAIN 2009-02-15 true 1.20 OVERDUE 1.20
# 2 10269 SAFIYA NASSER ABDULLAH AL SIYABI SAFIYA NASSER ABDULLAH AL SIYABI BP173.3 .N34 2003 1 423991 MAIN 2009-02-15 true 24.00 OVERDUE 24.00