How to parse nested XML values with multiple subgroups
I have a large .xml file, part of which looks like this:
<?xml version="1.0"?>
<data>
<measData>
<Mesurment Id="55">
<granPeriod duration="1" endTime="2021-01-02"/>
<repPeriod duration="1"/>
<measTypes>73 74 574 75 35 36 </measTypes>
<measValue measObj="Group1">
<measResults>512 52.733 33.5 82 0 0 </measResults>
</measValue>
<measValue measObj="Group2">
<measResults>512 78.175 50 119.5 0 0 </measResults>
</measValue>
</Mesurment>
</measData>
</data>
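I am trying to parse the data I need from it and convert it to a CSV file.
The problem is that <measTypes> appears only once in the XML file, followed by the values for Group1 and Group2 that belong to that <measTypes> list.
This differs for each <Mesurment Id>, and more than 10 group values may be reported for each <measTypes>.
I don't know how to report the single measTypes list against multiple measResults.
I have the following code to get the values: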
import xml.etree.ElementTree as ET
import pandas as pd

parsDict = dict()
tree = ET.parse('new.xml')
root = tree.getroot()
for itm in tree.iter():
    if (itm.tag.split('}')[-1] == 'Mesurment'):
        parsDict['Mesurment'] = [itm.attrib['Id']]
    if (itm.tag.split('}')[-1] == 'granPeriod'):
        parsDict['duration'] = [itm.attrib['duration']]
        parsDict['endTime'] = [itm.attrib['endTime']]
    if (itm.tag.split('}')[-1] == 'measTypes'):
        parsDict['CounterID'] = [itm.text]
    if (itm.tag.split('}')[-1] == 'measValue'):
        parsDict['measObj'] = [itm.attrib['measObj']]
    if (itm.tag.split('}')[-1] == 'measResults'):
        parsDict['value'] = [itm.text]
df2 = pd.DataFrame(parsDict)
df2.to_csv('123.csv', index=False)
print('finish')
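The result is as follows:
It reports only the last group. The result I want is shown below, and it needs to scale to multiple groups and measurement IDs.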
Solution
This is probably easier to do with the BeautifulSoup library. Before using it, install these dependencies:
beautifulsoup4 = "4.9.3"
lxml = "^4.6.1"
from bs4 import BeautifulSoup, Tag

# .strip() keeps the XML declaration at the very start of the string
soup = BeautifulSoup("""
<?xml version="1.0"?>
<data>
<measData>
<Mesurment Id="55">
<granPeriod duration="1" endTime="2021-01-02"/>
<repPeriod duration="1"/>
<measTypes>73 74 574 75 35 36 </measTypes>
<measValue measObj="Group1">
<measResults>512 52.733 33.5 82 0 0 </measResults>
</measValue>
<measValue measObj="Group2">
<measResults>512 78.175 50 119.5 0 0 </measResults>
</measValue>
</Mesurment>
</measData>
</data>
""".strip(), features="xml")

response = []
for tag in soup.data.measData:
    # skip the whitespace text nodes between elements
    if not isinstance(tag, Tag):
        continue
    # please update this dict with all the top-level attributes you need
    data = {"duration": tag.granPeriod.attrs["duration"]}
    for measValue in tag:
        # keep only child elements that contain a <measResults>
        if not isinstance(measValue, Tag) or getattr(measValue, "measResults") is None:
            continue
        response.append({
            **data,
            "measObj": measValue.attrs["measObj"],
            "value": measValue.measResults.text,
        })
print(response)
Update
With the library you are already using, you can do it like this:
import xml.etree.ElementTree as ET
import pandas as pd

tree = ET.parse('new.xml')
root = tree.getroot()
response = []
for mesurment in tree.iter("Mesurment"):
    granPeriod = next(
        it for it in mesurment if it.tag == "granPeriod"
    )
    measTypes = next(
        it for it in mesurment if it.tag == "measTypes"
    )
    measValues = [it for it in mesurment if it.tag == "measValue"]
    # attributes shared by every group of this measurement
    mesurment_data = {
        "Mesurment": mesurment.attrib["Id"],
        "duration": granPeriod.attrib["duration"],
        "endTime": granPeriod.attrib["endTime"],
        "CounterId": measTypes.text,
    }
    # one output row per <measValue> group
    for value in measValues:
        response.append({
            **mesurment_data,
            "measObj": value.attrib["measObj"],
            "value": next(
                it.text for it in value if it.tag == "measResults"
            )
        })
df2 = pd.DataFrame(response)
print(df2)
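
If each counter ID in measTypes should be paired with the value at the same position in measResults, one possible variation is to split both strings on whitespace and zip them, giving one row per group and counter. The sketch below is only an illustration under a few assumptions: the tags carry no XML namespace (as in the sample), the two lists are in the same order and of equal length, and the names SAMPLE and rows_from_xml are made up for this example.

```python
import xml.etree.ElementTree as ET

# Inline copy of the sample from the question, so the sketch runs on its own.
SAMPLE = """<?xml version="1.0"?>
<data>
<measData>
<Mesurment Id="55">
<granPeriod duration="1" endTime="2021-01-02"/>
<repPeriod duration="1"/>
<measTypes>73 74 574 75 35 36 </measTypes>
<measValue measObj="Group1">
<measResults>512 52.733 33.5 82 0 0 </measResults>
</measValue>
<measValue measObj="Group2">
<measResults>512 78.175 50 119.5 0 0 </measResults>
</measValue>
</Mesurment>
</measData>
</data>"""


def rows_from_xml(root):
    """Return one dict per (measurement, group, counter) combination."""
    rows = []
    # Assumption: no namespace prefix on the tags, as in the sample.
    for mes in root.iter("Mesurment"):
        gran = mes.find("granPeriod")
        # "73 74 574 ..." -> ["73", "74", "574", ...]
        counters = mes.findtext("measTypes", default="").split()
        for group in mes.findall("measValue"):
            values = group.findtext("measResults", default="").split()
            # Assumption: the i-th value belongs to the i-th counter ID.
            for counter, value in zip(counters, values):
                rows.append({
                    "Mesurment": mes.get("Id"),
                    "duration": gran.get("duration"),
                    "endTime": gran.get("endTime"),
                    "measObj": group.get("measObj"),
                    "CounterID": counter,
                    "value": value,
                })
    return rows


rows = rows_from_xml(ET.fromstring(SAMPLE))
for row in rows[:3]:
    print(row)
```

The resulting list of dicts can be passed to pd.DataFrame(rows) and written with to_csv in the same way as above. For the real file, replace ET.fromstring(SAMPLE) with ET.parse('new.xml').getroot().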