How do I convert XML data retrieved from a link into a bytes-like object?

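I'm new to StackOverflow and recently started using Python for web scraping. As the title says, I can't convert the XML data retrieved from a link into a bytes-like object.

I believe I'm retrieving the XML data correctly (Figure 1). However, whenever I try to convert it into a tree, an error occurs saying "a bytes-like object is required" (Figure 2).

Code: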

import xml.etree.ElementTree as ET
from bs4 import BeautifulSoup
from urllib.request import urlopen
import ssl


# Ignore SSL certificate errors
ctx = ssl.create_default_context()
ctx.check_hostname = False
ctx.verify_mode = ssl.CERT_NONE

url = input('Enter - ')
html = urlopen(url,context=ctx).read()
soup = BeautifulSoup(html,"html.parser")
print(soup)

#tree = ET.fromstring(soup)
#print(tree)
#for i in soup:
#  print('Name:',tree.find('name').text)
#  print('Attr:',tree.find('comments').text)

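Input URL: http://py4e-data.dr-chuck.net/comments_42.xml

Figure 1: The code runs fine after retrieving the data from the link

Figure 2: The error occurs

Solution

You have to pass the content fetched from the URL itself: give ET.fromstring() the bytes returned by urlopen(...).read(), not the BeautifulSoup object. You also don't really need bs4 here, though you may want to look into the xmltodict module.

Try this: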

import xmltodict
import xml.etree.ElementTree as ET
from urllib.request import urlopen


html = urlopen("http://py4e-data.dr-chuck.net/comments_42.xml").read()
tree = ET.fromstring(html)
print(xmltodict.parse(html)['commentinfo']['note'])

Output:

This file contains the sample data for testing
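
To see why the original attempt failed, here is a minimal offline sketch. ET.fromstring() accepts bytes or str, but rejects other object types with a TypeError. The inline sample_xml and the NotBytes class are hypothetical stand-ins (for the downloaded bytes and for a BeautifulSoup object, respectively), so the snippet runs without network access or bs4.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the bytes returned by urlopen(url).read()
sample_xml = b"<commentinfo><note>sample note</note></commentinfo>"

# Bytes can be parsed directly; fromstring() returns the root Element
root = ET.fromstring(sample_xml)
print(root.tag, root.find("note").text)

# A decoded str is accepted as well
root_from_str = ET.fromstring(sample_xml.decode("utf-8"))


class NotBytes:
    """Hypothetical stand-in for any object that is neither str nor bytes."""


# Passing such an object is rejected with a TypeError
try:
    ET.fromstring(NotBytes())
    rejected = False
except TypeError as exc:
    rejected = True
    print("TypeError:", exc)
```

If you do want to keep BeautifulSoup in the pipeline, converting the soup back to text with str(soup) before parsing should also avoid the error, although parsing the raw bytes is simpler.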

Edit: following your approach, here is how to iterate over the tree elements:

import xml.etree.ElementTree as ET
from urllib.request import urlopen


html = urlopen("http://py4e-data.dr-chuck.net/comments_42.xml").read()

tree = ET.fromstring(html)

for item in tree.iter():
    if item.tag == "name":
        print(f"{item.tag}: {item.text.strip()}")

The fromstring() function parses the XML and returns the root Element; its iter() method then lets us walk over every node in the tree.

This outputs:

name: Romina
name: Laurie
name: Bayli
name: Siyona
name: Taisha
name: Alanda
...
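
As a small variation on the loop above, iter() also accepts a tag name, which removes the need for the if check. The sketch below uses a hypothetical two-entry sample_xml with made-up count values (the layout of comment elements holding name and count children is an assumption about the feed), so it runs offline.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; the layout and the count values are illustrative only
sample_xml = b"""<commentinfo>
  <comments>
    <comment><name>Romina</name><count>10</count></comment>
    <comment><name>Laurie</name><count>20</count></comment>
  </comments>
</commentinfo>"""

# fromstring() returns the root Element, not an ElementTree
root = ET.fromstring(sample_xml)
print(type(root).__name__)

# iter("name") yields only the <name> elements, in document order
names = [element.text.strip() for element in root.iter("name")]
print(names)

# If an ElementTree object is needed, the root can be wrapped explicitly
tree = ET.ElementTree(root)
```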

You can also access specific elements by calling the findall() method. For example, to get all the names:

names = tree.findall(".//name")
print([name.text for name in names])

Output:

['Romina', 'Laurie', 'Bayli', 'Siyona', 'Taisha', 'Alanda', 'Ameelia'...]
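
The question's commented-out loop tried to print a name together with its comment data. A rough sketch of that, assuming each entry is a comment element with name and count children (an assumption about the feed's layout, with made-up values in the hypothetical sample_xml below):

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; the layout and the count values are illustrative only
sample_xml = b"""<commentinfo>
  <comments>
    <comment><name>Romina</name><count>10</count></comment>
    <comment><name>Laurie</name><count>20</count></comment>
  </comments>
</commentinfo>"""

root = ET.fromstring(sample_xml)

rows = []
# ".//comment" matches every <comment> element at any depth below the root
for comment in root.findall(".//comment"):
    name = comment.findtext("name")          # text of the <name> child
    count = int(comment.findtext("count"))   # text of the <count> child as int
    rows.append((name, count))
    print("Name:", name, "Count:", count)

# Sum of all counts across the entries
total = sum(count for _, count in rows)
print("Total:", total)
```

Searching from each comment element, rather than from the root, keeps each name paired with its own count.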
