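How to extract JSON from an HTML comment inside a tag using BeautifulSoup?
I want to use BeautifulSoup to extract the JSON that sits inside an HTML comment within a <script> tag.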
<script data_id="dfsfre2323" data_key="23424sfsfsfdafd" type="application/json"><!--
{"employee": {"name":"sonoo","salary":56000,"married":true}}--></script>
The output should look like this:
Name: sonoo
Salary: 56000
Married: True
I tried the following:
from bs4 import BeautifulSoup,Comment
import json
soup = BeautifulSoup(webpage,"html.parser")
data = soup.find("script",{"type":"application/json","data_id":"dfsfre2323","data_key":"23424sfsfsfdafd"})
comment = soup.find(text=lambda text:isinstance(data,Comment))
I get nothing back in comment.
Thanks in advance for any help.
Solution
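BeautifulSoup does not parse the content of a <script> tag as markup. It keeps that content as one plain string, so the <!-- ... --> inside it never becomes a Comment node and your .find(text=...) finds nothing. (The lambda in your attempt also tests data, which is the tag, rather than its own argument, so it can never match.) Parse the script's string as its own BeautifulSoup document before calling .find():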
import json
from bs4 import BeautifulSoup,Comment

txt = '''
<script data_id ="dfsfre2323" data_key="23424sfsfsfdafd" type="application/json"><!--
{"employee": {"name":"sonoo","salary":56000,"married":true}}
--></script>'''

soup = BeautifulSoup(txt,"html.parser")
# Locate the <script> tag by its attributes.
data = soup.find("script",{"type":"application/json",'data_id':"dfsfre2323",'data_key':"23424sfsfsfdafd"})
# data.string is the raw "<!-- ... -->" text; parsing it again yields a Comment node.
comment = BeautifulSoup(data.string,"html.parser").find(text=lambda t: isinstance(t,Comment))
# The comment body is the JSON document itself.
data = json.loads(comment)
print(json.dumps(data,indent=4))
Prints:
{
    "employee": {
        "name": "sonoo",
        "salary": 56000,
        "married": true
    }
}
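The question asked for one "Key: value" line per field rather than a JSON dump. A minimal sketch of that last step follows, assuming the same markup as above. The helper names extract_commented_json and format_employee are made up for illustration and are not part of BeautifulSoup. It also uses string= in place of text=, which is the name newer BeautifulSoup releases (4.4.0 and later) prefer for the same argument.

```python
import json
from bs4 import BeautifulSoup, Comment

HTML = '''
<script data_id="dfsfre2323" data_key="23424sfsfsfdafd" type="application/json"><!--
{"employee": {"name":"sonoo","salary":56000,"married":true}}
--></script>'''


def extract_commented_json(html):
    """Return the JSON object wrapped in a comment inside the <script> tag.

    Hypothetical helper: it assumes the page has exactly one matching
    <script type="application/json"> whose only content is the comment.
    """
    soup = BeautifulSoup(html, "html.parser")
    script = soup.find("script", {"type": "application/json"})
    # Re-parse the raw script text so the <!-- ... --> becomes a Comment node.
    inner = BeautifulSoup(script.string, "html.parser")
    comment = inner.find(string=lambda t: isinstance(t, Comment))
    return json.loads(comment)


def format_employee(employee):
    """Build one 'Key: value' line per field, capitalizing the key."""
    return [f"{key.capitalize()}: {value}" for key, value in employee.items()]


lines = format_employee(extract_commented_json(HTML)["employee"])
print("\n".join(lines))
```

Because json.loads turns the JSON literal true into the Python value True, formatting the value with an f-string gives "Married: True", matching the capitalization in the requested output.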