How to get all items from scraped HTML?
I am working with this HTML file:
<div class="wrapper">
<ul>
<li>
<a href="https://eict.iitr.ac.in/Summeronline.html">
<div class="date">
10
<span>
Aug
</span>
</div>
<div class="detail">
<span class="title1">
E&ICT Academy (Last date: Aug. 07,2020)
</span>
<span class="detailData">
Faculty Development Programme on "ICT Tools for Teaching,Learning Process and Institutes"
</span>
</div>
</a>
</li>
<li>
<a href="https://eict.iitr.ac.in/STCMLDA.html">
<div class="date">
12
<span>
Aug
</span>
</div>
<div class="detail">
<span class="title1">
E&ICT Academy (Last date: Aug. 09,2020)
</span>
<span class="detailData">
Online Summer Training Programme on "Data Analytics and Machine Learning using Python"
</span>
</div>
</a>
</li>
.......
.......
</ul>
</div>
To scrape the items in the file above, I wrote a small script:
import requests
from bs4 import BeautifulSoup

response = requests.get("https://www.iitr.ac.in/")
soup = BeautifulSoup(response.content, "html.parser")
cards = soup.find_all("div", attrs={"class": "wrapper"})
# print(cards.find("span", attrs={"class": "detailData"}).text)
# print(cards.find("div", attrs={"class": "date"}).text)
# print(cards.find("li").a['href'])
for card in cards:
    print("Title:- ", card.find("span", attrs={"class": "detailData"}).text)
    print("Dates:- ", card.find("div", attrs={"class": "date"}).text)
    print("Link:- ", card.find("li").a['href'])
and tried to print the Title, Dates and Link of every item, but when I loop over them I only get the output for the first item. How can I get all of the items this way?
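The behaviour can be reproduced offline. The sketch below runs the same calls as the script above on a trimmed, hardcoded copy of the two items from the question (the title1 spans are left out), so no request to the live site is needed. It assumes beautifulsoup4 is installed; HTML, first_title and items are illustrative names only.

```python
from bs4 import BeautifulSoup

# Trimmed, hardcoded copy of the two items shown in the question
# (the title1 spans are omitted), so no network request is needed.
HTML = """
<div class="wrapper">
<ul>
<li>
<a href="https://eict.iitr.ac.in/Summeronline.html">
<div class="date">10 <span>Aug</span></div>
<div class="detail">
<span class="detailData">Faculty Development Programme on "ICT Tools for Teaching,Learning Process and Institutes"</span>
</div>
</a>
</li>
<li>
<a href="https://eict.iitr.ac.in/STCMLDA.html">
<div class="date">12 <span>Aug</span></div>
<div class="detail">
<span class="detailData">Online Summer Training Programme on "Data Analytics and Machine Learning using Python"</span>
</div>
</a>
</li>
</ul>
</div>
"""

soup = BeautifulSoup(HTML, "html.parser")

# Same call as in the question: it matches the one container div, not each item.
cards = soup.find_all("div", attrs={"class": "wrapper"})
print(len(cards))  # 1

# .find() stops at the first match, so only the first item's title comes back.
first_title = cards[0].find("span", attrs={"class": "detailData"}).text
print(first_title)

# Iterating over the <li> elements gives one element per item instead.
items = cards[0].find_all("li")
print(len(items))  # 2
```

Because the outer loop has exactly one iteration, each print in it can only ever show the first item.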
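Solution
The loop body runs only once: soup.find_all("div", attrs={"class": "wrapper"}) matches the single container div that holds the whole list, and card.find(...) returns only the first matching element inside it. Instead, select the wrapper once and iterate over every <a> element it contains: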
import requests
from bs4 import BeautifulSoup

response = requests.get("https://www.iitr.ac.in/")
soup = BeautifulSoup(response.content, "html.parser")
# One wrapper div holds the whole list, so find() is enough here
cardswrapper = soup.find("div", attrs={"class": "wrapper"})  # print(cardswrapper.prettify())
# One <a> per <li> item
cardsa = cardswrapper.find_all("a")  # print(cardsa)
for a in cardsa:
    # print(a)
    print("Link:- ", a['href'])
    # a.div is the first <div> inside the link, i.e. div.date
    print("Dates:-", a.div.text)
    print("Title:-", a.find("span", attrs={"class": "detailData"}).text)
    print("\n")
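Note that a.div.text keeps the newlines from the markup (for example "\n10\n\nAug\n\n"). The following is a minimal sketch, not the answerer's code, that walks each <li> item, cleans the text with get_text(..., strip=True) and tolerates a missing wrapper or span. It assumes beautifulsoup4 is installed and that the live page still uses the markup from the question; parse_items and SAMPLE are hypothetical names.

```python
from bs4 import BeautifulSoup


def parse_items(html):
    """Return one dict per <li> item inside div.wrapper.

    parse_items is a hypothetical helper; it assumes the markup shown in
    the question (li > a > div.date / span.detailData).
    """
    soup = BeautifulSoup(html, "html.parser")
    wrapper = soup.find("div", attrs={"class": "wrapper"})
    if wrapper is None:
        return []  # layout changed or the wrong page was fetched
    items = []
    for li in wrapper.find_all("li"):
        a = li.find("a")
        if a is None:
            continue  # skip list entries without a link
        date = a.find("div", attrs={"class": "date"})
        title = a.find("span", attrs={"class": "detailData"})
        items.append({
            "link": a.get("href"),
            # A " " separator plus strip=True turns "\n10\n<span>\nAug\n</span>" into "10 Aug"
            "date": date.get_text(" ", strip=True) if date else None,
            "title": title.get_text(strip=True) if title else None,
        })
    return items


# Offline sample in the question's original line layout; title1 spans omitted.
SAMPLE = """<div class="wrapper"><ul>
<li>
<a href="https://eict.iitr.ac.in/Summeronline.html">
<div class="date">
10
<span>
Aug
</span>
</div>
<div class="detail">
<span class="detailData">
Faculty Development Programme on "ICT Tools for Teaching,Learning Process and Institutes"
</span>
</div>
</a>
</li>
<li>
<a href="https://eict.iitr.ac.in/STCMLDA.html">
<div class="date">
12
<span>
Aug
</span>
</div>
<div class="detail">
<span class="detailData">
Online Summer Training Programme on "Data Analytics and Machine Learning using Python"
</span>
</div>
</a>
</li>
</ul></div>"""

items = parse_items(SAMPLE)
for item in items:
    print("Link:- ", item["link"])
    print("Dates:-", item["date"])
    print("Title:-", item["title"])
    print()

# Against the live page (needs network access and the requests package):
# import requests
# items = parse_items(requests.get("https://www.iitr.ac.in/").content)
```

Iterating per <li> means only links that sit inside list items are treated as items, and returning dicts keeps the parsing separate from the printing.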