How do I extract only the links that contain "https" using BeautifulSoup?
import requests
from bs4 import BeautifulSoup
page = requests.get("https://evaly.com.bd/")
soup = BeautifulSoup(page.content, 'html.parser')
for link in soup.find_all('a', href=True):
    print(link['href'])
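The output of this code includes every href on the page. I only need the links that contain https, not the ones marked with rectangles in the screenshot of the output.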
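Solution
You can use the .select method with a CSS selector. The ^= attribute selector matches only href values that begin with https://: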
import requests
from bs4 import BeautifulSoup
page = requests.get("https://evaly.com.bd/")
soup = BeautifulSoup(page.content, 'html.parser')
for link in soup.select('a[href^="https://"]'):
    print(link['href'])
Prints:
https://merchant.evaly.com.bd/
https://www.facebook.com/groups/EvalyHelpDesk/
https://play.google.com/store/apps/details?id=bd.com.evaly.ebazar
https://evaly.com.bd/
https://evaly.com.bd/hot-deal
https://evaly.com.bd/premium-deal
https://evaly.com.bd/hot-deal
https://evaly.com.bd/premium-deal
https://evaly.com.bd/hot-deal
https://evaly.com.bd/campaign/shop/samsung-note-20-for-hot-deal/samsung-note20-for-hot-deal-058bbc
https://evaly.com.bd/premium-deal
https://evaly.com.bd/campaign/shop/rancon-motors-for-mega-deal-pod/rancon-motors-for-mega-deal-pod-be211b
https://evaly.com.bd/premium-deal
https://play.google.com/store/apps/details?id=bd.com.evaly.ebazar
https://evaly.com.bd/
https://play.google.com/store/apps/details?id=bd.com.evaly.evalyshop
https://apps.apple.com/app/id1504042677
https://www.facebook.com/evaly.com.bd/
https://www.instagram.com/evaly.com.bd/
https://www.youtube.com/channel/UCYxO44JS4_6CLXFKVmZJ7Vg
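
The selector can also be checked without a network request. Below is a minimal sketch that runs the same a[href^="https://"] selector against a small inline HTML snippet. The snippet and the helper name https_links are made up for illustration and do not come from the evaly.com.bd page. Because the output above lists several URLs more than once, the sketch also drops repeats while keeping document order.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, used only for illustration.
SAMPLE_HTML = """
<a href="https://example.com/">home</a>
<a href="/hot-deal">relative link</a>
<a href="https://example.org/app">app</a>
<a href="http://example.com/plain">plain http</a>
<a href="https://example.com/">home again</a>
<a>anchor without href</a>
"""


def https_links(html):
    """Return unique hrefs that start with https://, in document order."""
    soup = BeautifulSoup(html, "html.parser")
    seen = []
    for a in soup.select('a[href^="https://"]'):
        href = a["href"]
        if href not in seen:  # skip URLs that were already collected
            seen.append(href)
    return seen


links = https_links(SAMPLE_HTML)
print(links)
```

Relative links, plain http links, and anchors without an href are all skipped by the selector itself, so no extra filtering is needed in the loop.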
Another way to do this is with a regular expression:
import re
import requests
from bs4 import BeautifulSoup
res = requests.get("https://evaly.com.bd/")
soup = BeautifulSoup(res.content, 'html.parser')
# Keep only href values that start with "https://"
for a in soup.find_all("a", href=re.compile(r"^https://")):
    print(a["href"])
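
When a compiled pattern is passed as href, BeautifulSoup tests it with the pattern's search() method, so the ^ anchor is what restricts matches to the start of the attribute. The following is a small sketch on hypothetical inline markup (not taken from the evaly.com.bd page) that compares an anchored and an unanchored pattern.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical sample markup, used only for illustration.
SAMPLE_HTML = """
<a href="https://example.com/">absolute https</a>
<a href="/redirect?to=https://example.com/">relative, https inside</a>
<a href="http://example.com/">plain http</a>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Anchored: the href must begin with https://
anchored = [a["href"] for a in soup.find_all("a", href=re.compile(r"^https://"))]

# Unanchored: https:// may appear anywhere in the href
unanchored = [a["href"] for a in soup.find_all("a", href=re.compile(r"https://"))]

print(anchored)
print(unanchored)
```

The unanchored pattern also picks up the relative redirect link, which is usually not wanted when the goal is absolute https URLs.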