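How to scrape course URLs from Coursera
I have Python code that scrapes Coursera course details such as course_title, rating, and number of students, but I would also like to get the link to each course. Can someone help me understand how to get the URL of each course from Coursera?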
Solution
I looked at coursera.org and found a way to scrape the course URLs as well.
Here is what you need to do:
- Find all a elements whose data-click-key attribute equals search.search.click.search_card.
- Collect the href of each element in the resulting list.
The code is as follows:
# Assumes `soup` is a BeautifulSoup object built from a Coursera search
# results page (here, a search for Python courses).
base = "https://www.coursera.org"
titles = soup.find_all("h2", class_="card-title")
urls = soup.find_all("a", attrs={"data-click-key": "search.search.click.search_card"})

# In case you need a plain list of the (relative) URLs
url_list = [a["href"] for a in urls]

# Print each course title next to its absolute URL
for title, url in zip(titles, urls):
    print(title.text + ": " + base + url["href"])
Output:
Python for Everybody: https://www.coursera.org/specializations/python
Python 3 Programming: https://www.coursera.org/specializations/python-3-programming
IBM Data Science: https://www.coursera.org/professional-certificates/ibm-data-science
Google IT Automation with Python: https://www.coursera.org/professional-certificates/google-it-automation
Applied Data Science with Python: https://www.coursera.org/specializations/data-science-python
Programming for Everybody (Getting Started with Python): https://www.coursera.org/learn/python
Crash Course on Python: https://www.coursera.org/learn/python-crash-course
Python for Data Science and AI: https://www.coursera.org/learn/python-for-applied-data-science-ai
Introducción a la programación en Python I: Aprendiendo a programar con Python: https://www.coursera.org/learn/aprendiendo-programar-python
Python Basics: https://www.coursera.org/learn/python-basics
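
If you want to try the two steps without fetching a live page, here is a minimal sketch that uses only the Python standard library (html.parser instead of BeautifulSoup, so it is a swapped-in technique, not the code from the answer). The sample markup, the CardLinkParser class, and the extract_course_urls function are hypothetical names made up for illustration; the data-click-key value is the one from the answer, and Coursera's real markup may differ or change over time. It also uses urljoin rather than string concatenation, so an href that is already absolute is left as it is.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

BASE = "https://www.coursera.org"

# Hypothetical sample markup standing in for a Coursera search results page.
SAMPLE_HTML = """
<ul>
  <li><a data-click-key="search.search.click.search_card" href="/specializations/python">Python for Everybody</a></li>
  <li><a data-click-key="search.search.click.search_card" href="/learn/python">Programming for Everybody</a></li>
  <li><a data-click-key="some.other.key" href="/about">About</a></li>
</ul>
"""


class CardLinkParser(HTMLParser):
    """Collects href values from <a> tags that carry the search-card key."""

    CARD_KEY = "search.search.click.search_card"

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs
        attr_map = dict(attrs)
        if tag == "a" and attr_map.get("data-click-key") == self.CARD_KEY:
            href = attr_map.get("href")
            if href:
                self.hrefs.append(href)


def extract_course_urls(html, base=BASE):
    """Return absolute URLs for every search-card link found in html."""
    parser = CardLinkParser()
    parser.feed(html)
    # urljoin resolves relative hrefs against base and keeps absolute ones as-is
    return [urljoin(base, href) for href in parser.hrefs]


if __name__ == "__main__":
    for url in extract_course_urls(SAMPLE_HTML):
        print(url)
```

Run against the sample markup, this prints the two course links and skips the link with a different data-click-key.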