How to scrape hidden data from a website with Python
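I'm trying to scrape individual at-bat data from a single URL. Here is an example (https://baseballsavant.mlb.com/savant-player/willson-contreras-575929?stats=gamelogs-r-hitting-statcast&season=2020)
The data seems to be hidden, or at least I can't get at it with: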
driver = webdriver.Chrome('/Users/gru/Documents/chromedriver')
driver.get('https://baseballsavant.mlb.com/savant-player/willson-contreras-575929?stats=gamelogs-r-hitting-statcast&season=2020')
html_page = driver.page_source
time.sleep(15)
soup = BeautifulSoup(html_page,'lxml')
for j in soup.find_all('tr'):
    drounders=[]
    for h in j.find_all('td'):
        drounders.append(h.get_text())
    print(drounders)
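These are the expected first few rows:
Game Date  Bat Team  Fld Team  Pitcher  Result  EV (MPH)  LA (°)  Dist (ft)  Direction  Pitch (MPH)  Pitch Type
1 2020-08-12 Carrasco, Carlos strikeout
2 2020-08-12 Carrasco, Carlos strikeout
3 2020-08-12 Carrasco, Carlos force_out Opposite
4 2020-08-11 Allen, Logan force_out 107.8 -25 5 Pull 94.0 4-Seam Fastball
5 2020-08-11 Allen, Logan strikeout 77.3 Curveball
6 2020-08-11 Hill, Cam sac_fly 100.5 42 345 Straightaway 91.6 4-Seam Fastball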
Solution
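The only problem I see here is the Bat Team column, because that column contains images rather than text. In this answer I scrape the image link from the Bat Team column and append it at the end of each row. If you want to ignore it, remove img from the for loop.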
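To illustrate why an image-only cell yields no text, here is a minimal sketch on a hand-written row. The HTML below is hypothetical: it is modeled on the `<img>` tag that appears in this answer's output, not copied from the live page, and the surrounding cells are placeholders.

```python
from bs4 import BeautifulSoup

# Hypothetical row; only the <img> tag mirrors what the answer's output shows.
row_html = (
    '<table><tr>'
    '<td>1</td><td>2020-08-13</td>'
    '<td><img class="table-team-logo" '
    'src="https://www.mlbstatic.com/team-logos/112.svg"/></td>'
    '<td>Burnes, Corbin</td>'
    '</tr></table>'
)

row = BeautifulSoup(row_html, "html.parser").find("tr")

# get_text() on a cell that holds only an image returns an empty string.
cell_texts = [td.get_text(strip=True) for td in row.find_all("td")]

# The useful information sits in the tag's src attribute instead.
img = row.find("img")
logo_url = img.get("src") if img is not None else None

print(cell_texts)
print(logo_url)
```

The third entry of `cell_texts` is empty, which is why the image link has to be read from the `src` attribute separately.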
Code:
from selenium import webdriver
from bs4 import BeautifulSoup
import time

site = 'https://baseballsavant.mlb.com/savant-player/willson-contreras-575929?stats=gamelogs-r-hitting-statcast&season=2020'
finalData = []

# Chrome web driver (Selenium 3 style; newer Selenium releases take a Service object instead of executable_path)
driver = webdriver.Chrome(executable_path='chromedriver.exe')
# For the Firefox web driver:
# driver = webdriver.Firefox(executable_path='geckodriver.exe')
driver.get(site)
time.sleep(10)  # give the page time to render the table before reading page_source

soup = BeautifulSoup(driver.page_source, 'html.parser')
table = soup.find("div", id="gamelogs_statcast")
trs = table.find_all("tr")
for trValue in trs:
    txt = str(trValue.text)
    img = str(trValue.find("img"))  # 'None' for rows without an image, such as the header
    data = txt + img
    finalData.append(data)
print(finalData)
Output:
['Game DateBat TeamFld TeamPitcherResultEV (MPH)LA (°)Dist (ft)DirectionPitch (MPH)Pitch TypeNone','1 2020-08-13 Burnes,Corbin field_out 104.1 24 400 Straightaway 95.7 4-Seam Fastball <img class="table-team-logo" src="https://www.mlbstatic.com/team-logos/112.svg"/>','2 2020-08-13 Burnes,Corbin walk 89.2 Slider <img class="table-team-logo" src="https://www.mlbstatic.com/team-logos/112.svg"/>','3 2020-08-13 Anderson,Brett hit_by_pitch 89.5 4-Seam Fastball <img class="table-team-logo" src="https://www.mlbstatic.com/team-logos/112.svg"/>' ........]
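The output above joins every cell of a row into one string. If separate columns are wanted, as in the expected rows from the question, a sketch along these lines may help. `parse_gamelog` and `sample_html` are hypothetical names introduced for illustration, and the sample markup is an assumption about the page structure based on the `gamelogs_statcast` id used in the code above; in practice `driver.page_source` would be passed in.

```python
from bs4 import BeautifulSoup


def parse_gamelog(html, container_id="gamelogs_statcast"):
    """Return one list per table row, with one entry per cell."""
    soup = BeautifulSoup(html, "html.parser")
    container = soup.find("div", id=container_id)
    if container is None:
        return []  # container not present (for example, page not rendered yet)
    rows = []
    for tr in container.find_all("tr"):
        cells = []
        for cell in tr.find_all(["th", "td"]):
            img = cell.find("img")
            if img is not None:
                # Image cells carry no text, so keep the image URL instead.
                cells.append(img.get("src", ""))
            else:
                cells.append(cell.get_text(strip=True))
        if cells:
            rows.append(cells)
    return rows


# Hypothetical markup standing in for driver.page_source.
sample_html = """
<div id="gamelogs_statcast"><table>
<tr><th>Game Date</th><th>Bat Team</th><th>Pitcher</th><th>Result</th></tr>
<tr><td>2020-08-13</td>
<td><img class="table-team-logo" src="https://www.mlbstatic.com/team-logos/112.svg"/></td>
<td>Burnes, Corbin</td><td>field_out</td></tr>
</table></div>
"""

rows = parse_gamelog(sample_html)
for row in rows:
    print(row)
```

Keeping one list entry per cell means empty cells stay in position, so each value lines up with its header.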
I hope this helps. Let me know if you need any further help with this answer.