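How to fix web scraping that does not return all the data
I'm trying to scrape this URL: https://www.ventanillaunicaenfermeria.es/BuscarColegiados.php. I need to collect the values of the "Nº cole." column and the "Nombre colegiado" column.
I'm using BeautifulSoup, but I only get the values of the "Nº cole." column. How can I fix this?
Thanks!
Here is my code: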
import requests
from bs4 import BeautifulSoup
import pandas as pd
import numpy as np
page = requests.get('https://www.ventanillaunicaenfermeria.es/BuscarColegiados.php')
soup = BeautifulSoup(page.text, 'html.parser')
# Only the <span> elements are selected here, and they hold just the number
data = soup.find_all("span", {'class': 'colColegiado'})
numero_col = []
for i in data:
    data_num = i.text.strip()
    numero_col.append(data_num)
numero_col
['Nº cole.','6478','13107','7341','12110','5625','4877','4700','9126','8444','13120','5023','12235','7747','17701','17391','17944','17772','7230','11729','17275']
Solution
You are currently reading the values from the wrong HTML element. They should come from all the <p> elements with the class resalto.
import requests
from bs4 import BeautifulSoup
page = requests.get('https://www.ventanillaunicaenfermeria.es/BuscarColegiados.php')
soup = BeautifulSoup(page.text, 'html.parser')
# Select the row paragraphs instead of the spans inside them
data = soup.find_all("p", {'class': 'resalto'})
schools = []
for result in data:
    # First child: the <span> holding the number
    data_num = result.contents[0].text.strip()
    # Second child: the text node holding the name
    data_name = str(result.contents[1]).strip()
    schools.append((data_num, data_name))
print(schools)
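To see why the span selector can only ever return the number, here is a minimal offline sketch. The `SAMPLE_ROW` markup is an assumption about what one result row looks like (number in a `<span class="colColegiado">`, name as bare text after it); it is not copied from the live page, so check it against the real HTML before relying on it.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, assumed to mirror a single result row on the page.
SAMPLE_ROW = (
    '<p class="resalto">'
    '<span class="colColegiado">6478</span>'
    'GUADALUPE LAZARO LAZARO'
    '</p>'
)

row = BeautifulSoup(SAMPLE_ROW, "html.parser").p

# The <span> contains only the number, so selecting spans drops the name.
number = row.find("span", {"class": "colColegiado"}).get_text(strip=True)

# The name is a sibling text node of the span, so it is read from the parent <p>.
name = str(row.contents[1]).strip()

print(number, name)
```

If the real rows nest the name in another tag, `row.contents[1]` would be a tag rather than a string and would need `get_text()` instead.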
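Alternatively: you can't select all the p elements at once; instead, iterate over the paragraphs inside the table. The following code takes a page number and saves the table to a CSV file.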
import requests
from bs4 import BeautifulSoup
import pandas as pd
pageno = 1
res = requests.get(f'https://www.ventanillaunicaenfermeria.es/BuscarColegiados.php?nombre=&ap=&colegio=&col=&nif=&pagina={pageno}')
soup = BeautifulSoup(res.text, "html.parser")
header = soup.find("div", {"id": "contactaForm"}).find("h4")
# Column names: the <span> text, then whatever header text remains
cols = [header.find("span").get_text(), header.get_text().replace(header.find("span").get_text(), "")]
data = []
for p in soup.find("div", {"id": "contactaForm"}).find_all("p"):
    # Keep the result rows: paragraphs whose class is empty or "resalto"
    if len(p['class']) == 0 or p['class'][0] == "resalto":
        child = list(p.children)
        # child[0] is the <span> with the number, child[1] is the name text
        data.append([child[0].get_text(strip=True), child[1]])
df = pd.DataFrame(data, columns=cols)
df.to_csv("data.csv", index=False)
print(df)
Output:
Nº cole. Nombre colegiado
0 6478 GUADALUPE LAZARO LAZARO
1 13107 JOSE MARIA PIÑA MANZANO
2 7341 HEIKE ELFRIEDE BIRKHOLZ
3 12110 ESTHER TIZON ROLDAN
4 5625 MARIA DOLORES TOMAS GARCIA-VAQUERO
5 4877 MARIA CARMEN CASADO LLAVONA
6 4700 MANUEL GUILABERT ORTEGA-VILLAIZAN
7 9126 MARIA ESPERANZA ASENSIO ALMAZAN
8 8444 CONCEPCION VIALARD RODRIGUEZ
9 13120 NURIA VILLAESCUSA SANCHEZ
10 5023 ARTURO BONET BLANCO
11 12235 ALFONSO JIMENEZ LOPEZ
12 7747 JACOBUS PETRUS SINNIGE
13 17701 ANIA BRAVO FIGUEREDO
14 17391 LUSINE DAMIRCHYAN
15 17944 ISALKOU DJIL MERHBA
16 17772 CARLA DENISSE FIGUEROA PIEDRA
17 7230 MARIA ISABEL VISO CABAÑERO
18 11729 PILAR GARCIA SALAZAR
19 17275 MARIA LOURDES MALLEN LLUIS
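Since the code above takes a page number, it can be extended to walk several pages. The following is only a sketch under assumptions: `parse_rows` and `scrape_pages` are hypothetical helper names, the rows are identified by the presence of a `<span>` rather than by their class, and the `pagina` parameter is assumed to accept consecutive integers as in the URL above.

```python
import requests
from bs4 import BeautifulSoup

BASE_URL = "https://www.ventanillaunicaenfermeria.es/BuscarColegiados.php"


def parse_rows(html):
    """Return (number, name) pairs found in one results page (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    form = soup.find("div", {"id": "contactaForm"})
    if form is None:
        return []
    rows = []
    for p in form.find_all("p"):
        span = p.find("span")
        if span is None:
            continue  # assumption: paragraphs without a <span> are not result rows
        number = span.get_text(strip=True)
        # The name is assumed to be the text node right after the span.
        name = str(span.next_sibling or "").strip()
        rows.append((number, name))
    return rows


def scrape_pages(first, last):
    """Fetch pages first..last (inclusive) and collect every row (hypothetical helper)."""
    all_rows = []
    for pageno in range(first, last + 1):
        # Same query string as the single-page example, built from a dict.
        params = {"nombre": "", "ap": "", "colegio": "", "col": "", "nif": "", "pagina": pageno}
        res = requests.get(BASE_URL, params=params, timeout=10)
        res.raise_for_status()
        all_rows.extend(parse_rows(res.text))
    return all_rows
```

Calling `scrape_pages(1, 3)` would request the first three pages; the result can be passed to `pd.DataFrame(rows, columns=[...])` and written with `to_csv` as before. Adding a short pause between requests is a reasonable courtesy to the server.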