How to scrape data from a page that can only be opened from a browser
In the network tab, the first request body:

```
eprocTenders:tenderNumber:
eprocTenders:tenderCategory:-1
eprocTenders:tenderTitle:
eprocTenders:tenderDescription:
eprocTenders:ecvRange:-1
eprocTenders:departmentId:
eprocTenders:status:EVALUATION_COMPLETED
eprocTenders:departmentLoc:
eprocTenders:tenderCreateDateFrom:01/04/2019
eprocTenders:tenderCreateDateTo:31/03/2020
eprocTenders:tenderSubmissionDateFrom:
eprocTenders:tenderSubmissionDateTo:
eprocTenders:selectTender:SEARCHTENDERS
eprocTenders:butSearch:Search
eprocTenders_SUBMIT:1
jsf_sequence:2
eprocTenders:dataScrollerId:
eprocTenders:_link_hidden_:
```

The second request body:

```
eprocTenders:tenderNumber:
eprocTenders:tenderCategory:-1
eprocTenders:tenderTitle:
eprocTenders:tenderDescription:
eprocTenders:ecvRange:-1
eprocTenders:departmentId:
eprocTenders:status:EVALUATION_COMPLETED
eprocTenders:departmentLoc:
eprocTenders:tenderCreateDateFrom:01/04/2019
eprocTenders:tenderCreateDateTo:31/03/2020
eprocTenders:tenderSubmissionDateFrom:
eprocTenders:tenderSubmissionDateTo:
eprocTenders:selectTender:SEARCHTENDERS
eprocTenders_SUBMIT:1
jsf_sequence:3
eprocTenders:dataScrollerId:idx2
eprocTenders:_link_hidden_:eprocTenders:dataScrollerIdidx2
```
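Comparing the two captured bodies, only the pagination fields change from one page to the next; a small sketch to illustrate (values copied from the captures above, all other fields are identical):

```python
# Pagination-related fields from the two captured request bodies
page1 = {
    'jsf_sequence': '2',
    'eprocTenders:dataScrollerId': '',
    'eprocTenders:_link_hidden_': '',
}
page2 = {
    'jsf_sequence': '3',
    'eprocTenders:dataScrollerId': 'idx2',
    'eprocTenders:_link_hidden_': 'eprocTenders:dataScrollerIdidx2',
}

# The only fields that differ between page 1 and page 2
changed = sorted(k for k in page1 if page1[k] != page2[k])
print(changed)
# → ['eprocTenders:_link_hidden_', 'eprocTenders:dataScrollerId', 'jsf_sequence']
```

So a scraper has to resend the full form for every page, bumping `jsf_sequence` and setting the scroller fields each time.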
I am trying to scrape data from this website: URL

The code I am trying:
```python
import requests
import time
from bs4 import BeautifulSoup
import pandas as pd

mydata = 'https://eproc.karnataka.gov.in/eprocurement/common/eproc_tenders_list.seam'

with requests.Session() as session:
    session.headers = {
        'Cookie': 'JSESSIONID=DEBFA1809C30CE2F3F04D0044DFCA784.appp1vm22',
        'Content-Type': 'multipart/form-data; boundary=----WebKitFormBoundaryYxNGT6chlbwn3Ots',
        'Content-Disposition': 'form-data',
        'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.97 Safari/537.36',
    }
    mydata_Text = []
    response = session.post(mydata, data=data, verify=False)
    soup = BeautifulSoup(response.content, 'html.parser')
    for x in range(1, 5):
        data = {
            'eprocTenders:status': 'EVALUATION_COMPLETED',
            'eprocTenders:tenderCreateDateFrom': '01/04/2019',
            'eprocTenders:tenderCreateDateTo': '31/03/2020',
            'eprocTenders:butSearch': 'Search',
            'eprocTenders_SUBMIT': 1,
            'eprocTenders:dataScrollerId': 'idx' + str(x),
            # 'eprocTenders:_link_hidden_': 'eprocTenders:dataScrollerIdidx' + str(x),
            'jsf_sequence': str(x),
            'eprocTenders:selectTender': 'SEARCHTENDERS',
        }
        print(data)
        time.sleep(5)
        mycontent = soup.find('table', attrs={'id': 'eprocTenders:browserTableEprocTenders'})
        table_body = mycontent.find('tbody')
        rows = table_body.find_all('tr')
        for row in rows:
            cols = row.find_all('td')
            cols = [me.text.strip() for me in cols]
            mydata_Text.append([me for me in cols if me])
print(len(mydata_Text))
```
What am I missing here?

Solution
You only get the first page because you never make another request after the initial one: you keep building the soup object from the same initial `response.content`. You need to send the request and parse it inside the loop. Try something like this:
```python
import requests
import time
from bs4 import BeautifulSoup
import pandas as pd

mydata = 'https://eproc.karnataka.gov.in/eprocurement/common/eproc_tenders_list.seam'

with requests.Session() as session:
    session.headers = {
        'Cookie': 'JSESSIONID=DEBFA1809C30CE2F3F04D0044DFCA784.appp1vm22',
        'Content-Type': 'multipart/form-data; boundary=----WebKitFormBoundaryYxNGT6chlbwn3Ots',
        'Content-Disposition': 'form-data',
        'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.97 Safari/537.36',
    }
    mydata_Text = []
    # response = session.post(mydata, data=data, verify=False)  # <--- Put inside the loop
    # soup = BeautifulSoup(response.content, 'html.parser')     # <--- Put inside the loop
    for x in range(1, 5):
        data = {
            'eprocTenders:status': 'EVALUATION_COMPLETED',
            'eprocTenders:tenderCreateDateFrom': '01/04/2019',
            'eprocTenders:tenderCreateDateTo': '31/03/2020',
            'eprocTenders:butSearch': 'Search',
            'eprocTenders_SUBMIT': 1,
            'eprocTenders:dataScrollerId': 'idx' + str(x),
            # 'eprocTenders:_link_hidden_': 'eprocTenders:dataScrollerIdidx' + str(x),
            'jsf_sequence': str(x),
            'eprocTenders:selectTender': 'SEARCHTENDERS',
        }
        print(data)
        response = session.post(mydata, data=data, verify=False)  # <--- HERE
        soup = BeautifulSoup(response.content, 'html.parser')     # <--- HERE
        time.sleep(5)
        mycontent = soup.find('table', attrs={'id': 'eprocTenders:browserTableEprocTenders'})
        table_body = mycontent.find('tbody')
        rows = table_body.find_all('tr')
        for row in rows:
            cols = row.find_all('td')
            cols = [me.text.strip() for me in cols]
            mydata_Text.append([me for me in cols if me])
print(len(mydata_Text))
```
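Since `pandas` is already imported but never used, the collected rows can be turned into a DataFrame at the end for inspection or export. A minimal sketch, with hypothetical rows and placeholder column names (the real headers would come from the table's header cells):

```python
import pandas as pd

# Hypothetical rows in the same shape mydata_Text ends up with;
# the real column names come from the scraped table's header row
mydata_Text = [
    ['1', 'T-001', 'Road works', 'EVALUATION_COMPLETED'],
    ['2', 'T-002', 'Bridge repair', 'EVALUATION_COMPLETED'],
]

df = pd.DataFrame(mydata_Text,
                  columns=['Sl. No.', 'Tender No.', 'Title', 'Status'])
df.to_csv('tenders.csv', index=False)
print(df.shape)
# → (2, 4)
```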