How do I use .find to extract a div or span that has no class?
I want to extract the job IDs from the HTML of this site: https://stackoverflow.com/jobs?q=python&sort%20=i
My code is this...
import requests
from bs4 import BeautifulSoup

URL = "https://stackoverflow.com/jobs?q=python&sort%20=i"

def extract_job(html):
    title = html.find("h2", {"class": "mb4 fc-black-800 fs-body3"}).find("a")["title"]
    # recursive=False keeps only the <span> tags that are direct children of the <h3>
    company_row = html.find("h3", {"class": "fc-black-700 fs-body1 mb4"}).find_all("span", recursive=False)
    try:
        company = company_row[0].get_text(strip=True)
    except IndexError:
        company = "None"
    try:
        location = company_row[1].get_text(strip=True).strip("\n")
    except IndexError:
        location = "None"
    return {'title': title, 'company': company, 'location': location}

def extract_jobs(last_page):
    jobs = []
    for page in range(last_page):
        print(f"Scraping SO: Page: {page}")
        result = requests.get(f"{URL}&pg={page + 1}")
        soup = BeautifulSoup(result.text, "html.parser")
        results = soup.find_all("div", {"class": "grid--cell fl1"})
        for result in results:
            jobs.append(extract_job(result))
    return jobs

print(extract_jobs(1))
But my code only gives me [] (an empty list).
Why?
And how can I get the data-jobid from this HTML?
I am trying to extract it from the HTML below:
<div data-jobid="185876" data-result-id="185876" data-preview-url="/jobs/185876?a=10kTLndTtq3S&so=i&sec=False&pg=1&offset=0&total=163&srp=True&so_medium=Internal&so_source=JobSearchPreview" data-beacon-url="/jobs/n/v/185876?url=%2Fjobs%2F185876%3Fa%3D10kTLndTtq3S%26so%3Di%26sec%3DFalse%26pg%3D1%26offset%3D0%26total%3D163%26srp%3DTrue%26so_medium%3DInternal%26so_source%3DJobSearchPreview&referrer=http%3A%2F%2Fcareers.stackoverflow.com%2Fso-proxy%2Fjobs%3Fq%3Dpython%26sort%20%2B%3Di" class="-job js-result js-dismiss-overlay-container ps-relative _selected js-selected p12 pl24 _featured">
<div class="dismiss-overlay ps-absolute ta-center t0 r0 b0 l0 grid ai-center jc-center o90 bg-black-050 z-active">
<p class="mb0">Okay,you won’t see this job anymore. <a href="#" class="js-undismiss-job" data-id="185876">Undo</a></p>
</div>
<div class="grid">
<div class="grid--cell fl-shrink mr12 w48 h48">
<img src="https://i.stack.imgur.com/UI3Jl.png?s=48" class="w48 h48 bar-sm">
</div>
<div class="grid--cell fl1 ">
<span class="float-right ml12 mrn12 bg-yellow-200 fc-yellow-900 px8 py4 tt-uppercase fw-bold fs-fine bar-sm">featured</span>
<h2 class="mb4 fc-black-800 fs-body3">
<a href="/jobs/185876/senior-software-engineer-frontend-deepfield-networks?a=10kTLndJTsqc&so=i&pg=1&offset=0&total=163&so_medium=Internal&so_source=JobSearch&q=python" title="Senior Software Engineer (Frontend)" class="s-link stretched-link">Senior Software Engineer (Frontend)</a> </h2>
<h3 class="fc-black-700 fs-body1 mb4">
<span>Deepfield Networks
</span>
•
<span class="fc-black-500">
Ann Arbor,MI </span>
</h3>
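In isolation, the card above can be parsed without touching the network. The sketch below uses a trimmed copy of that HTML (attributes shortened for brevity, so treat the markup as an assumption about the page structure):

```python
from bs4 import BeautifulSoup

# Trimmed version of the job card shown above.
html = """
<div data-jobid="185876" class="-job js-result">
  <div class="grid--cell fl1">
    <h2 class="mb4 fc-black-800 fs-body3">
      <a href="/jobs/185876/senior-software-engineer-frontend"
         title="Senior Software Engineer (Frontend)">Senior Software Engineer (Frontend)</a>
    </h2>
    <h3 class="fc-black-700 fs-body1 mb4">
      <span>Deepfield Networks</span>
      <span class="fc-black-500">Ann Arbor, MI</span>
    </h3>
  </div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
# Match on the data-jobid attribute instead of a class; True means
# "the attribute must be present, whatever its value".
card = soup.find("div", {"data-jobid": True})
job_id = card["data-jobid"]                 # indexing a tag returns the attribute value
title = card.find("h2").find("a")["title"]
spans = card.find("h3").find_all("span", recursive=False)
company = spans[0].get_text(strip=True)
location = spans[1].get_text(strip=True)
print(job_id, title, company, location)
```

Note that the two `<span>` tags carry no distinguishing class of their own, so they are selected by position (first is company, second is location) rather than by class.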
Also, I tried adding this line:
id_list = soup.find_all("div", {"data-jobid": True})
This gives me the entire tags, including "data-jobid",
but I only want the bare numbers (the IDs from the HTML).
Please help...
Solution
Your code runs fine for me, so first check your internet connection.
This is the kind of output I get:
Scraping SO: Page: 0
[{'title': 'Principal Software Engineer - Python - Contact Center', 'company': 'Delivery Hero SE', 'location': 'Berlin, Germany'},
 {'title': 'Senior Software Engineer - Python', 'company': 'YouGov', 'location': 'Warsaw, Poland'},
 {'title': 'Python Developer', 'company': 'GeekyWorks', 'location': 'Pune, India'},
 {'title': 'Python Developers', 'company': 'JB Solutions', 'location': ''},
 {'title': 'Full Stack (Python/React) Software engineer', 'company': 'JPMorgan Chase Bank, N.A.', 'location': 'New York, NY'},
 {'title': 'Mid-Level Backend Software Engineer (Python / Django)', 'company': 'Tivix, Inc.', 'location': 'Wrocław'},
 {'title': 'Back-End Developer (Python/Django)', 'company': 'Apacio Ltd', 'location': 'Theale, UK'},
 {'title': 'Python Software Developer', 'company': 'Old Mission Capital, LLC', 'location': 'Chicago, IL'},
 {'title': 'Python Developer/Bangalore', 'company': 'Talent Zone Consultants', ...
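As for the job IDs: find_all returns whole Tag objects, but indexing each tag by the attribute name gives just the value, i.e. the bare number. A minimal offline sketch (the sample HTML here is made up for illustration):

```python
from bs4 import BeautifulSoup

# Made-up sample: two job cards carry data-jobid, one div does not.
html = """
<div data-jobid="185876">...</div>
<div data-jobid="291034">...</div>
<div class="no-id">...</div>
"""

soup = BeautifulSoup(html, "html.parser")
# {"data-jobid": True} matches tags where the attribute exists at all;
# tag["data-jobid"] then extracts the attribute's string value.
job_ids = [div["data-jobid"] for div in soup.find_all("div", {"data-jobid": True})]
print(job_ids)  # → ['185876', '291034']
```

Applied to your scraper, the same pattern inside extract_jobs would be `soup.find_all("div", {"data-jobid": True})` followed by `div["data-jobid"]` for each match.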