How to fix a web scraper function that fails to collect new data
I'm running into some problems with a web scraping function I built.
First, I wrote some helper functions to keep the code tidy:
def get_reviews(string):
    index = string.find('Reviews')
    review = string[index - 10: index + 10]
    reviews.append(review.strip())

def append_institute(institute):
    if institute is not None:
        institutes.append(institute.text.strip())
    else:
        institutes.append(-1)

def append_provider(provider):
    if provider is not None:
        providers.append(provider.text.strip())
    else:
        providers.append(-1)

def append_date(date):
    if date is not None:
        dates.append(date.text.strip())
    else:
        dates.append('Self Paced')

def append_rating(rating):
    if rating is not None:
        ratings.append(rating.text.strip())
    else:
        ratings.append(-1)

def append_name(name):
    names.append(name)
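Note that these helpers all append to lists (`reviews`, `institutes`, and so on) that are not defined inside them. In Python, a function body can see its own locals, enclosing closures, module globals, and builtins, but never another function's locals. A minimal sketch of that rule, with hypothetical names unrelated to the scraper:

```python
def append_item(item):
    items.append(item)   # 'items' must exist as a global when this runs

def collect():
    items = []           # local to collect(); invisible to append_item
    append_item('x')     # raises NameError

try:
    collect()
except NameError as e:
    print(e)             # prints: name 'items' is not defined
```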
Then I wrote the scraper itself:
def get_data(pages):
    names = []
    institutes = []
    providers = []
    dates = []
    reviews = []
    ratings = []
    for page in pages:
        r = requests.get(page)
        soup = BeautifulSoup(r.content, 'html.parser')
        rows = soup.select('tbody tr')
        for row in rows:
            # name
            name = row.select_one('span', {'class': 'text-1 line-tight'}).text.strip()
            append_name(name)
            # institute
            institute = row.find('a', {'class': 'color-charcoal small-down-text-2 text-3'})
            append_institute(institute)
            # provider
            provider = row.find('span', {'class': 'hidden medium-up-inline-block'})
            append_provider(provider)
            # date
            date = row.find('td', {'itemprop': 'startDate'})
            append_date(date)
            # reviews
            rev = row.find('span', {'class': 'large-down-hidden block line-tight text-4 color-gray'})
            string = str(rev)
            get_reviews(string)
            # rating
            rating = row.find('span', attrs={'class': 'xlarge-up-hidden color-charcoal text-center'})
            append_rating(rating)
    df = pd.DataFrame({'name': names, 'institute': institutes, 'provider': providers,
                       'date': dates, 'review': reviews, 'rating': ratings})
    return df
However, when I call the get_data function, I get the error: name 'names' is not defined. I tried declaring the empty lists before the functions, and that worked, but it only lets me run the function once, because the scraped values accumulate in the lists across calls. Any help would be appreciated.
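One way to restructure this, sketched here with simplified stand-in names rather than the original requests/BeautifulSoup code: make each helper take its target list as a parameter. Then get_data can create fresh lists on every call, and no global state survives between runs. The `FakeTag` class below is a hypothetical stand-in for a bs4 tag, used only so the sketch runs on its own:

```python
def append_value(target, element, default=-1):
    """Append the element's stripped text, or `default` when the tag is missing."""
    if element is not None:
        target.append(element.text.strip())
    else:
        target.append(default)

class FakeTag:
    """Stand-in for a bs4 Tag, just for demonstration."""
    def __init__(self, text):
        self.text = text

def get_data(rows):
    # Fresh lists on every call, so repeated calls do not accumulate state.
    names, ratings = [], []
    for row in rows:
        append_value(names, row.get('name'))
        append_value(ratings, row.get('rating'))
    return list(zip(names, ratings))

rows = [{'name': FakeTag(' Course A '), 'rating': FakeTag('4.5')},
        {'name': FakeTag('Course B'), 'rating': None}]
print(get_data(rows))  # [('Course A', '4.5'), ('Course B', -1)]
```

The same pattern would apply to the real helpers: pass `names`, `institutes`, etc. into them from `get_data` (or have each helper return a value and let `get_data` do the appending), instead of relying on module-level lists.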