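How to solve: Python multiprocessing in a simple loop
My goal is to fetch the source code of various web pages with the Selenium driver package. To make use of the idle time while a page is opening, I want to use multiprocessing. However, since I am new to multiprocessing, I cannot get the code to work.
Here is a simple function that serves as an example of what I would like to run in parallel (it requires the selenium webdriver package and the time package):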
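# assumes `import time` and an already created Selenium `driver`
def get_source(links):
    for link in links:
        time.sleep(3)  # the time module has no wait(); sleep() pauses execution
        driver.get(link)
        time.sleep(3)
        print(driver.page_source)
        time.sleep(3)
    print("Done with the page")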
Different web pages are passed to this function, for example:
links = ["https://stackoverflow.com/questions/tagged/javascript","https://stackoverflow.com/questions/tagged/python","https://stackoverflow.com/questions/tagged/c%23","https://stackoverflow.com/questions/tagged/php"]
This is what I have so far. Unfortunately, it only spawns a flood of webdriver instances instead of doing what I intended.
if __name__ == '__main__':
    pool = Pool(2)
    pool.map(get_source(),links)
Any help is greatly appreciated. Thanks a lot!
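Solution
When using multiprocessing.Pool, use the apply_async method to map the function over a list of arguments. Note that because the function runs asynchronously, you should pass some kind of index to the function and have it return that index together with the result. In this case, the function returns the URL along with the page source.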
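The pattern described above can be tried without a browser. The following is only a minimal sketch under stated assumptions, not the answer's code: fetch_stub is a hypothetical stand-in for the Selenium call, and ThreadPool from multiprocessing.pool (which offers the same apply_async interface as multiprocessing.Pool, but with threads) is swapped in so the snippet runs anywhere. Each task returns its input URL together with its result, so every result can be matched to its URL.

```python
import time
from multiprocessing.pool import ThreadPool


def fetch_stub(link):
    """Hypothetical stand-in for driver.get(link) followed by driver.page_source."""
    time.sleep(0.01 * (len(link) % 5))  # simulate a variable page-load time
    return (link, "<html>" + link + "</html>")  # return the key with the result


def collect(links, workers=2):
    """Submit one task per link and gather the results into a dict keyed by link."""
    with ThreadPool(processes=workers) as pool:
        pending = [pool.apply_async(fetch_stub, args=(lnk,)) for lnk in links]
        # .get() blocks until that particular task has finished
        return dict(p.get() for p in pending)


if __name__ == "__main__":
    pages = collect(["https://example.com/a", "https://example.com/bb"])
    for link, source in pages.items():
        print("len =", len(source), "for link", link)
```

Because the URL travels with the result, the caller does not need to rely on the order in which tasks finish.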
Try the following code:
import multiprocessing as mp
import time
from selenium import webdriver
from webdriver_manager.chrome import ChromeDriverManager

driver = webdriver.Chrome(ChromeDriverManager().install())

def get_source(link):  # single URL
    time.sleep(3)
    driver.get(link)
    time.sleep(3)
    print("Done with the page:", link)
    return (link, driver.page_source)  # return tuple: link & source

links = [
    "https://stackoverflow.com/questions/tagged/javascript",
    "https://stackoverflow.com/questions/tagged/python",
    "https://stackoverflow.com/questions/tagged/c%23",
    "https://stackoverflow.com/questions/tagged/php"
]

if __name__ == '__main__':
    pool = mp.Pool(processes=2)
    results = [pool.apply_async(get_source, args=(lnk,)) for lnk in links]  # maps function to iterator
    output = [p.get() for p in results]  # collects and returns the results
    for r in output:
        print("len =", len(r[1]), "for link", r[0])  # read tuple elements
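One detail in the question's attempt is worth isolating: pool.map(get_source(),links) calls get_source immediately, with no arguments, instead of handing the function itself to the pool. The sketch below shows the corrected call shape with pool.map. It is an illustration under assumptions rather than a drop-in fix: page_length_stub is a hypothetical placeholder for the page-loading function, and ThreadPool is used in place of a process pool so the example needs no browser.

```python
from multiprocessing.pool import ThreadPool


def page_length_stub(link):
    """Hypothetical placeholder for loading a page; returns (link, length of link)."""
    return (link, len(link))


def map_links(links, workers=2):
    """Apply page_length_stub to every link; results come back in input order."""
    with ThreadPool(processes=workers) as pool:
        # Pass the function object itself: page_length_stub, not page_length_stub()
        return pool.map(page_length_stub, links)


if __name__ == "__main__":
    for link, length in map_links(["ab", "cde"]):
        print(length, "for link", link)
```

Unlike apply_async, map blocks until all tasks are done and returns the results as a list in the same order as the inputs.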
Output
Done with the page: https://stackoverflow.com/questions/tagged/python
Done with the page: https://stackoverflow.com/questions/tagged/javascript
Done with the page: https://stackoverflow.com/questions/tagged/c%23
Done with the page: https://stackoverflow.com/questions/tagged/php
len = 163045 for link https://stackoverflow.com/questions/tagged/javascript
len = 161512 for link https://stackoverflow.com/questions/tagged/python
len = 192744 for link https://stackoverflow.com/questions/tagged/c%23
len = 192678 for link https://stackoverflow.com/questions/tagged/php