How do I use multiple Promises in recursion?
I'm trying to solve the following problem: a script visits a website, collects the first 10 links from it, then follows those 10 links, and then follows the next 10 links on each of those pages, continuing until 1000 pages have been visited. I tried to implement this with a for loop inside a promise plus recursion, and this is my code:
const rp = require('request-promise');

const url = 'http://somewebsite.com/';
const websites = [];
const promises = [];

const getOnSite = (url, count = 0) => {
  console.log(count, websites.length);
  promises.push(new Promise((resolve, reject) => {
    rp(url)
      .then(async function (html) {
        let links = html.match(/https?:\/\/(www\.)?[-a-zA-Z0-9@:%._\+~#=]{1,256}\.[a-zA-Z0-9()]{1,6}\b([-a-zA-Z0-9()@:%_\+.~#?&//=]*)/g);
        if (links !== null) {
          links = links.splice(0, 10);
        }
        websites.push({
          url, links, emails: emails === null ? [] : emails
        });
        if (links !== null) {
          for (let i = 0; i < links.length; i++) {
            if (count < 3) {
              resolve(getOnSite(links[i], count + 1));
            } else {
              resolve();
            }
          }
        } else {
          resolve();
        }
      }).catch(err => {
        resolve();
      });
  }));
};

getOnSite(url);
Solution
I think you probably need a recursive function that takes three parameters:
- an array of urls to extract links from
- an accumulated array of links
- a limit at which to stop crawling
You would just call it with the root URL and await the promise it returns:
const allLinks = await crawl([rootUrl]);
On the first call, the second and third parameters take their default values:
async function crawl (urls, accumulated = [], limit = 1000) {
  ...
}
The function fetches each url, extracts its links, and recurses until the limit is reached. I haven't tested any of this, but I'm thinking of something along these lines:
// limit the number of links followed per page to 10
const perPageLimit = 10;

async function crawl (urls, accumulated = [], limit = 1000) {
  // if the limit has been depleted or we don't have any urls,
  // return the accumulated result
  if (limit <= 0 || urls.length === 0) {
    return accumulated;
  }

  // process this set of urls
  const results = await Promise.all(
    urls
      .splice(0, perPageLimit)      // limit to 10
      .map(url => fetchHtml(url)    // fetch the url
        .then(extractUrls))         // and extract its links
  );

  // Promise.all yields one array of links per page; flatten it
  const links = results.flat();

  // then recurse
  return crawl(
    links,                      // newly extracted array of links from this call
    [...accumulated, ...links], // pushed onto the accumulated list
    limit - results.length      // reduce the limit by the pages visited
  );
}
async function fetchHtml (url) {
  //
}

const extractUrls = (html) => html.match( ... )