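How to fix CasperJS not scraping some websites
I also tried changing the user agent, but it still does not work, and on some other sites the AJAX-loaded data is not scraped. When I try to fetch these sites with file_get_contents(), it returns "failed to open stream". With cURL, however, the same sites load fine.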
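The two symptoms described above (a user agent change that has no effect, and AJAX data that never shows up) can be approached together in one CasperJS script. The following is only a minimal sketch, not a confirmed fix for the sites in question: `TARGET_URL`, `AJAX_SELECTOR`, the user agent string, and the helper `buildCasperOptions` are hypothetical placeholders. It sets the user agent through `pageSettings` when the instance is created and waits for the AJAX-rendered element before reading it, since that content is not in the initial HTML.

```javascript
// Hypothetical values: replace with the site and selector you actually need.
var TARGET_URL = 'http://example.com/';
var AJAX_SELECTOR = '#results .item';

// Hypothetical helper: builds the options passed to require('casper').create().
function buildCasperOptions(userAgent) {
    return {
        waitTimeout: 15000, // wait up to 15 s for AJAX content (CasperJS default is 5 s)
        pageSettings: {
            userAgent: userAgent, // sent with every request made by the page
            loadImages: false     // skip images to speed up page loads
        }
    };
}

var options = buildCasperOptions(
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 ' +
    '(KHTML, like Gecko) Chrome/120.0 Safari/537.36'
);

// Only drive the browser when running under CasperJS/PhantomJS.
if (typeof phantom !== 'undefined') {
    var casper = require('casper').create(options);

    casper.start(TARGET_URL);

    // AJAX data arrives after the initial load, so wait until it is rendered.
    casper.waitForSelector(AJAX_SELECTOR, function () {
        this.echo(this.getHTML(AJAX_SELECTOR));
    }, function () {
        this.echo('Timed out waiting for ' + AJAX_SELECTOR);
    });

    casper.run();
}
```

Saved as, say, scrape.js, this would be run with `casperjs scrape.js`. If the timeout branch fires even with a long `waitTimeout`, the selector is probably wrong or the page is failing to load at all, which is worth checking before adjusting the user agent further.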
GET customers/_search
{
"query": {
"bool": {
"must": [
{
"query_string": {
"default_field": "properties.name","query": "Joe*"
}
}
],"should": [
{
"match": {
"properties.role": "admin"
}
},{
"match": {
"properties.role": "sysop"
}
},{
"match": {
"properties.role": "client"
}
},{
"match": {
"properties.status": "public"
}
},{
"match": {
"properties.status": "public"
}
}
],"must_not": [
{
"match": {
"properties.status": "hide_from_search_results"
}
},{
"match": {
"properties.status": "deleted"
}
},{
"match": {
"properties.status": "banned"
}
},{
"match": {
"properties.status": "hide_from_search_results"
}
},{
"match": {
"properties.status": "banned"
}
}
]
}
},"size": 30,"sort": [
{
"_score": {
"order": "desc"
}
},{
"_script": {
"type": "string","order": "desc","script": {
"lang": "painless","source": "return doc['_index'][0] == 'customers' && doc.containsKey('properties.videoCount')?doc['properties.videoCount'].value:0"
}
}
},{
"_script": {
"type": "string","order": "desc","script": {
"lang": "painless","source": "long timestampNow = new Date().getTime(); return doc['_index'][0] == 'customers' && doc.containsKey('properties.subscriptions.features.allow-application')?(timestampNow < doc['properties.subscriptions.features.first-on-search'].value.getMillis()):false"
}
}
}
]
}