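How to fix "Error in records[[x]] ... more elements supplied than there are to replace" when web scraping in R
Please note that I am new to R and to web scraping with R, so please keep that in mind when explaining your answer...
I am trying to scrape the date of stay, the review title, and the review text.
This is where I generate the list of URLs: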
library(rvest)
library(stringr) # str_c(), str_sub()
library(dplyr)   # data_frame(), bind_rows()
#GENERATING THE URLS
webpage_list <- vector(mode = "list")
#creating empty list
webpage_list
for(n in seq(from=5,to=15,by=5)){
webpage_list[[n]] <- glue::glue("https://www.sampleURL.com#REVIEWS")
}
#dropping the empty values
webpage_list[sapply(webpage_list,is.null)] <- NULL
webpage_list
I then convert the list to a character vector and loop over it to identify the regions of the page I want to scrape:
webpage_list2 <- unlist(webpage_list)
class(webpage_list2)
for(i in seq_along(webpage_list2)){
  webpage <- read_html(webpage_list2[i])
  results <- webpage %>% html_nodes(".oETBfkHU,._3hDPbqWO")
  print(results)
  # Building the dataset
  records <- vector("character",length = (length(results)))
  print(records)
}
Up to this point everything seems to work the way I want.
for (x in seq_along(results)) {
  url <- read_html(webpage_list2[x])
  dateOfStay <- str_c(url %>%
    html_nodes("._34Xs-BQm") %>%
    html_text())
  reviewTitle <- str_sub(url %>%
    html_nodes(".glasR4aX")%>%
    html_text())
  review <- str_sub(url %>%
    html_nodes(".IRsGHoPm") %>%
    html_text())
  records[[x]] <- data_frame(dateOfStay = dateOfStay,reviewTitle = reviewTitle,review = review)#,review = review
}
#Build DF
DF <- bind_rows(records)
From this, I get the following error:
Error in records[[x]] <- data_frame(dateOfStay = dateOfStay,: more elements supplied than there are to replace
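Any help would be greatly appreciated.
Solution
We can reproduce your problem without doing any scraping. You are trying to put a data frame inside a character vector. A data frame is not a character value: it is a list of columns, so it supplies more elements than the single slot you are replacing, and the sizes do not match. You can fix this either by making records a list, or by wrapping the data frame in a list to force it into a single item. I recommend making records a list.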
records <- vector("character",length = (3))
records[[2]] <- data.frame(test = "A",test2 = "B")
# Error in records[[2]] <- data.frame(test = "A",test2 = "B") :
# more elements supplied than there are to replace
# Option 1:
records <- vector("list", length = 3)
records[[2]] <- data.frame(test = "A",test2 = "B")
records
# [[1]]
# NULL
#
# [[2]]
#   test test2
# 1    A     B
#
# [[3]]
# NULL
# Option 2:
records <- vector("character",length = (3))
records[[2]] <- list(data.frame(test = "A",test2 = "B"))
# records
# [[1]]
# [1] ""
#
# [[2]]
# [[2]][[1]]
# test test2
# 1 A B
#
#
# [[3]]
# [1] ""
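To connect this back to your loop, here is a minimal sketch of the list-based approach. It assumes a hypothetical helper, scrape_page(), standing in for your rvest calls (it returns made-up values so the example runs without a network connection), and placeholder page names in pages. It also loops over the pages rather than over results, since in your code results only holds the nodes from the last page read. Treat it as an illustration of the pattern, not a drop-in replacement.

```r
# Hypothetical stand-in for the rvest calls in the question: returns one
# data frame of reviews per page. The values are made up for illustration.
scrape_page <- function(page_url) {
  data.frame(
    dateOfStay  = c("May 2020", "June 2020"),
    reviewTitle = paste("Title from", page_url, 1:2),
    review      = c("First review text", "Second review text"),
    stringsAsFactors = FALSE
  )
}

# Placeholder page identifiers (assumed), one per page to scrape.
pages <- c("page-a", "page-b", "page-c")

# A list can hold one data frame per slot; a character vector cannot.
records <- vector("list", length = length(pages))

# Iterate over the pages, not over the nodes found on a single page.
for (x in seq_along(pages)) {
  records[[x]] <- scrape_page(pages[x])
}

# Base R way to stack data frames that share the same columns;
# dplyr::bind_rows(records) from the question should do the same job here.
DF <- do.call(rbind, records)
```

Because every slot of records is either NULL or a data frame, the final step combines them cleanly, which is why I would pick the list option over wrapping each data frame in list().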