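How to select rows of a data frame based on a vector containing multiple strings
I have a vector of words that I need to use to select multiple rows in a data frame with more than 1000 observations. Below is a simplified example.
These are the foods I have to look for in the data frame:
ls_foods <- c("Abacate","Abacaxi","Abóbora","Abobrinha","Acelga","Acerola","Alface","Almeirão","Arroz","Banana","Batata","Batata doce","Berinjela","Brocolis","Cacau","Café")
This is the df. I need to select only the rows that contain words from the vector ls_foods. Some of the words contain special characters and others do not.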
id <- 1:5
variables <- c("abacate - kg","batata inglesa - Kg","Pera - pés","Brocolis - Kg","Laranja (Lima,Pêra,da Terra,etc) - Pés")
df <- data.frame(id,variables)
I tried this, but it did not work:
df <- df[grepl(ls_foods,df$desc_var)]
The result I expect is:
id <- c(1,2,4)
variables <- c("abacate - kg","batata inglesa - Kg","Brocolis - Kg")
df_1 <- data.frame(id,variables)
Thanks in advance!
Solution
Try subset + grepl, as shown here:
subset(
  df,
  grepl(paste0(ls_foods, collapse = "|"), variables, ignore.case = TRUE)
)
which gives
  id           variables
1  1        abacate - kg
2  2 batata inglesa - Kg
4  4       Brocolis - Kg
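To make the mechanics explicit: grepl expects a single pattern, so the food names are first collapsed into one regular expression with "|" (alternation), and ignore.case = TRUE takes care of the upper/lower case differences. The following is a minimal base R sketch of those two steps using the data from the question; the names pattern and hits are only illustrative.

```r
ls_foods <- c("Abacate","Abacaxi","Abóbora","Abobrinha","Acelga","Acerola",
              "Alface","Almeirão","Arroz","Banana","Batata","Batata doce",
              "Berinjela","Brocolis","Cacau","Café")
id <- 1:5
variables <- c("abacate - kg","batata inglesa - Kg","Pera - pés",
               "Brocolis - Kg","Laranja (Lima,Pêra,da Terra,etc) - Pés")
df <- data.frame(id, variables)

# One regular expression of the form "Abacate|Abacaxi|...|Café"
pattern <- paste0(ls_foods, collapse = "|")

# One TRUE/FALSE per row: does the label contain any of the food names?
hits <- grepl(pattern, df$variables, ignore.case = TRUE)

# Note the comma: the logical vector indexes rows, not columns
df_1 <- df[hits, ]
```

Note that this matches substrings and does not normalise accents, so a label written as "cafe" would not match the entry "Café".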
,
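Here is my answer. Note that I use a single line to convert the special characters into plain ones; you have to specify which special characters occur in your data.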
id <- (1:5)
variables <- c("abacate - kg","batata inglesa - Kg","Pera - pés","Brocolis - Kg","Laranja (Lima,Pêra,da Terra,etc) - Pés")
df <- data.frame(id,variables)
ls_foods <- c("Abacate","Abacaxi","Abóbora","Abobrinha","Acelga","Acerola","Alface","Almeirão","Arroz","Banana","Batata","Batata doce","Berinjela","Brocolis","Cacau","Café")
# Convert special characters with chartr
answer <- unlist(sapply(chartr(old = "áéíóúàèìòù",new = "aeiouaeiou",x = tolower(ls_foods)),grep,x = tolower(variables)))
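The call above returns the matching row positions rather than the filtered data frame. As a sketch of one possible final step, the positions can be used to index df. The longer character list passed to chartr (adding ã, õ, â, ê, ô, ç) and the names strip_accents, plain_foods, plain_vars and idx are assumptions for illustration, not part of the original answer.

```r
id <- 1:5
variables <- c("abacate - kg","batata inglesa - Kg","Pera - pés",
               "Brocolis - Kg","Laranja (Lima,Pêra,da Terra,etc) - Pés")
df <- data.frame(id, variables)
ls_foods <- c("Abacate","Abacaxi","Abóbora","Abobrinha","Acelga","Acerola",
              "Alface","Almeirão","Arroz","Banana","Batata","Batata doce",
              "Berinjela","Brocolis","Cacau","Café")

# Hypothetical helper: lower-case, then map each listed accented
# character to its plain counterpart (both strings have equal length)
strip_accents <- function(x) {
  chartr("áéíóúàèìòùãõâêôç", "aeiouaeiouaoaeoc", tolower(x))
}

plain_foods <- strip_accents(ls_foods)
plain_vars  <- strip_accents(df$variables)

# For each food, the positions of the labels that contain it
idx <- unlist(lapply(plain_foods, grep, x = plain_vars, fixed = TRUE))

# A row may match several foods, so drop duplicates and keep row order
df_1 <- df[sort(unique(idx)), ]
```

Normalising both the food names and the labels means an accent present on only one side no longer prevents a match.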
,
When working with accented characters where the accents carry no meaning, I usually reduce everything to Latin ASCII. The stringi package is handy for this.
library(stringi)
# Transliterate to plain ASCII and lower-case in one step
simplify <- function(x) stri_trans_general(x, "Latin-ASCII; Lower")
df[
  stri_detect_regex(
    simplify(df$variables),
    paste(simplify(ls_foods), collapse = "|")
  ),
]
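To see what the normalisation step does on its own, here is a small sketch that applies the same transliteration to a few food names and then matches them against some labels. The vectors foods and labels are made-up illustrations, not data from the question, and the stringi package is assumed to be installed.

```r
library(stringi)

# Same helper as above: transliterate to ASCII, then lower-case
simplify <- function(x) stri_trans_general(x, "Latin-ASCII; Lower")

foods <- c("Abóbora", "Almeirão", "Café")
plain <- simplify(foods)

# Hypothetical labels with inconsistent case and accents
labels <- c("ABÓBORA - kg", "Pera - pés", "cafe - Kg")

# After simplification, accents and case no longer affect the match
hits <- stri_detect_regex(simplify(labels), paste(plain, collapse = "|"))
```

Because both sides go through the same function, there is no need to list the accented characters by hand.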