How to check whether data frame column values exist in a list in R
I have a color master list, shown below:
master <- list("Beige" = c("light brown","light golden","skin"),"off-white" = c("off white","cream","light cream","dirty white"),"Metallic" = c("steel","silver"),"Multi-colored" = c("multi color","mixed colors","mix","rainbow"),"Purple" = c("lavender","grape","jam","raisin","plum","magenta"),"Red" = c("cranberry","strawberry","raspberry","dark cherry","cherry","rosered"),"Turquoise" = c("aqua marine","jade green"),"Yellow" = c("fresh lime")
)
This is the data frame column I have:
df$color <- c('multi color','purple','steel','metallic','off white','raisin','strawberry','magenta','skin','Beige','Jade Green','cream','multi-colored','offwhite','rosered',"light cream")
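Now I want to check whether each value in the column matches either a list key or one of the list values.
For example:
1) If the df column value is off white, it should first look among the list keys (Beige, off-white, Metallic, ...) for a match, before looking at the values.
2) If the value is light cream, it should also search all the values stored under those keys, and the value should then be treated as off-white.
3) Case should not matter, e.g. OffWhITe == offwhite, and neither should spacing, e.g. off white == offwhite.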
Output
This should be the expected output:
df$output <- c("Multi-colored","Purple","Metallic","off-white","Red","Beige","Turquoise","Multi-colored","off-white")
Edit
Any value in c("multi color","rainbow","multicolored","MultI-cOlored","multi-colored","MultiColORed","Multi-colored") should be treated as Multi-colored.
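The matching rules described in the question (keys first, then values, ignoring case, spaces and hyphens) can be sketched in base R as an exact lookup on normalized strings. This is only an illustration under stated assumptions: normalize, lookup and map_color are hypothetical names that do not appear in the original post, and stripping every non-alphanumeric character is one possible reading of rule 3.

```r
# Hypothetical helper: lower-case, then drop everything except letters and
# digits, so "OffWhITe", "off white" and "off-white" all become "offwhite".
normalize <- function(x) gsub("[^a-z0-9]", "", tolower(x))

master <- list(
  "Beige" = c("light brown", "light golden", "skin"),
  "off-white" = c("off white", "cream", "light cream", "dirty white"),
  "Metallic" = c("steel", "silver"),
  "Multi-colored" = c("multi color", "mixed colors", "mix", "rainbow"),
  "Purple" = c("lavender", "grape", "jam", "raisin", "plum", "magenta"),
  "Red" = c("cranberry", "strawberry", "raspberry", "dark cherry", "cherry", "rosered"),
  "Turquoise" = c("aqua marine", "jade green"),
  "Yellow" = c("fresh lime")
)

# Named lookup vector: normalized key or value -> canonical key.
# Keys are listed first, so a key match wins over a value match.
lookup <- c(
  setNames(names(master), normalize(names(master))),
  setNames(rep(names(master), lengths(master)),
           normalize(unlist(master, use.names = FALSE)))
)

# Returns the canonical key for each input, or NA when nothing matches.
map_color <- function(x) unname(lookup[normalize(x)])

colors <- c("multi color", "purple", "steel", "metallic", "off white", "raisin",
            "strawberry", "magenta", "skin", "Beige", "Jade Green", "cream",
            "multi-colored", "offwhite", "rosered", "light cream")
result <- data.frame(color = colors, output = map_color(colors))
print(result)
```

Because the lookup is exact after normalization, a misspelled color returns NA instead of a nearby but wrong key; whether that is preferable to fuzzy matching depends on how noisy the real data is.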
Solution
Perhaps we can do a string-distance join (stringdist_right_join from fuzzyjoin) after reshaping the list into a single two-column data frame:
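library(dplyr)
library(tidyr)      # provides unnest()
library(fuzzyjoin)
library(tibble)
# Reshape the list into one row per (name, color) pair
enframe(master, value = 'color') %>%
  unnest(c(color)) %>%
  type.convert(as.is = TRUE) %>%
  # Fuzzy-join on the shared 'color' column, keeping every row of df
  stringdist_right_join(df %>%
      mutate(rn = row_number()), max_dist = 3) %>%
  # Fall back to the original color when no list value matched
  transmute(color = color.y, output = coalesce(name, color.y))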
# A tibble: 19 x 2
# color output
# <chr> <chr>
# 1 multi color Multi-colored
# 2 purple purple
# 3 steel Metallic
# 4 metallic metallic
# 5 off white off-white
# 6 raisin Purple
# 7 strawberry Red
# 8 strawberry Red
# 9 magenta Purple
#10 skin Beige
#11 skin Multi-colored
#12 Beige Beige
#13 Jade Green Turquoise
#14 cream off-white
#15 cream Purple
#16 multi-colored Multi-colored
#17 offwhite off-white
#18 rosered Red
#19 light cream off-white
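The result above has 19 rows for 16 inputs because some colors match more than one list value within max_dist = 3 (skin is also mapped to Multi-colored and cream to Purple, presumably through short values such as mix and jam). One way to keep a single row per color is to take only the closest candidate. The sketch below does this in base R with utils::adist instead of fuzzyjoin; closest_key, candidates and canonical are hypothetical names, and the threshold of 3 simply mirrors the max_dist used above.

```r
master <- list(
  "Beige" = c("light brown", "light golden", "skin"),
  "off-white" = c("off white", "cream", "light cream", "dirty white"),
  "Metallic" = c("steel", "silver"),
  "Multi-colored" = c("multi color", "mixed colors", "mix", "rainbow"),
  "Purple" = c("lavender", "grape", "jam", "raisin", "plum", "magenta"),
  "Red" = c("cranberry", "strawberry", "raspberry", "dark cherry", "cherry", "rosered"),
  "Turquoise" = c("aqua marine", "jade green"),
  "Yellow" = c("fresh lime")
)

# Every string that may be matched (keys first, then values) and, in the
# same order, the canonical key each one stands for.
candidates <- c(names(master), unlist(master, use.names = FALSE))
canonical  <- c(names(master), rep(names(master), lengths(master)))

# For each input, pick the single closest candidate by edit distance
# (case-insensitive); return NA when even the best one is too far away.
closest_key <- function(x, max_dist = 3) {
  d <- utils::adist(x, candidates, ignore.case = TRUE)
  best <- apply(d, 1, which.min)               # column index of the minimum per row
  best_dist <- d[cbind(seq_along(x), best)]    # the minimum distance itself
  ifelse(best_dist <= max_dist, canonical[best], NA_character_)
}

closest_key(c("skin", "cream", "strawberry", "offwhite", "zzzzzzzzzz"))
```

Ties are resolved in favor of the first candidate, which is why the keys are placed before the values in candidates.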
Data
df <- structure(list(color = c("multi color","purple","steel","metallic","off white","raisin","strawberry","magenta","skin","Beige","Jade Green","cream","multi-colored","offwhite","rosered","light cream")),class = "data.frame",row.names = c(NA,-16L
))