How to count the number of exactly matching words in a string
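I have a tibble with an id column and a text_entry column that captures what people typed.
Goal: compare each person's text_entry with key and count the number of words that were typed exactly right.
For example, if my input is: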
library(tidyverse)  # provides tribble()
df <- tribble(
  ~id, ~text_entry,
  1, "It was a Saturday night in December.",
  2, " It was a Saturday night",
  3, "It wuz a Sturday nite in",
  4, "IT WAS A SATURDAY",
  5, "was a Saturday"
)
df
key <- "It was a Saturday night in December."
then I need the following:
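df2 <- tribble(
  ~id, ~text_entry, ~words_correct,
  1, "It was a Saturday night in December.", 7,  # whole string perfect
  2, " It was a Saturday night", 5,              # first 5 words perfect
  3, "It wuz a Sturday nite in", 3,              # misspelled "was", "Saturday" and "night"
  4, "IT WAS A SATURDAY", 0,                     # case-sensitive
  5, "was a Saturday", 3                         # ok to start several words into the key
)
df2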
I'm completely open to stringr / stringi solutions. tidyverse is always preferred, but I'm desperate for any solution.
Thanks so much in advance for your help and insight!
Solution
One approach is to split each string on whitespace and count how many of its words also appear in key.
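library(tidyverse)
# Split the key into its whitespace-separated words
keywords <- strsplit(key, '\\s+')[[1]]
df %>%
  mutate(
    # Split every entry into words (list-column)
    text = str_split(text_entry, '\\s+'),
    # Count the words of each entry that occur in the key (case-sensitive)
    words_correct = map_dbl(text, ~ sum(.x %in% keywords))
  ) %>%
  select(-text)  # drop the helper column so the result matches the output below
# A tibble: 5 x 3
#      id text_entry                             words_correct
#   <dbl> <chr>                                          <dbl>
# 1     1 "It was a Saturday night in December."             7
# 2     2 " It was a Saturday night"                         5
# 3     3 "It wuz a Sturday nite in"                         3
# 4     4 "IT WAS A SATURDAY"                                0
# 5     5 "was a Saturday"                                   3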
We can also do this in base R:
df$words_correct <- sapply(strsplit(df$text_entry, '\\s+'), function(x) sum(x %in% keywords))
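The base R one-liner can be wrapped in a small helper so that it does not rely on a global keywords vector. This is only a minimal sketch: the function name count_correct_words is hypothetical (it is not part of the original answer), and trimming leading and trailing whitespace is an added assumption.

```r
# Hypothetical helper: count the words of each entry that also appear in `key`.
# Matching is case-sensitive and ignores word position.
count_correct_words <- function(text_entry, key) {
  keywords <- strsplit(trimws(key), "\\s+")[[1]]
  words <- strsplit(trimws(text_entry), "\\s+")
  vapply(words, function(x) sum(x %in% keywords), integer(1))
}

key <- "It was a Saturday night in December."
entries <- c("It was a Saturday night in December.",
             " It was a Saturday night",
             "It wuz a Sturday nite in",
             "IT WAS A SATURDAY",
             "was a Saturday")
count_correct_words(entries, key)
```

Note that a word repeated in an entry is counted every time it appears, since each typed word is checked against the key independently.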
Alternatively, you can extract the non-whitespace pieces and pass them to str_detect().
library(tidyverse)
df %>%
  mutate(
    # Extract the non-whitespace pieces of each entry, then count how many
    # of them are detected (as patterns) somewhere in key
    words_correct = map_dbl(
      str_extract_all(text_entry, "[^\\s]+"),
      ~ sum(str_detect(key, .))
    )
  )
# A tibble: 5 x 3
#      id text_entry                             words_correct
#   <dbl> <chr>                                          <dbl>
# 1     1 "It was a Saturday night in December."             7
# 2     2 " It was a Saturday night"                         5
# 3     3 "It wuz a Sturday nite in"                         3
# 4     4 "IT WAS A SATURDAY"                                0
# 5     5 "was a Saturday"                                   3
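One difference between the two answers may matter in practice: str_detect() treats each typed word as a regular expression and looks for it anywhere in key, so a fragment such as "Sat" would count as a hit, whereas %in% only accepts whole tokens. The sketch below illustrates this with base R, using grepl(fixed = TRUE) as a stand-in for substring matching; the sample words are made up for illustration and do not come from the question.

```r
key <- "It was a Saturday night in December."
keywords <- strsplit(key, "\\s+")[[1]]

# Made-up entry: "Sat" and "Dec" are fragments, not whole words of the key.
words <- c("It", "was", "Sat", "Dec")

# Whole-token comparison: only "It" and "was" count.
whole_word_hits <- sum(words %in% keywords)

# Substring comparison: every fragment that occurs anywhere in the key counts.
substring_hits <- sum(vapply(words, grepl, logical(1), x = key, fixed = TRUE))

whole_word_hits
substring_hits
```

If only fully typed words should be counted, the %in% approach is probably the safer choice of the two.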