Using grepl()

How to solve: using grepl()

I wrote the following, and it runs without errors.

df2$qualifications <- as.numeric(grepl("high school|Bachelor|master|phd",df2$description,ignore.case=TRUE))
df2$qualifications

This is the output: 1 if any of the words above is mentioned, 0 otherwise.

  [1] 0 0 0 1 0 1 0 1 0 1 1 0 0 0 1 0 1 0 1 0 1 0 1 0 0 0 0 1 1 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 1 1 1 0 1 0
 [51] 0 0 0 1 1 0 1 0 0 1 0 0 0 1 0 1 0 1 1 0 0 0 0 0 0 0 1 1 1 1 1 0 1 0 1 1 0 0 1 0 0 1 0 1 1 0 0 0 0 1
[101] 0 1 0 0
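To make the behaviour of the alternation pattern concrete, here is a minimal sketch on a few made-up descriptions. The `desc` vector is hypothetical and is not taken from the dataset in the question.

```r
# Hypothetical job descriptions, invented for illustration
desc <- c("PhD in statistics required",
          "No formal education needed",
          "High School diploma or Bachelor degree")

# "|" means "or": grepl() returns TRUE if any of the four terms appears,
# and ignore.case = TRUE makes "PhD" match "phd"
flag <- as.numeric(grepl("high school|Bachelor|master|phd", desc, ignore.case = TRUE))
print(flag)
```

Because all four terms share one pattern, the result can only say whether some term matched, not which one.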

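This is a dataset of job postings and the educational qualifications they are looking for. I would like to assign a dummy variable for each education level mentioned in the job description.

Specifically, I am looking for something like the following, where 0 means no qualification is mentioned, 1 is high school, 2 is bachelor's, 3 is master's, and 4 is PhD: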

[1] 0 2 4 1 3 1 0 1 0 1 1 1 2 1 0 1

Solution

Using a for loop:

#placing the possible levels into a vector (defined first, because the sample data below uses it)
educ = c("high school","Bachelor","master","phd")

df2 = data.frame(description = sample(educ,100,TRUE)) #dummy data for the example
df2$qualifications = 0 #creating the new column; 0 means no qualification mentioned

#for each value in educ, if description contains that value, assign its position in educ (1 to 4)
for(i in educ){
  value = grepl(i,df2$description,ignore.case=TRUE)
  df2$qualifications[value] = which(educ==i)
}
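Because the loop visits the levels in order and overwrites earlier assignments, a description that mentions several levels ends up with the highest one. The sketch below shows this on made-up data. The `toy` data frame is hypothetical, and its column starts at 0 so that rows without a match read as "no qualification".

```r
educ <- c("high school", "Bachelor", "master", "phd")

# Hypothetical descriptions, invented for illustration
toy <- data.frame(description = c("Bachelor required, Master preferred",
                                  "Forklift licence",
                                  "high school diploma"),
                  stringsAsFactors = FALSE)
toy$qualifications <- 0

# Later levels overwrite earlier ones, so the highest matching level wins
for (i in seq_along(educ)) {
  hit <- grepl(educ[i], toy$description, ignore.case = TRUE)
  toy$qualifications[hit] <- i
}
print(toy$qualifications)
```

If the lowest mentioned level should win instead, looping over `rev(seq_along(educ))` would be one way to get that.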

Since you are already creating a categorical variable, I would suggest using

You can also do this with case_when from dplyr:

library(dplyr)

df2 %>% 
  dplyr::mutate(qualifications = case_when(
    grepl("high school",description,ignore.case = TRUE) ~ 1,
    grepl("Bachelor",description,ignore.case = TRUE) ~ 2,
    grepl("master",description,ignore.case = TRUE) ~ 3,
    grepl("phd",description,ignore.case = TRUE) ~ 4,
    TRUE ~ 0 #no qualification mentioned
  ))
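Note that case_when uses the first condition that is TRUE, so the order of the conditions decides what happens when a description mentions more than one level. A minimal sketch, using a made-up one-row data frame `toy` (hypothetical) and only two of the four levels for brevity:

```r
library(dplyr)

# Hypothetical description that mentions two levels
toy <- data.frame(description = "High school required, PhD a plus",
                  stringsAsFactors = FALSE)

res <- toy %>%
  mutate(
    # conditions listed from lowest to highest: the lowest match wins
    lowest_first = case_when(
      grepl("high school", description, ignore.case = TRUE) ~ 1,
      grepl("phd", description, ignore.case = TRUE) ~ 4,
      TRUE ~ 0
    ),
    # conditions listed from highest to lowest: the highest match wins
    highest_first = case_when(
      grepl("phd", description, ignore.case = TRUE) ~ 4,
      grepl("high school", description, ignore.case = TRUE) ~ 1,
      TRUE ~ 0
    )
  )
print(res)
```

Which ordering is appropriate depends on whether the minimum or the maximum requirement is of interest.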

Using the mapvalues function from plyr:

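tibble::tibble(
  dummy_data = sample(c('no qual','high school','Bachelor','master','phd'),20,replace = TRUE)
) %>% 
  mutate(
    #map each label to its code; the result is character because dummy_data is character
    dummy_variable = plyr::mapvalues(dummy_data,c('no qual','high school','Bachelor','master','phd'),0:4),
    dummy_variable = as.integer(dummy_variable)
  )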

Output:

# A tibble: 20 x 2
   dummy_data  dummy_variable
   <chr>                <int>
 1 no qual                  0
 2 phd                      4
 3 phd                      4
 4 high school              1
 5 no qual                  0
 6 phd                      4
 7 no qual                  0
 8 no qual                  0
 9 no qual                  0
10 no qual                  0
11 master                   3
12 phd                      4
13 high school              1
14 no qual                  0
15 Bachelor                 2
16 high school              1
17 high school              1
18 phd                      4
19 phd                      4
20 phd                      4
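For a column that already holds clean labels like `dummy_data` above, the same mapping can be sketched in base R with `match()`. This is offered as an alternative, not as part of the answer above; the names `lvls` and `x` are hypothetical.

```r
# Levels listed in the order that defines their codes (0 to 4)
lvls <- c("no qual", "high school", "Bachelor", "master", "phd")

# Hypothetical input values
x <- c("phd", "no qual", "Bachelor", "high school")

# match() returns the 1-based position of each value in lvls;
# subtracting 1 gives the 0-based codes
codes <- match(x, lvls) - 1L
print(codes)
```

`match()` compares whole strings exactly, so this only suits labels that are already cleaned; free-text descriptions still need `grepl()`.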
