How to use group_by and case_when to impute new values in a column?

I'm struggling with this data-wrangling problem. The output df from my conjoint study looks similar to this:

   id set_number card_number att1 att2 att3 att4 score
1 932          1           1    1    1    1    3     0
2 932          1           2    2    2    4    4   100
3 932          1           3    8    8    8    8     0
4 932          2           1    3    3    3    1     0
5 932          2           2    4    2    2    4     0
6 932          2           3    8    8    8    8   100
7 933          1           1    1    1    1    3     0
8 933          1           2    2    2    4    4   100
9 933          1           3    8    8    8    8     0
...

where id identifies a respondent and score is the dependent variable. I need to reformat the df so that it can be analyzed with the ChoiceModelR package.

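I'm trying to work out how to write code (I'd like to use group_by(id, card_number) and case_when / if_else statements) that, for each set_number, imputes into the first row the card_number of the card whose score is 100. However, if att1 through att4 are all 8s, the score must be "card_number + 1".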

The desired df should look like this:

   id set_number card_number att1 att2 att3 att4 score
1 932          1           1    1    1    1    3     2
2 932          1           2    2    2    4    4     0
3 932          1           3    8    8    8    8     0
4 932          2           1    3    3    3    1     4
5 932          2           2    4    2    2    4     0
6 932          2           3    8    8    8    8     0
7 933          2           1    3    3    3    1     2
8 933          2           2    4    2    2    4     0
9 933          2           3    8    8    8    8     0

...
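A minimal sketch of the group_by() + case_when() approach described above, run on the sample rows from the first table. It assumes that each (id, set_number) group lists its cards in order, so the group's first row is card 1, and that the "none" card is the one whose att1 to att4 are all 8. It groups by id and set_number, because the imputed value is one per choice set. The helper columns is_none and chosen are illustrative names, not part of the original data, and are dropped at the end.

```r
library(dplyr)

# Sample rows copied from the question's first table
df <- data.frame(
  id          = c(932L, 932L, 932L, 932L, 932L, 932L, 933L, 933L, 933L),
  set_number  = c(1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L),
  card_number = c(1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L),
  att1        = c(1, 2, 8, 3, 4, 8, 1, 2, 8),
  att2        = c(1, 2, 8, 3, 2, 8, 1, 2, 8),
  att3        = c(1, 4, 8, 3, 2, 8, 1, 4, 8),
  att4        = c(3, 4, 8, 1, 4, 8, 3, 4, 8),
  score       = c(0, 100, 0, 0, 0, 100, 0, 100, 0)
)

recoded <- df %>%
  group_by(id, set_number) %>%
  mutate(
    # TRUE for the "none" card, whose attributes are all 8 (assumption)
    is_none = att1 == 8 & att2 == 8 & att3 == 8 & att4 == 8,
    # code of the chosen card: card_number, or card_number + 1 for the none card
    chosen = case_when(
      score == 100 & is_none ~ card_number + 1L,
      score == 100           ~ card_number,
      TRUE                   ~ 0L
    ),
    # put the set's code on its first row and 0 on the other rows
    score = if_else(row_number() == 1L, max(chosen), 0L)
  ) %>%
  ungroup() %>%
  select(-is_none, -chosen)

recoded$score  # 2 0 0 4 0 0 2 0 0
```

Because mutate() evaluates its arguments in order, score is only overwritten after chosen has been computed from the original scores.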

I would greatly appreciate your help.

My full dataset is available here, in CSV format.

dput output:

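structure(list(id = c(932L, 932L, 932L, 932L, 932L, 932L, 932L, 932L, 932L, 932L), set_number = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 4L), card_number = c(1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L), att1 = c(1L, 2L, 8L, 3L, 4L, 8L, 5L, 6L, 8L, 3L), att2 = c(1L, 2L, 8L, 3L, 2L, 8L, 4L, 3L, 8L, 1L), att3 = c(1L, 4L, 8L, 3L, 2L, 8L, 1L, 3L, 8L, 2L), att4 = c(3L, 4L, 8L, 1L, 4L, 8L, 3L, 2L, 8L, 2L), score = c(0L, 100L, 0L, 0L, 100L, 0L, 0L, 100L, 0L, 0L)), class = "data.frame", row.names = c(NA, -10L))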

Solution

A tidyverse-based solution could look like this:

library(dplyr)
library(purrr)

as_tibble(df) %>%
  group_by(id, set_number) %>%
  mutate(
    scoreX = card_number[which(score == 100)][1],  # card chosen in this set
    scoreX = pmap_dbl(list(att1, att2, att3, att4, score, scoreX),  # all-8s card -> card_number + 1
                      ~ if_else(sum(..1, ..2, ..3, ..4) == 32 & ..5 == 100, ..6 + 1, as.double(..6))),
    scoreX = max(scoreX),
    scoreX = if_else(row_number() == 1L, scoreX, 0)  # keep the value only on each set's first row
  )


#       id set_number card_number  att1  att2  att3  att4 score scoreX
#    <int>      <int>       <int> <int> <int> <int> <int> <int>  <dbl>
#  1   932          1           1     1     1     1     3     0      2
#  2   932          1           2     2     2     4     4   100      0
#  3   932          1           3     8     8     8     8     0      0
#  4   932          2           1     3     3     3     1     0      2
#  5   932          2           2     4     2     2     4   100      0
#  6   932          2           3     8     8     8     8     0      0
#  7   932          3           1     5     4     1     3     0      2
#  8   932          3           2     6     3     3     2   100      0
#  9   932          3           3     8     8     8     8     0      0
# 10   932          4           1     3     1     2     2     0     NA

This may not be the most efficient approach, but it solved the problem (I'd also welcome any other way to achieve the same result):

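# new column to hold the recoded score
df$dv = 0

# assumes every choice set occupies exactly 3 consecutive rows (cards 1, 2, 3)
for (i in seq(1, nrow(df), by = 3)) {
  if (df$score[i] == 100)     df$dv[i] = 1
  if (df$score[i + 1] == 100) df$dv[i] = 2
  if (df$score[i + 2] == 100) df$dv[i] = 4  # all-8s card: card_number + 1
}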
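The same recoding can be sketched without a loop in base R. This relies on the loop's own assumptions: every choice set is exactly 3 consecutive rows in card order, and at most one card per set has score 100 (if two did, their codes would be summed). The sample data are copied from the question's first table; only the columns used here are kept.

```r
# Sample rows copied from the question's first table (only the columns used here)
df <- data.frame(
  set_number  = c(1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L),
  card_number = c(1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L),
  score       = c(0, 100, 0, 0, 0, 100, 0, 100, 0)
)

codes <- c(1, 2, 4)                     # same codes the loop assigns to cards 1, 2, 3
first_rows <- seq(1, nrow(df), by = 3)  # first row of each 3-row choice set

# one row per choice set, one column per card: TRUE where score == 100
chosen <- matrix(df$score == 100, ncol = 3, byrow = TRUE)

df$dv <- 0
# matrix product picks the code of the chosen card (0 if no card was chosen)
df$dv[first_rows] <- as.vector(chosen %*% codes)

df$dv  # 2 0 0 4 0 0 2 0 0
```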

dv is a new column that stores the recoded score. I then dropped the score column with the subset() function.
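For reference, dropping a column with base R's subset() might look like the sketch below. The three-row data frame is illustrative only, standing in for df after dv has been filled in.

```r
# Illustrative data frame with the recoded dv column already added
df <- data.frame(
  id    = c(932L, 932L, 932L),
  score = c(0, 100, 0),
  dv    = c(2, 0, 0)
)

# select = -score keeps every column except score
df <- subset(df, select = -score)

names(df)  # "id" "dv"
```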
