Why am I not getting the correct count based on id and another column in R?

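I am trying to get the correct comorbidity count per id with the tidyverse in R, based on two columns: id and Comorbidity (which holds different types of comorbidity). I want to understand what I am doing wrong, because I applied what seemed like the obvious approach, shown below: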

This is the structure of the data:

structure(list(id = c("133","cd5","392","ffa","6ed","9a2","989","870","2d9","f9e","d36","8f4","fb8","626","8fb","aea","af4","162","162"),Comorbidity_count = c("Comorbidity_one","Comorbidity_one","Comorbidity_two","Comorbidity_two"),Comorbidity = c("None","None","High Blood Pressure (hypertension)","Asthma (managed with an inhaler)","Diabetes Type 2","Obesity","Obesity")),row.names = c(NA,-20L),groups = structure(list(id = c("133",.rows = structure(list(
    7L,6L,16:17,19:20,11L,3L,4L,5L,8L,2L,14L,9L,15L,10L,12L,13L,18L,1L),ptype = integer(0),class = c("vctrs_list_of","vctrs_vctr","list"))),18L),class = c("tbl_df","tbl","data.frame"),.drop = TRUE),class = c("grouped_df","tbl_df","data.frame"))

If I write the code below, the counts are wrong:

    count_id <- test %>%
      naniar::replace_with_na(replace = list(Comorbidity = "None")) %>%
      dplyr::group_by(id,Comorbidity) %>%
      dplyr::mutate(number_morbidities = n())

The result is the following table:

structure(list(id = c("133",Comorbidity = c(NA,NA,"Obesity"),number_morbidities = c(NA,1L,2L)),groups = structure(list(
    id = c("133","ffa"),.rows = structure(list(1L,7L,4L),"data.frame"))

Solution

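You only need to group by id, because you want one count per id. When you group by both id and Comorbidity, each group is a single id/comorbidity combination, so n() returns the number of rows for that combination rather than the number of comorbidities per id. Also, n() counts every row in a group, whether Comorbidity is missing or not; to ignore rows without a comorbidity, count the non-missing values with sum(!is.na(Comorbidity)) instead. Note that this approach gives 0 for ids with no comorbidity, which I think makes more sense than NA; you can replace the 0s with NA if you prefer. I also dropped the naniar dependency (if_else() does the same "None" to NA replacement), but that does not change the result.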
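A minimal sketch of the difference between the two groupings, assuming a small hypothetical tibble `toy` loosely modelled on the reprex output below (it is not the asker's data). It uses dplyr's `na_if()` as a stand-in for `naniar::replace_with_na()`; that substitution is an assumption, not part of the original answer.

```r
library(dplyr)

# Hypothetical toy data (NOT the asker's dataset): "133" has no
# comorbidity, "aea" and "162" have two each.
toy <- tibble(
  id          = c("133", "aea", "aea", "162", "162"),
  Comorbidity = c("None", "Diabetes Type 2", "Obesity",
                  "High Blood Pressure (hypertension)", "Obesity")
)

# dplyr::na_if() used here in place of naniar::replace_with_na()
toy <- toy %>% mutate(Comorbidity = na_if(Comorbidity, "None"))

# Question's grouping: one group per (id, Comorbidity) pair, so n()
# counts the rows of that pair, NA rows included.
by_pair <- toy %>%
  group_by(id, Comorbidity) %>%
  mutate(number_morbidities = n()) %>%
  ungroup()

# Answer's grouping: one group per id; sum(!is.na()) skips NA rows.
by_id <- toy %>%
  group_by(id) %>%
  mutate(number_morbidities = sum(!is.na(Comorbidity))) %>%
  ungroup()

by_pair$number_morbidities  # 1 1 1 1 1
by_id$number_morbidities    # 0 2 2 2 2
```

With the pair grouping, "133" still gets a count of 1 from its NA row, and "aea" gets 1 on each row instead of 2.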

library(tidyverse)
test <- structure(list(id = c("133","cd5","392","ffa","6ed","9a2","989","870","2d9","f9e","d36","8f4","fb8","626","8fb","aea","af4","162","162"),Comorbidity_count = c("Comorbidity_one","Comorbidity_one","Comorbidity_two","Comorbidity_two"),Comorbidity = c("None","None","High Blood Pressure (hypertension)","Asthma (managed with an inhaler)","Diabetes Type 2","Obesity","Obesity")),row.names = c(NA,-20L),groups = structure(list(id = c("133",.rows = structure(list(7L,6L,16:17,19:20,11L,3L,4L,5L,8L,2L,14L,9L,15L,10L,12L,13L,18L,1L),ptype = integer(0),class = c("vctrs_list_of","vctrs_vctr","list"))),18L),class = c("tbl_df","tbl","data.frame"),.drop = TRUE),class = c("grouped_df","tbl_df","data.frame"))

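test %>%
  # Turn "None" into NA (replaces the naniar::replace_with_na() step)
  mutate(Comorbidity = if_else(Comorbidity == "None",NA_character_,Comorbidity)) %>%
  # One group per id, not per id/comorbidity pair
  group_by(id) %>%
  # Count only non-missing comorbidities; ids with none get 0
  mutate(number_morbidities = sum(!is.na(Comorbidity)))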
#> # A tibble: 20 x 4
#> # Groups:   id [18]
#>    id    Comorbidity_count Comorbidity                        number_morbidities
#>    <chr> <chr>             <chr>                                           <int>
#>  1 133   Comorbidity_one   <NA>                                                0
#>  2 cd5   Comorbidity_one   <NA>                                                0
#>  3 392   Comorbidity_one   <NA>                                                0
#>  4 ffa   Comorbidity_one   High Blood Pressure (hypertension)                  1
#>  5 6ed   Comorbidity_one   <NA>                                                0
#>  6 9a2   Comorbidity_one   <NA>                                                0
#>  7 989   Comorbidity_one   <NA>                                                0
#>  8 870   Comorbidity_one   Asthma (managed with an inhaler)                    1
#>  9 2d9   Comorbidity_one   <NA>                                                0
#> 10 f9e   Comorbidity_one   <NA>                                                0
#> 11 d36   Comorbidity_one   <NA>                                                0
#> 12 8f4   Comorbidity_one   <NA>                                                0
#> 13 fb8   Comorbidity_one   <NA>                                                0
#> 14 626   Comorbidity_one   <NA>                                                0
#> 15 8fb   Comorbidity_one   <NA>                                                0
#> 16 aea   Comorbidity_one   Diabetes Type 2                                     2
#> 17 aea   Comorbidity_two   Obesity                                             2
#> 18 af4   Comorbidity_one   <NA>                                                0
#> 19 162   Comorbidity_one   High Blood Pressure (hypertension)                  2
#> 20 162   Comorbidity_two   Obesity                                             2

Created on 2020-08-26 by the reprex package (v0.3.0)
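If you prefer NA to 0, or want one row per id instead of the count repeated on every row, one option is to `summarise()` per id and then turn 0 into NA with dplyr's `na_if()`. A sketch on a hypothetical tibble `toy` (not the asker's data) in which "None" has already been replaced by NA:

```r
library(dplyr)

# Hypothetical toy data (NOT the asker's dataset); "None" already
# replaced by NA, as in the answer's first step.
toy <- tibble(
  id          = c("133", "aea", "aea", "162", "162"),
  Comorbidity = c(NA, "Diabetes Type 2", "Obesity",
                  "High Blood Pressure (hypertension)", "Obesity")
)

per_id <- toy %>%
  group_by(id) %>%
  # One row per id with the number of non-missing comorbidities
  summarise(number_morbidities = sum(!is.na(Comorbidity)), .groups = "drop") %>%
  # Optional: show ids without comorbidities as NA instead of 0
  mutate(number_morbidities = na_if(number_morbidities, 0L))

per_id
```

`summarise()` returns its rows sorted by id; use `left_join()` on id if you need to attach the count back to the original rows.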
