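Why am I not getting the correct count in R based on id and another column?
I am trying to get the correct count of comorbidities with the tidyverse in R, based on two columns: id and Comorbidity (which holds the different types of comorbidity). I would like to understand what I am doing wrong, since I applied what seemed like the obvious approach; see below.
This is the structure of the data: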
structure(list(id = c("133","cd5","392","ffa","6ed","9a2","989","870","2d9","f9e","d36","8f4","fb8","626","8fb","aea","af4","162","162"),Comorbidity_count = c("Comorbidity_one","Comorbidity_one","Comorbidity_two","Comorbidity_two"),Comorbidity = c("None","None","High Blood Pressure (hypertension)","Asthma (managed with an inhaler)","Diabetes Type 2","Obesity","Obesity")),row.names = c(NA,-20L),groups = structure(list(id = c("133",.rows = structure(list(
7L,6L,16:17,19:20,11L,3L,4L,5L,8L,2L,14L,9L,15L,10L,12L,13L,18L,1L),ptype = integer(0),class = c("vctrs_list_of","vctrs_vctr","list"))),18L),class = c("tbl_df","tbl","data.frame"),.drop = TRUE),class = c("grouped_df","tbl_df","data.frame"))
If I write the code below, the counts are incorrect:
count_id <- test %>%
  naniar::replace_with_na(replace = list(Comorbidity = "None")) %>%
  dplyr::group_by(id, Comorbidity) %>%
  dplyr::mutate(number_morbidities = n())
The result looks like this:
structure(list(id = c("133",Comorbidity = c(NA,NA,"Obesity"),number_morbidities = c(NA,1L,2L)),groups = structure(list(
id = c("133","ffa"),.rows = structure(list(1L,7L,4L),"data.frame"))
Solution
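You only need to group by id, because you want one count per id. Grouping by id and Comorbidity together makes every distinct id/comorbidity pair its own group, so the count is taken within each pair rather than across the whole id. You also need a different way of counting if you want to ignore rows with no comorbidity: n() counts all rows in the group, whether or not the value is missing. Note that this approach yields 0 when an id has no comorbidities, which I think makes more sense than NA; you can replace 0 with NA if required. Note also that I skipped the naniar dependency, but this does not change anything.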
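To make the difference concrete, here is a minimal sketch on a made-up three-row tibble. The object names toy, by_pair and by_id and the column rows_in_group are hypothetical and used only for illustration; this is not the asker's data.

```r
library(dplyr)

# Toy data (hypothetical): id "a" has no comorbidity, id "b" has two.
toy <- tibble(
  id = c("a", "b", "b"),
  Comorbidity = c(NA, "Diabetes Type 2", "Obesity")
)

# Grouping by both columns: each (id, Comorbidity) pair is its own group,
# so n() is the size of that pair, which is 1 for every row here.
by_pair <- toy %>%
  group_by(id, Comorbidity) %>%
  mutate(number_morbidities = n()) %>%
  ungroup()

# Grouping by id only: n() counts every row in the id (missing or not),
# while sum(!is.na(...)) counts only the rows that hold a comorbidity.
by_id <- toy %>%
  group_by(id) %>%
  mutate(
    rows_in_group = n(),
    number_morbidities = sum(!is.na(Comorbidity))
  ) %>%
  ungroup()

print(by_pair)
print(by_id)
```

In by_pair the id "b" gets 1 on both of its rows even though it has two comorbidities; in by_id it gets 2, and id "a" gets 0 rather than 1.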
library(tidyverse)
test <- structure(list(id = c("133","cd5","392","ffa","6ed","9a2","989","870","2d9","f9e","d36","8f4","fb8","626","8fb","aea","af4","162","162"),Comorbidity_count = c("Comorbidity_one","Comorbidity_one","Comorbidity_two","Comorbidity_two"),Comorbidity = c("None","None","High Blood Pressure (hypertension)","Asthma (managed with an inhaler)","Diabetes Type 2","Obesity","Obesity")),row.names = c(NA,-20L),groups = structure(list(id = c("133",.rows = structure(list(7L,6L,16:17,19:20,11L,3L,4L,5L,8L,2L,14L,9L,15L,10L,12L,13L,18L,1L),ptype = integer(0),class = c("vctrs_list_of","vctrs_vctr","list"))),18L),class = c("tbl_df","tbl","data.frame"),.drop = TRUE),class = c("grouped_df","tbl_df","data.frame"))
test %>%
  mutate(Comorbidity = if_else(Comorbidity == "None", NA_character_, Comorbidity)) %>%
  group_by(id) %>%
  mutate(number_morbidities = sum(!is.na(Comorbidity)))
#> # A tibble: 20 x 4
#> # Groups:   id [18]
#>    id    Comorbidity_count Comorbidity                        number_morbidities
#>    <chr> <chr>             <chr>                                           <int>
#>  1 133   Comorbidity_one   <NA>                                                0
#>  2 cd5   Comorbidity_one   <NA>                                                0
#>  3 392   Comorbidity_one   <NA>                                                0
#>  4 ffa   Comorbidity_one   High Blood Pressure (hypertension)                  1
#>  5 6ed   Comorbidity_one   <NA>                                                0
#>  6 9a2   Comorbidity_one   <NA>                                                0
#>  7 989   Comorbidity_one   <NA>                                                0
#>  8 870   Comorbidity_one   Asthma (managed with an inhaler)                    1
#>  9 2d9   Comorbidity_one   <NA>                                                0
#> 10 f9e   Comorbidity_one   <NA>                                                0
#> 11 d36   Comorbidity_one   <NA>                                                0
#> 12 8f4   Comorbidity_one   <NA>                                                0
#> 13 fb8   Comorbidity_one   <NA>                                                0
#> 14 626   Comorbidity_one   <NA>                                                0
#> 15 8fb   Comorbidity_one   <NA>                                                0
#> 16 aea   Comorbidity_one   Diabetes Type 2                                     2
#> 17 aea   Comorbidity_two   Obesity                                             2
#> 18 af4   Comorbidity_one   <NA>                                                0
#> 19 162   Comorbidity_one   High Blood Pressure (hypertension)                  2
#> 20 162   Comorbidity_two   Obesity                                             2
Created on 2020-08-26 by the reprex package (v0.3.0)
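If NA is preferred over 0 for ids without comorbidities, or if one row per id is wanted instead of a count repeated on every row, something along these lines might work. This is only a sketch: sample_rows holds a few rows copied by hand from the printed output above, not the full data set, and the names sample_rows and per_id are hypothetical.

```r
library(dplyr)

# A few rows copied from the printed output above (not the full data set).
sample_rows <- tibble(
  id = c("133", "ffa", "aea", "aea"),
  Comorbidity = c(
    NA,
    "High Blood Pressure (hypertension)",
    "Diabetes Type 2",
    "Obesity"
  )
)

per_id <- sample_rows %>%
  group_by(id) %>%
  # summarise() collapses each id to a single row.
  summarise(number_morbidities = sum(!is.na(Comorbidity)), .groups = "drop") %>%
  # na_if() turns a count of 0 back into NA, if that is the preferred encoding.
  mutate(number_morbidities = na_if(number_morbidities, 0L))

print(per_id)
```

Using summarise() instead of mutate() drops the other columns, so it suits a per-id lookup table; keep mutate() when the count has to sit alongside the original rows.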