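How to preserve factor order after gather and summarise steps in the tidyverse
I have over a hundred variables for which I am trying to calculate frequencies and percentages. How can I keep the factor order of each variable's values in the output? Note that specifying the order for each variable outside the dataset is not practical, since I have 100+ variables.
Sample data: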
df <- data.frame(gender=factor(c("male","female","male",NA),levels=c("male","female")),disease=factor(c("yes","yes","no",NA),levels=c("yes","no")))
df
  gender disease
1   male     yes
2 female     yes
3   male      no
4   <NA>    <NA>
Attempt:
df %>%
  gather(key, value, factor_key = TRUE) %>%
  group_by(key, value) %>%
  summarise(n = n()) %>%
  ungroup() %>%
  group_by(key) %>%
  mutate(percent = n / sum(n))
Output:
# A tibble: 6 x 4
# Groups:   key [2]
  key     value      n percent
  <fct>   <chr>  <int>   <dbl>
1 gender  female     1    0.25
2 gender  male       2    0.5
3 gender  NA         1    0.25
4 disease no         1    0.25
5 disease yes        2    0.5
6 disease NA         1    0.25
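The desired output would order gender as male, female and disease as yes, no.
Solution
Update: if you use pivot_longer (the successor to gather), it preserves the factor levels! You can also use the names_transform and values_transform arguments of pivot_longer to fine-tune the column types.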
library(tidyverse)
df <- data.frame(gender=factor(c("male","female","male",NA),levels=c("male","female")),disease=factor(c("yes","yes","no",NA),levels=c("yes","no")))
df %>%
  pivot_longer(everything()) %>%
  group_by(name, value) %>%
  summarise(n = n(), .groups = "drop_last") %>%
  mutate(percent = n / sum(n))
#> # A tibble: 6 x 4
#> # Groups:   name [2]
#>   name    value      n percent
#>   <chr>   <fct>  <int>   <dbl>
#> 1 disease yes        2    0.5
#> 2 disease no         1    0.25
#> 3 disease <NA>       1    0.25
#> 4 gender  male       2    0.5
#> 5 gender  female     1    0.25
#> 6 gender  <NA>       1    0.25
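Created on 2020-10-16 by the reprex package (v0.3.0)
Because gather drops the factor type of the value column, and summarise may also drop data frame attributes, you have to add the levels back yourself. You can do this semi-automatically by reading the factor levels from the original data frame and combining them, like so: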
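In the pivot_longer output above, value keeps its factor order, but name is a character column, so the variables themselves come out alphabetically (disease before gender). Below is a minimal sketch of one way to keep the variables in column order as well, using the names_transform argument mentioned in the update. It assumes tidyr 1.1.0 or later; res is only an illustrative name, and count() is swapped in for the group_by() + summarise() pair.

```r
library(dplyr)
library(tidyr)

df <- data.frame(
  gender  = factor(c("male", "female", "male", NA), levels = c("male", "female")),
  disease = factor(c("yes", "yes", "no", NA), levels = c("yes", "no"))
)

res <- df %>%
  pivot_longer(
    everything(),
    # turn the `name` column into a factor whose levels follow the
    # column order of df, so gender sorts before disease
    names_transform = list(name = function(x) factor(x, levels = names(df)))
  ) %>%
  count(name, value) %>%          # one row per observed name/value pair, NA included
  group_by(name) %>%
  mutate(percent = n / sum(n)) %>%
  ungroup()

res
```

Since both name and value are factors here, the rows should follow the level order of each rather than alphabetical order, with NA last within each variable.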
library(tidyverse)
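df <- data.frame(gender=factor(c("male","female","male",NA),levels=c("male","female")),disease=factor(c("yes","yes","no",NA),levels=c("yes","no")))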
df %>%
  gather(key, value, factor_key = TRUE) %>%
  group_by(key, value) %>%
  summarise(n = n()) %>%
  ungroup() %>%
  group_by(key) %>%
  mutate(percent = n / sum(n), value = factor(value, levels = df %>% map(levels) %>% unlist())) %>%
  arrange(key, value)
#> Warning: attributes are not identical across measure variables;
#> they will be dropped
#> `summarise()` regrouping output by 'key' (override with `.groups` argument)
#> # A tibble: 6 x 4
#> # Groups:   key [2]
#>   key     value      n percent
#>   <fct>   <fct>  <int>   <dbl>
#> 1 gender  male       2    0.5
#> 2 gender  female     1    0.25
#> 3 gender  <NA>       1    0.25
#> 4 disease yes        2    0.5
#> 5 disease no         1    0.25
#> 6 disease <NA>       1    0.25
Created on 2020-10-16 by the reprex package (v0.3.0)
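One caveat with the level-collection step above: with 100+ variables, several are likely to share labels such as yes/no, and factor() rejects duplicated levels. Below is a minimal sketch of deduplicating the combined levels with unique(). The smoker column is a hypothetical third variable added only for illustration, and all_levels, combined_levels and value are illustrative names.

```r
library(purrr)

df <- data.frame(
  gender  = factor(c("male", "female", "male", NA), levels = c("male", "female")),
  disease = factor(c("yes", "yes", "no", NA), levels = c("yes", "no")),
  # hypothetical extra variable, not part of the original question
  smoker  = factor(c("no", "yes", "yes", NA), levels = c("yes", "no"))
)

# Collect the levels of every column in column order; "yes" and "no"
# appear twice because disease and smoker share them.
all_levels <- unlist(map(df, levels), use.names = FALSE)

# Keep the first occurrence of each label so factor() accepts the vector.
combined_levels <- unique(all_levels)

# The deduplicated vector can then be passed as `levels =`.
value <- factor(c("no", "male", "yes", NA), levels = combined_levels)
levels(value)
```

A single combined factor can hold only one position per label, so if two variables order the same labels differently (say yes/no in one and no/yes in another), this approach cannot honor both orders at once.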