How to add counters based on groups
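I'm trying to add counter columns (n_team and n_bird) to a data frame, but I haven't had any success with dplyr::row_number and similar functions. Below is a reprex with the input data frame (df), the desired output data frame (df_counts), and some code that produces incorrect output.
Thanks for your help!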
library(dplyr)
# Input
df <- tribble(
  ~id, ~team,   ~bird,
  1,   "blue",  "parrot",
  2,   "green", "owl",
  3,   "blue",  "toucan",
  3,   "blue",  "finch",
  4,   "green", "penguin",
  4,   "blue",  "sparrow"
)
# Desired output
# n_team is the team number within an id
# n_bird is the bird number within a team within an id
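df_counts <- tribble(
  ~id, ~team,   ~bird,     ~n_team, ~n_bird,
  1,   "blue",  "parrot",  1,       1,
  2,   "green", "owl",     1,       1,
  3,   "blue",  "toucan",  1,       1,
  3,   "blue",  "finch",   1,       2,
  4,   "green", "penguin", 1,       1,
  4,   "blue",  "sparrow", 2,       1
)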
# Incorrect
df %>%
add_count(id,team,name = "n_team")
#> # A tibble: 6 x 4
#> id team bird n_team
#> <dbl> <chr> <chr> <int>
#> 1 1 blue parrot 1
#> 2 2 green owl 1
#> 3 3 blue toucan 2
#> 4 3 blue finch 2
#> 5 4 green penguin 1
#> 6 4 blue sparrow 1
df %>%
group_by(id) %>%
mutate(n_team = row_number(team))
#> # A tibble: 6 x 4
#> # Groups: id [4]
#> id team bird n_team
#> <dbl> <chr> <chr> <int>
#> 1 1 blue parrot 1
#> 2 2 green owl 1
#> 3 3 blue toucan 1
#> 4 3 blue finch 2
#> 5 4 green penguin 2
#> 6 4 blue sparrow 1
df %>%
group_by(id,team) %>%
mutate(n_team = 1:n())
#> # A tibble: 6 x 4
#> # Groups: id,team [5]
#> id team bird n_team
#> <dbl> <chr> <chr> <int>
#> 1 1 blue parrot 1
#> 2 2 green owl 1
#> 3 3 blue toucan 1
#> 4 3 blue finch 2
#> 5 4 green penguin 1
#> 6 4 blue sparrow 1
df %>%
group_by(id) %>%
mutate(n_team = n_distinct(team))
#> # A tibble: 6 x 4
#> # Groups: id [4]
#> id team bird n_team
#> <dbl> <chr> <chr> <int>
#> 1 1 blue parrot 1
#> 2 2 green owl 1
#> 3 3 blue toucan 1
#> 4 3 blue finch 1
#> 5 4 green penguin 2
#> 6 4 blue sparrow 2
df %>%
add_count(team)
#> # A tibble: 6 x 4
#> id team bird n
#> <dbl> <chr> <chr> <int>
#> 1 1 blue parrot 4
#> 2 2 green owl 2
#> 3 3 blue toucan 4
#> 4 3 blue finch 4
#> 5 4 green penguin 2
#> 6 4 blue sparrow 4
# Counts alphabetically
df %>%
group_by(id,team) %>%
mutate(n_bird = row_number(bird))
#> # A tibble: 6 x 4
#> # Groups: id,team [5]
#> id team bird n_bird
#> <dbl> <chr> <chr> <int>
#> 1 1 blue parrot 1
#> 2 2 green owl 1
#> 3 3 blue toucan 2
#> 4 3 blue finch 1
#> 5 4 green penguin 1
#> 6 4 blue sparrow 1
# Counts in order
df %>%
group_by(id,team) %>%
mutate(n_bird = row_number())
#> # A tibble: 6 x 4
#> # Groups: id,team [5]
#> id team bird n_bird
#> <dbl> <chr> <chr> <int>
#> 1 1 blue parrot 1
#> 2 2 green owl 1
#> 3 3 blue toucan 1
#> 4 3 blue finch 2
#> 5 4 green penguin 1
#> 6 4 blue sparrow 1
Created on 2020-09-04 by the reprex package (v0.3.0)
Here are some of the resources I've consulted:
- add counter column by arranging two variables (dplyr)
- https://community.rstudio.com/t/how-to-add-a-counter-to-each-group-in-dplyr/12986/2
- https://dplyr.tidyverse.org/reference/tally.html
- https://dplyr.tidyverse.org/reference/n_distinct.html
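Solutions
Here are 4 very similar but different approaches. match + unique numbers values in order of first appearance, factor + as.integer and dense_rank number them alphabetically, and data.table::rleid increments whenever the value changes from the previous row: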
- match + unique:
library(dplyr)
df %>%
group_by(id) %>%
mutate(n_teams = match(team,unique(team))) %>%
group_by(team,.add = TRUE) %>%
mutate(n_bird = match(bird,unique(bird)))
# id team bird n_teams n_bird
# <dbl> <chr> <chr> <int> <int>
#1 1 blue parrot 1 1
#2 2 green owl 1 1
#3 3 blue toucan 1 1
#4 3 blue finch 1 2
#5 4 green penguin 1 1
#6 4 blue sparrow 2 1
- factor + as.integer:
df %>%
group_by(id) %>%
mutate(n_teams = as.integer(factor(team))) %>%
group_by(team,.add = TRUE) %>%
mutate(n_bird = as.integer(factor(bird)))
- data.table::rleid:
df %>%
group_by(id) %>%
mutate(n_teams = data.table::rleid(team)) %>%
group_by(team,.add = TRUE) %>%
mutate(n_bird = data.table::rleid(bird))
- dense_rank:
df %>%
group_by(id) %>%
mutate(n_teams = dense_rank(team)) %>%
group_by(team,.add = TRUE) %>%
mutate(n_bird = dense_rank(bird))
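The four approaches differ only in the rule that turns values into numbers. On the question's df, factor + as.integer and dense_rank give n_teams = 2, 1 for id 4 (green penguin, then blue sparrow) and n_bird = 2, 1 for id 3's toucan and finch, because they sort alphabetically. The minimal base R sketch below applies each rule to a hypothetical vector in which a value reappears, so every difference is visible. dense_rank_base() and run_id() are hypothetical helpers assumed to behave like dplyr::dense_rank() and data.table::rleid() on non-missing values; they are not those functions.

```r
# Hypothetical team sequence within one id: "green" reappears after "blue".
x <- c("green", "blue", "green")

# match() + unique(): each value keeps the number of its first appearance.
by_match <- match(x, unique(x))           # 1 2 1

# factor() + as.integer(): factor levels are sorted alphabetically by default.
by_factor <- as.integer(factor(x))        # 2 1 2

# Hypothetical stand-in for dplyr::dense_rank() on non-missing values:
# position of each value in the sorted set of distinct values, without gaps.
dense_rank_base <- function(v) match(v, sort(unique(v)))
by_dense <- dense_rank_base(x)            # 2 1 2

# Hypothetical stand-in for data.table::rleid() on one non-missing vector:
# the id increases whenever a value differs from the previous element.
run_id <- function(v) {
  if (length(v) == 0) return(integer(0))
  cumsum(c(TRUE, v[-1] != v[-length(v)]))
}
by_run <- run_id(x)                       # 1 2 3

print(rbind(by_match, by_factor, by_dense, by_run))
```

Only match + unique gives first-appearance numbering in every case; rleid agrees with it on the question's df because equal values within each group are contiguous there.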
Here is one possible approach:
df %>%
group_by(id) %>%
mutate(n_teams = cumsum(!duplicated(team))) %>%
group_by(id,team) %>%
mutate(n_bird = cumsum(!duplicated(bird))) %>%
ungroup()
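cumsum(!duplicated(x)) counts how many distinct values have appeared so far, so it matches match(x, unique(x)) only while equal values within a group are contiguous, which holds for the question's df. A minimal base R sketch, using a hypothetical vector where a value reappears and base R's ave() to apply both rules per id, shows where they diverge and where they agree.

```r
# Hypothetical bird sequence within one id/team: "owl" reappears after "finch".
x <- c("owl", "finch", "owl")

# !duplicated(x) is TRUE only at the first appearance of each value,
# so the running sum is the number of distinct values seen so far.
by_cumsum <- cumsum(!duplicated(x))   # 1 2 2

# match() + unique() instead returns the number assigned to each value.
by_match <- match(x, unique(x))       # 1 2 1

# On the question's data (id and team columns only) both rules agree.
df <- data.frame(
  id   = c(1, 2, 3, 3, 4, 4),
  team = c("blue", "green", "blue", "blue", "green", "blue"),
  stringsAsFactors = FALSE
)
# ave() returns a character vector here (same type as df$team), hence as.integer().
n_cumsum <- as.integer(ave(df$team, df$id, FUN = function(v) cumsum(!duplicated(v))))
n_match  <- as.integer(ave(df$team, df$id, FUN = function(v) match(v, unique(v))))
same_on_df <- identical(n_cumsum, n_match)
print(same_on_df)  # TRUE
```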
A data.table version:
library(data.table)
setDT(df)
df[,n_team := match(team,unique(team)),id]
df[,n_bird := 1:.N,.(id,team)]
df
#> id team bird n_team n_bird
#> 1: 1 blue parrot 1 1
#> 2: 2 green owl 1 1
#> 3: 3 blue toucan 1 1
#> 4: 3 blue finch 1 2
#> 5: 4 green penguin 1 1
#> 6: 4 blue sparrow 2 1
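In the data.table version, 1:.N numbers rows within each (id, team) group, like row_number(), so it equals a distinct-bird count only when no bird repeats within a team. A distinct-bird variant in the same style would be df[, n_bird := match(bird, unique(bird)), .(id, team)]. The base R sketch below uses ave() on a hypothetical group (id 5 is not in the question's data) to show the difference.

```r
# Hypothetical group: id 5, team "blue", with "owl" listed twice.
df <- data.frame(
  id   = c(5, 5, 5),
  team = c("blue", "blue", "blue"),
  bird = c("owl", "owl", "finch"),
  stringsAsFactors = FALSE
)

# Row counter within id/team: the base R analogue of 1:.N or row_number().
df$n_row <- ave(seq_len(nrow(df)), df$id, df$team, FUN = seq_along)

# Distinct-bird counter within id/team: a repeated bird keeps its number.
df$n_bird <- as.integer(ave(df$bird, df$id, df$team,
                            FUN = function(b) match(b, unique(b))))
print(df)
```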