How to remove rows by group in a long dataset based on a condition
I have this df:
library(lubridate)
# Four dates for each of three countries: 12 rows, matching the printout below
Date <- rep(as_date(c("2020-10-01","2020-10-02","2020-10-03","2020-10-04")), times = 3)
Country <- rep(c("USA","Mexico","Japan"), each = 4)
Value_A <- c(0,40,0,0,25,29,34,0,20,25,27,0)
df <- data.frame(Date, Country, Value_A)
View(df)
Date Country Value_A
<date> <chr> <dbl>
1 2020-10-01 USA 0
2 2020-10-02 USA 40
3 2020-10-03 USA 0
4 2020-10-04 USA 0
5 2020-10-01 Mexico 25
6 2020-10-02 Mexico 29
7 2020-10-03 Mexico 34
8 2020-10-04 Mexico 0
9 2020-10-01 Japan 20
10 2020-10-02 Japan 25
11 2020-10-03 Japan 27
12 2020-10-04 Japan 0
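I am trying to remove the rows that contain a zero, but only if those zeros are in the last two rows of each group in the Country column. So the result would be: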
Date Country Value_A
<date> <chr> <dbl>
1 2020-10-01 USA 0
2 2020-10-02 USA 40
5 2020-10-01 Mexico 25
6 2020-10-02 Mexico 29
7 2020-10-03 Mexico 34
9 2020-10-01 Japan 20
10 2020-10-02 Japan 25
11 2020-10-03 Japan 27
If anyone can help, I would appreciate it :)
Solution
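We can get the result with a few operations from the tidyverse packages. We group_by Country and arrange by Date in descending order. Because arrange ignores the grouping by default, the rows end up sorted by date across all countries, but within each country the latest date still comes first. We then generate a row_number within each group, so rn is 1 for a country's latest date and 2 for the one before it. Finally, we filter on the condition you described: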
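library(tidyverse)
df %>%
  group_by(Country) %>%
  arrange(desc(Date)) %>%            # newest dates first
  mutate(rn = row_number()) %>%      # per country: 1 = latest date, 2 = the one before it
  filter(!(Value_A == 0 & rn <= 2))  # drop zeros only in each country's last two dates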
# Date Country Value_A rn
# 1 2020-10-03 Mexico 34 2
# 2 2020-10-03 Japan 27 2
# 3 2020-10-02 USA 40 3
# 4 2020-10-02 Mexico 29 3
# 5 2020-10-02 Japan 25 3
# 6 2020-10-01 USA 0 4
# 7 2020-10-01 Mexico 25 4
# 8 2020-10-01 Japan 20 4
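If the helper column rn and the newest-first row order are not wanted in the output, a variation on the same idea is sketched below. It is only a sketch under assumptions: each Country is assumed to have one row per Date, the name result is illustrative, and df is rebuilt here so the snippet runs on its own. Note that arrange(Country, Date) sorts the countries alphabetically, which differs from the order in the question.

```r
library(dplyr)

# Rebuild the example data so the snippet is self-contained
df <- data.frame(
  Date = rep(as.Date(c("2020-10-01", "2020-10-02", "2020-10-03", "2020-10-04")), times = 3),
  Country = rep(c("USA", "Mexico", "Japan"), each = 4),
  Value_A = c(0, 40, 0, 0, 25, 29, 34, 0, 20, 25, 27, 0)
)

result <- df %>%
  arrange(Country, Date) %>%   # oldest to newest within each country
  group_by(Country) %>%
  # row_number() > n() - 2 is TRUE only for the last two rows of each group
  filter(!(Value_A == 0 & row_number() > n() - 2)) %>%
  ungroup()

print(result)
```

Filtering on row_number() and n() directly avoids creating a column that has to be dropped afterwards.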
Another method is to use rank(desc(Date)), which ranks the dates within each group without reordering the rows:
library(tidyverse)
df %>%
  group_by(Country) %>%
  mutate(rank_date = rank(desc(Date))) %>%     # per country: 1 = latest date
  filter(!(rank_date <= 2 & Value_A == 0))     # drop zeros only in the two latest dates
# Date Country Value_A rank_date
# 1 2020-10-01 USA 0 4
# 2 2020-10-02 USA 40 3
# 3 2020-10-01 Mexico 25 4
# 4 2020-10-02 Mexico 29 3
# 5 2020-10-03 Mexico 34 2
# 6 2020-10-01 Japan 20 4
# 7 2020-10-02 Japan 25 3
# 8 2020-10-03 Japan 27 2
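For comparison, the same filter can be sketched in base R without loading any packages. This is not part of the answer above, just one possible equivalent: it assumes dates are unique within each country (so the ranks have no ties), and rank_desc and result are illustrative names. The data frame is rebuilt so the snippet runs on its own.

```r
# Rebuild the example data so the snippet is self-contained
df <- data.frame(
  Date = rep(as.Date(c("2020-10-01", "2020-10-02", "2020-10-03", "2020-10-04")), times = 3),
  Country = rep(c("USA", "Mexico", "Japan"), each = 4),
  Value_A = c(0, 40, 0, 0, 25, 29, 34, 0, 20, 25, 27, 0)
)

# Rank the dates within each country; negating makes the latest date rank 1
rank_desc <- ave(-as.numeric(df$Date), df$Country, FUN = rank)

# Keep every row except zeros that fall on a country's two latest dates
result <- df[!(df$Value_A == 0 & rank_desc <= 2), ]

print(result)
```

Because the rows are subset by a logical vector, the original row names (1, 2, 5, 6, 7, 9, 10, 11) are kept, which matches the expected result in the question.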