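How to filter a data frame for consecutively increasing values
I have a large data frame with several columns, but for this question I am only interested in 3 of them.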
df <- structure(list(
  country = structure(rep(1L, 12), .Label = "ireland", class = "factor"),
  parameter = structure(rep(c(2L, 1L, 3L), 4), .Label = c("rainfall", "temp", "wind"), class = "factor"),
  value = c(10L, 15L, 20L, 9L, 18L, 10L, 12L, 25L, 15L, 10L, 10L, 20L),
  unit = structure(rep(c(3L, 2L, 1L), 4), .Label = c("km/hr", "mm", "oC"), class = "factor")
), class = "data.frame", row.names = c(NA, -12L))
country parameter value unit
ireland temp 10 oC
ireland rainfall 15 mm
ireland wind 20 km/hr
ireland temp 9 oC
ireland rainfall 18 mm
ireland wind 10 km/hr
ireland temp 12 oC
ireland rainfall 25 mm
ireland wind 15 km/hr
ireland temp 10 oC
ireland rainfall 10 mm
ireland wind 20 km/hr
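I want to group by country and parameter and extract the rows where the value column increases consecutively over 3 or more rows.
Example of the desired output: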
country parameter value unit
ireland rainfall 15 mm
ireland rainfall 18 mm
ireland rainfall 25 mm
ireland wind 10 km/hr
ireland wind 15 km/hr
ireland wind 20 km/hr
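Solution
You can group by country and parameter, then add a third grouping variable that starts a new run whenever a value is not greater than the previous (lagged) value, and finally keep only the runs with 3 or more rows: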
library(dplyr)
df %>%
  group_by(country, parameter) %>%
  group_by(x = cumsum(value <= lag(value, default = FALSE)), .add = TRUE) %>%
  filter(n() >= 3) %>%
  ungroup() %>%
  arrange(country, parameter) %>%
  select(-x)
# A tibble: 6 x 4
country parameter value unit
<fct> <fct> <int> <fct>
1 ireland rainfall 15 mm
2 ireland rainfall 18 mm
3 ireland rainfall 25 mm
4 ireland wind 10 km/hr
5 ireland wind 15 km/hr
6 ireland wind 20 km/hr
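To see why the cumulative sum works as a run identifier, it may help to trace it on a single group. The following is a minimal base R sketch, assuming the four wind values from the example in their original row order; the names `wind_value`, `breaks`, `run_id`, `run_len`, and `kept` are hypothetical and are not part of the answer above.

```r
# Wind values from the example data, in row order (assumed for illustration)
wind_value <- c(20L, 10L, 15L, 20L)

# TRUE wherever a value does not exceed its predecessor;
# the first element has no predecessor, so it never starts a break here
breaks <- c(FALSE, diff(wind_value) <= 0)

# Each TRUE bumps the counter, so every increasing run shares one id
run_id <- cumsum(breaks)

# Length of the run each element belongs to
run_len <- ave(wind_value, run_id, FUN = length)

# Keep only the elements sitting in runs of 3 or more
kept <- wind_value[run_len >= 3]
print(run_id)
print(kept)
```

The leading 20 forms a run of its own, while 10, 15, 20 share one id and survive the length filter.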
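Here is a base R option using subset + ave + rle:
subset(
  df[with(df, order(country, parameter)), ],
  !!ave(value, country, parameter, FUN = function(x) {
    # Label each increasing run, then flag runs of length 3 or more
    with(rle(cumsum(c(1, diff(x) <= 0))), rep(lengths >= 3, lengths))
  })
)
which gives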
country parameter value unit
2 ireland rainfall 15 mm
5 ireland rainfall 18 mm
8 ireland rainfall 25 mm
6 ireland wind 10 km/hr
9 ireland wind 15 km/hr
12 ireland wind 20 km/hr
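The inner function passed to ave can be tried on its own. This is a small sketch, assuming the four rainfall values from the example; `rain_value`, `rain_id`, `runs`, `keep`, and `rain_kept` are illustrative names only.

```r
# Rainfall values from the example data, in row order (assumed for illustration)
rain_value <- c(15L, 18L, 25L, 10L)

# A new id starts at the first element and at every non-increase
rain_id <- cumsum(c(1, diff(rain_value) <= 0))

# rle() compresses the ids into run lengths
runs <- rle(rain_id)

# Expand "is this run long enough?" back to one flag per element
keep <- rep(runs$lengths >= 3, runs$lengths)

rain_kept <- rain_value[keep]
print(runs$lengths)
print(rain_kept)
```

Here the first three values form one run of length 3 and are kept, while the trailing 10 starts a run of length 1 and is dropped.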
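Another dplyr option flags where each increasing run begins and ends, then slices out the run if it spans at least 3 rows:
df %>%
  group_by(country, parameter) %>%
  mutate(
    flag = c(0, diff(value)) > 0,                         # TRUE where value rose from the previous row
    flag_lag = lead(flag),                                # flag of the next row (despite the name)
    seq_end = flag == TRUE & flag_lag %in% c(NA, FALSE),  # last row of an increasing run
    seq_begin = flag == FALSE & flag_lag == TRUE          # row just before the first rise
  ) %>%
  # Note: this relies on each group having exactly one increasing run,
  # so that each which() call returns a single index
  slice(if (length(which(seq_begin == TRUE):which(seq_end == TRUE)) >= 3) which(seq_begin == TRUE):which(seq_end == TRUE) else NA) %>%
  select(-contains("flag"), -contains("seq"))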
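Here is another attempt using dplyr. Note that it sorts the whole data frame by value instead of looking for increasing runs within each country and parameter group, so it does not reproduce the desired output:
df <- df %>%
  arrange(value) %>%        # Sort by value, ascending
  filter(lag(value) > 3)    # Keep rows whose preceding value is greater than 3
df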
country parameter value unit
1 ireland temp 10 oC
2 ireland wind 10 km/hr
3 ireland temp 10 oC
4 ireland rainfall 10 mm
5 ireland temp 12 oC
6 ireland rainfall 15 mm
7 ireland wind 15 km/hr
8 ireland rainfall 18 mm
9 ireland wind 20 km/hr
10 ireland wind 20 km/hr
11 ireland rainfall 25 mm