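How to remove the rows before the last run of consecutive zeros in a time series using R
I have a time series stored as a data frame, as shown below: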
date country value id
1/1/2020 A .2 Cv
1/2/2020 A 0 Cv
1/3/2020 A 0 Cv
..... ... .... ...
2/10/2020 A 2 ...
2/11/2020 A 0 Cv
3/11/2020 A 0 Cv
4/11/2020 A 0 Cv
5/11/2020 A 3 Cv
6/11/2020 A 4 Cv
7/11/2020 A 6 Cv
8/11/2020 A 7 Cv
I want to remove all values before the last sequence of zeros in the data frame. I tried the following code:
test <- ddply(df, .(id), function(x) {
  temp_country_data <- ddply(x, .(country), function(y) {
    temp_data <- data.frame(y) %>% arrange(date) %>% group_by("id", "country")
    dat <- temp_data
    ToRemove <- apply(dat, 2, function(colmn) {
      row.zeros <- which(colmn == 0) # rows with zeros
      if (length(row.zeros) > 0) { # if we found any
        # which of them is the last double
        last.doubles <- max(which(diff(row.zeros) == 1))
        leftof.last.doubles <- "if"(length(last.doubles) > 0, # if double exists
                                    1:(row.zeros[last.doubles] - 1), # all rows before
                                    NULL) # else nothing
        # remove rows with single zeros and all rows before double consecutive
        unique(c(row.zeros, leftof.last.doubles))
      }
    })
    temp_data <- dat[-unlist(ToRemove), ]
    temp_data = temp_data[, c("date", "id", "country", "value")]
    temp_data
  }, .parallel = T)
  temp_country_data
}, .progress = 'text')
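However, it only removes the zero values, which is not what I want. I also want to leave a gap of 2 days after the last sequence of zeros, so the expected output looks like this: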
7/11/2020 A 6 Cv
8/11/2020 A 7 Cv
......
I also tried the following, but still did not get the result:
test3 <- ddply(df, "country", function(temp_data) {
  temp_data <- temp_data %>%
    mutate(flag_0 = ifelse(value == 0, 1, 0), flag_0_cum = cumsum(flag_0)) %>%
    filter(flag_0_cum == max(flag_0_cum)) %>%
    filter(round(value, 3) != 0) %>%
    select(-flag_0, -flag_0_cum) %>%
    slice(3:n())
  temp_data = temp_data[, c("short_id", "raw_de")]
  library(lubridate)
  temp_data <- temp_data %>%
    group_by(country, id) %>%
    mutate(DATE = ymd(date), day_flag = if_else(DATE == (lag(DATE) + days(1)), 1, 0))
  temp_data <- temp_data %>% filter(!is.na(day_flag))
  temp_data <- temp_data %>%
    mutate(flag_0 = ifelse(day_flag == 0, 1, 0), flag_0_cum = cumsum(flag_0)) %>%
    filter(flag_0_cum == max(flag_0_cum)) %>%
    filter(day_flag != 0) %>%
    select(-flag_0, -flag_0_cum) %>%
    slice(3:n())
  temp_data
}, .progress = 'text')
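I added another column to the data frame that flags rows on consecutive days as 1 and rows that are not consecutive as 0.
Please let me know where the problem is.
Solution
One approach is to flag each occurrence of a zero with a cumulative sum (the flag increases by 1 every time another zero appears) and keep only the last group, which contains the last zero and all the non-zero values that follow it. We can then drop the zero itself and the first two of the remaining rows: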
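library(dplyr)
# Sample data; the three zeros are assumed here so that value has one entry per date, as in the question
df <- data.frame(date = seq.Date(from = as.Date("2020-11-01"), to = as.Date("2020-11-08"), by = "day"),
                 country = "A", value = c(2, 0, 0, 0, 3, 4, 6, 7), id = "Cv")
df %>%
  # flag zeros; the running total goes up by 1 at every zero
  mutate(flag_0 = ifelse(round(value, 4) == 0, 1, 0), flag_0_cum = cumsum(flag_0)) %>%
  # keep the last zero and everything after it
  filter(flag_0_cum == max(flag_0_cum)) %>%
  # drop the zero itself
  filter(round(value, 4) != 0) %>%
  select(-flag_0, -flag_0_cum) %>%
  # skip the first two remaining rows
  slice(3:n())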
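The question groups by country and id, so the same cumulative-sum idea may need to run once per group. The following is only a sketch under assumptions: df_multi and its values are hypothetical sample data, and row_number() > 2 is used in place of slice(3:n()) because 3:n() counts downward when fewer than three rows remain in a group.

```r
library(dplyr)

# Hypothetical sample data: two countries, each with its own run of zeros
df_multi <- data.frame(
  date    = rep(seq.Date(as.Date("2020-11-01"), as.Date("2020-11-08"), by = "day"), times = 2),
  country = rep(c("A", "B"), each = 8),
  value   = c(2, 0, 0, 0, 3, 4, 6, 7,
              1, 5, 0, 0, 2, 8, 9, 4),
  id      = "Cv"
)

result <- df_multi %>%
  group_by(country, id) %>%
  arrange(date, .by_group = TRUE) %>%
  # running count of zeros, restarted for every country/id group
  mutate(flag_0_cum = cumsum(round(value, 4) == 0)) %>%
  # keep the non-zero rows that follow the last zero of each group
  filter(flag_0_cum == max(flag_0_cum), round(value, 4) != 0) %>%
  # leave the two-row gap after the zeros
  filter(row_number() > 2) %>%
  ungroup() %>%
  select(-flag_0_cum)

print(result)
```

A group that contains no zeros at all keeps every row except its first two under this sketch, which may or may not be the desired behaviour.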
Alternatively, you can identify the zeros and create a new index ix using diff() and cumsum(). Then exclude the zero rows and use by() to apply head() with a value of 2 to each ix group, which gives the first two rows after each run of zeros. Finally, rbind() the pieces together (DF is the input data frame):
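# ix goes up by 1 at the start of each run of zeros
DF1 <- as.data.frame(cbind(DF, ix = cumsum(c(-1, diff(DF$value == 0)) %in% 1)))
# drop the zero rows
DF1 <- DF1[DF1$value != 0, ]
# first two rows of each ix group, bound back together
res <- do.call(rbind, by(DF1, DF1$ix, head, 2))
head(res)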
# date country value id ix
# 0 2020-01-01 A 3.0 Cv 0
# 1.3 2020-01-03 A 3.0 Cv 1
# 1.4 2020-01-04 A 2.0 Cv 1
# 2.6 2020-01-06 A 3.0 Cv 2
# 2.7 2020-01-07 A 0.2 Cv 2
# 3 2020-01-10 A 4.0 Cv 3
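The DF used above is not shown, so here is a self-contained sketch of the same base R steps on a hypothetical DF (its dates and values are assumptions made purely for illustration). The last two lines are an addition rather than part of the answer: they keep only the final ix group and drop its first two rows, which is closer to what the question asks for.

```r
# Hypothetical input; the answer's own DF is not available
DF <- data.frame(
  date    = seq.Date(as.Date("2020-01-01"), by = "day", length.out = 10),
  country = "A",
  value   = c(3, 0, 3, 2, 0, 0, 4, 1, 6, 7),
  id      = "Cv"
)

# ix increases by one at the first row of every run of zeros
is_zero <- DF$value == 0
DF$ix <- cumsum(c(-1, diff(is_zero)) %in% 1)

# Drop the zero rows, then take the first two rows of each ix group
nonzero <- DF[!is_zero, ]
res <- do.call(rbind, by(nonzero, nonzero$ix, head, 2))
print(res)

# Addition: rows after the last run of zeros, minus a two-row gap
last_grp  <- nonzero[nonzero$ix == max(nonzero$ix), ]
after_gap <- tail(last_grp, -2)
print(after_gap)
```

tail(x, -2) returns an empty data frame when the last group has two rows or fewer, so no special case is needed for short groups.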