Remove rows before the last run of consecutive zeros in a time series using R

I have a time series stored as a data frame, as shown below:

date          country      value       id
1/1/2020       A            .2         Cv
1/2/2020       A             0         Cv
1/3/2020       A             0         Cv 
.....         ...           ....       ...
2/10/2020      A              2        ...
2/11/2020      A              0        Cv
3/11/2020      A              0        Cv
4/11/2020      A              0        Cv
5/11/2020      A              3        Cv
6/11/2020      A              4        Cv
7/11/2020      A              6        Cv
8/11/2020      A              7        Cv

I want to remove all rows in the data frame that come before the last sequence of zeros. I tried the following code:

test <- ddply(df, .(id), function(x) {
  temp_country_data <- ddply(x, .(country), function(y) {
    temp_data <- data.frame(y) %>% arrange(date) %>% group_by("id", "country")
    dat <- temp_data
    ToRemove <- apply(dat, 2, function(colmn) {
      row.zeros <- which(colmn == 0) # rows with zeros
      if (length(row.zeros) > 0) { # if we found any
        # which of them is the last double
        last.doubles <- max(which(diff(row.zeros) == 1))
        leftof.last.doubles <- "if"(length(last.doubles) > 0, # if double exists
                                    1:(row.zeros[last.doubles] - 1), # all rows before
                                    NULL) # else nothing
        # remove rows with single zeros and all rows before double consecutive
        unique(c(row.zeros, leftof.last.doubles))
      }
    })
    temp_data <- dat[-unlist(ToRemove), ]
    temp_data = temp_data[, c("date", "id", "country", "value")]
    temp_data
  }, .parallel = T)
  temp_country_data
}, .progress = 'text')

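However, it removes only the zero values, which is not what I want. I also want to leave a gap of 2 days after the last sequence of zeros, so the output should look like this: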

7/11/2020      A              6        Cv
8/11/2020      A              7        Cv

......

I also tried the following, but still did not get the result I want:

test3 <- ddply(df, "country", function(temp_data) { # wrapper reconstructed; argument name assumed
  temp_data <- temp_data %>%
    mutate(flag_0 = ifelse(value == 0, 1, 0), flag_0_cum = cumsum(flag_0)) %>%
    filter(flag_0_cum == max(flag_0_cum)) %>%
    filter(round(value, 3) != 0) %>%
    select(-flag_0, -flag_0_cum) %>%
    slice(3:n())
  temp_data = temp_data[, c("short_id", "raw_de")]
  library(lubridate)

  temp_data <- temp_data %>%
    group_by(country, id) %>%
    mutate(DATE = ymd(date), day_flag = if_else(DATE == (lag(DATE) + days(1)), 1, 0))

  temp_data <- temp_data %>% filter(!is.na(day_flag))
  temp_data <- temp_data %>%
    mutate(flag_0 = ifelse(day_flag == 0, 1, 0), flag_0_cum = cumsum(flag_0)) %>%
    filter(flag_0_cum == max(flag_0_cum)) %>%
    filter(day_flag != 0) %>%
    select(-flag_0, -flag_0_cum) %>%
    slice(3:n())

  temp_data
}, .progress = 'text')

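I added another column to the data frame that flags rows with consecutive dates as 1 and non-consecutive rows as 0.

Please let me know where the problem is.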

Solution

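One approach is to flag the occurrences of zero with a cumulative sum (the flag increases by 1 each time another zero appears) and keep only the last group, which contains the last zero and all the non-zero values that follow it. We can then drop the zero itself and the first two rows after it: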

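library(dplyr)

# Sample data: one leading value, a run of three zeros, then four non-zero values
df <- data.frame(date = seq.Date(from = as.Date("2020-11-01"), to = as.Date("2020-11-08"), by = "day"),
                 country = "A",
                 value = c(2, 0, 0, 0, 3, 4, 6, 7),
                 id = "Cv")

df %>%
  mutate(flag_0 = ifelse(round(value, 4) == 0, 1, 0), flag_0_cum = cumsum(flag_0)) %>%
  filter(flag_0_cum == max(flag_0_cum)) %>%   # last zero and everything after it
  filter(round(value, 4) != 0) %>%            # drop the zero itself
  select(-flag_0, -flag_0_cum) %>%
  slice(3:n())                                # skip the first two remaining rows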
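
The question groups the data by id and country, so the same idea may need to run once per group. The following is only a sketch under that assumption: the data frame df2 and its values are made up for illustration, and the cumulative-sum flag is the one described above.

```r
library(dplyr)

# Hypothetical sample data: two countries, each with its own zero pattern
df2 <- data.frame(
  date    = rep(seq.Date(as.Date("2020-11-01"), as.Date("2020-11-08"), by = "day"), times = 2),
  country = rep(c("A", "B"), each = 8),
  value   = c(2, 0, 0, 0, 3, 4, 6, 7,
              1, 5, 0, 0, 2, 8, 9, 4),
  id      = "Cv"
)

res_grouped <- df2 %>%
  arrange(id, country, date) %>%
  group_by(id, country) %>%
  mutate(flag_0_cum = cumsum(round(value, 4) == 0)) %>%  # counter grows at every zero
  filter(flag_0_cum == max(flag_0_cum),                  # last zero of the group and what follows
         round(value, 4) != 0) %>%                       # drop the zero itself
  slice(-(1:2)) %>%                                      # skip the first two rows after the zeros
  ungroup() %>%
  select(-flag_0_cum)

res_grouped$value
# 6 7 9 4
```

slice(-(1:2)) is used here instead of slice(3:n()) because 3:n() counts backwards when fewer than three rows remain in a group, which would keep rows that should be skipped. Note that a group without any zeros still loses its first two rows in this sketch.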

You could identify the zeros using diff and create a new index ix using cumsum. Then exclude the zero rows and apply head with n = 2 by ix to get the first two rows after each run of zeros. Finally, rbind the pieces together. Here DF is a data frame with the same columns as in the question:

DF1 <- as.data.frame(cbind(DF,ix=cumsum(c(-1,diff(DF$value == 0)) %in% 1)))
DF1 <- DF1[DF1$value != 0,]
res <- do.call(rbind,by(DF1,DF1$ix,head,2))
head(res)
#           date country value id ix
# 0   2020-01-01       A   3.0 Cv  0
# 1.3 2020-01-03       A   3.0 Cv  1
# 1.4 2020-01-04       A   2.0 Cv  1
# 2.6 2020-01-06       A   3.0 Cv  2
# 2.7 2020-01-07       A   0.2 Cv  2
# 3   2020-01-10       A   4.0 Cv  3
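
If only runs of at least two consecutive zeros should count, as the title suggests, run-length encoding is one way to find the last such run. This is a base R sketch, not part of either answer above: the helper name rows_after_last_zero_run and its arguments min_run and skip are hypothetical, and it assumes the values are already sorted by date and contain no NA.

```r
# Hypothetical helper: returns the positions of the rows to keep
rows_after_last_zero_run <- function(value, min_run = 2, skip = 2) {
  r      <- rle(value == 0)                    # runs of zero / non-zero values
  ends   <- cumsum(r$lengths)                  # last position of each run
  is_run <- r$values & r$lengths >= min_run    # zero runs that are long enough
  if (!any(is_run)) return(seq_along(value))   # no qualifying run: keep everything
  start <- ends[max(which(is_run))] + 1 + skip # first position after the gap
  if (start > length(value)) integer(0) else seq.int(start, length(value))
}

value <- c(0.2, 0, 0, 5, 0, 1, 2, 0, 0, 0, 3, 4, 6, 7)
keep  <- rows_after_last_zero_run(value)
value[keep]
# 6 7
```

For a data frame, df[rows_after_last_zero_run(df$value), ] would select the corresponding rows. Unlike the cumulative-sum approach, a single isolated zero does not start a new group here, so a lone zero after the last run is kept.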
