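How can I use dplyr to flag, within each group, the latest date in one column that falls before a date in another column?
I have a data frame like this: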
example <- data.frame(
  id = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
  delivereddate = c("7/20/2019", "7/24/2019", "7/28/2019",
                    "3/24/2019", "4/13/2019", "4/25/2019",
                    "11/13/2019", "11/20/2019", "11/27/2019"),
  applieddate = c("7/22/2019", "7/22/2019", "7/22/2019",
                  NA, NA, NA,
                  "11/21/2019", "11/21/2019", "11/21/2019")
)
I'm trying to add a column that identifies, for each id, the latest delivereddate before the applieddate. Here is an example of the result I'm trying to get:
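desiredresult <- data.frame(
  id = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
  delivereddate = c("7/20/2019", "7/24/2019", "7/28/2019",
                    "3/24/2019", "4/13/2019", "4/25/2019",
                    "11/13/2019", "11/20/2019", "11/27/2019"),
  applieddate = c("7/22/2019", "7/22/2019", "7/22/2019",
                  NA, NA, NA,
                  "11/21/2019", "11/21/2019", "11/21/2019"),
  applied = c(1, 0, 0, 0, 0, 0, 0, 1, 0)
)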
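The applied column needs to be binary (0 or 1), and each id can have only one row flagged with 1. If an id has no applieddate, the applied flag is 0 for all of its rows.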
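Solution
We can use findInterval. Within each id, it returns the position of the last delivereddate that is on or before the applieddate, and the row at that position gets 1. The delivery dates must be sorted in ascending order within each id, as they are here.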
library(dplyr)
library(lubridate)
example %>%
  dplyr::group_by(id) %>%
  # flag the row whose position matches the interval containing applieddate
  dplyr::mutate(applied = +(row_number() %in%
    findInterval(lubridate::mdy(first(applieddate)),
                 lubridate::mdy(delivereddate))))
# A tibble: 9 x 4
# Groups:   id [3]
#      id delivereddate applieddate applied
#   <dbl> <chr>         <chr>         <int>
# 1     1 7/20/2019     7/22/2019         1
# 2     1 7/24/2019     7/22/2019         0
# 3     1 7/28/2019     7/22/2019         0
# 4     2 3/24/2019     <NA>              0
# 5     2 4/13/2019     <NA>              0
# 6     2 4/25/2019     <NA>              0
# 7     3 11/13/2019    11/21/2019        0
# 8     3 11/20/2019    11/21/2019        1
# 9     3 11/27/2019    11/21/2019        0
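To see why the findInterval approach gives one flag per id, it may help to look at what findInterval returns for a single group. The sketch below uses base R only and hard-codes the delivery dates of id 1; the variable names are illustrative and not part of the original answer.

```r
# Delivery dates for one id, sorted ascending (findInterval requires this)
delivered <- as.Date(c("2019-07-20", "2019-07-24", "2019-07-28"))

# Applied date falls between the 1st and 2nd delivery -> position 1
between <- findInterval(as.Date("2019-07-22"), delivered)

# Applied date equal to a delivery date counts that delivery by default -> 2
tie_default <- findInterval(as.Date("2019-07-24"), delivered)

# left.open = TRUE makes the comparison strict, so a same-day delivery is skipped -> 1
tie_strict <- findInterval(as.Date("2019-07-24"), delivered, left.open = TRUE)

# Applied date earlier than every delivery -> 0, so no row number matches
too_early <- findInterval(as.Date("2019-07-19"), delivered)

# Missing applied date -> NA, which %in% never matches
no_applied <- findInterval(as.Date(NA), delivered)

print(c(between, tie_default, tie_strict, too_early, no_applied))
```

If "before" should mean strictly before the applieddate, passing left.open = TRUE to findInterval in the grouped mutate is one option.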
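
Alternatively, you can convert the columns to Date class, subtract applieddate from delivereddate, and take the absolute value. Then, for each id, assign 1 to the index where the smallest difference is observed. Note that this picks the delivereddate closest to the applieddate on either side: it matches the desired result for this data, but it could flag a delivery after the applieddate if that one happens to be nearer.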
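library(dplyr)
example %>%
  # parse both date columns, then store the absolute gap in `applied`
  mutate(across(ends_with('date'), lubridate::mdy),
         applied = abs(delivereddate - applieddate)) %>%
  group_by(id) %>%
  # flag the row with the smallest gap; all-NA groups get 0 everywhere
  mutate(applied = +(row_number() %in% which.min(applied)))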
#      id delivereddate applieddate applied
#   <dbl> <date>        <date>        <int>
# 1     1 2019-07-20    2019-07-22        1
# 2     1 2019-07-24    2019-07-22        0
# 3     1 2019-07-28    2019-07-22        0
# 4     2 2019-03-24    NA                0
# 5     2 2019-04-13    NA                0
# 6     2 2019-04-25    NA                0
# 7     3 2019-11-13    2019-11-21        0
# 8     3 2019-11-20    2019-11-21        1
# 9     3 2019-11-27    2019-11-21        0
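If only deliveries strictly before the applieddate should ever be flagged, one possible variant is to mask out the other rows before picking the latest one. This is a minimal sketch, not taken from either answer; it assumes the example data from the question, and `eligible` is a hypothetical helper column that is dropped at the end.

```r
library(dplyr)
library(lubridate)

# Example data as given in the question
example <- data.frame(
  id = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
  delivereddate = c("7/20/2019", "7/24/2019", "7/28/2019",
                    "3/24/2019", "4/13/2019", "4/25/2019",
                    "11/13/2019", "11/20/2019", "11/27/2019"),
  applieddate = c("7/22/2019", "7/22/2019", "7/22/2019",
                  NA, NA, NA,
                  "11/21/2019", "11/21/2019", "11/21/2019")
)

result <- example %>%
  mutate(across(ends_with("date"), mdy)) %>%
  group_by(id) %>%
  mutate(
    # Keep only deliveries strictly before the applied date; others become NA
    eligible = if_else(delivereddate < applieddate,
                       as.numeric(delivereddate), NA_real_),
    # which.max() skips NAs and returns integer(0) when nothing is eligible
    applied = +(row_number() %in% which.max(eligible))
  ) %>%
  ungroup() %>%
  select(-eligible)

print(result)
```

Unlike the findInterval version, this does not depend on the rows being sorted by delivereddate within each id, and because which.max returns only the first maximum, at most one row per id is flagged.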