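How to split the rows of a data frame by timestamp in R
I have the following unstructured ticket dataset containing work-note updates. Each ticket has several work notes, and each note starts with a timestamp line. I need to split the Worknotes column so that each row holds one timestamp together with its corresponding updates, as in the expected output below.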
| I.NO | Ticket No: | Worknotes |
|------|------------|-----------|
| 0 | 198822 | 2015-06-19 01:57:11 -Account Service |
| 1 | 198822 | Event closed |
| 2 | 198822 | Acknowledged |
| 3 | 198822 | 2015-06-19 01:58:33- Lawrence David |
| 4 | 198822 | Data unavialable and hence ticket closed |
| 5 | 198824 | 2015-06-19 02:07:01- Account Service |
| 6 | 198824 | User requested for database information |
| 7 | 198824 | 2015-06-19 02:07:34- Cecilia Trandau |
| 8 | 198824 | Backup in progress. Under discusion |
| 9 | 198824 | 2015-06-20 02:07:01- Account Service |
| 10 | 198824 | Auto closed |
**Edit: output of `dput()`**
data <- structure(list(I.NO = c(0,1,2,3,4,5,6,7,8,9,10),`Ticket No:` = c(198822,198822,198822,198822,198822,198824,198824,198824,198824,198824,198824),Worknotes = c("2015-06-19 01:57:11 -Account Service","Event closed","Acknowledged","2015-06-19 01:58:33- Lawrence David","Data unavialable and hence ticket closed","2015-06-19 02:07:01- Account Service","User requested for database information","2015-06-19 02:07:34- Cecilia Trandau","Backup in progress. Under discusion","2015-06-20 02:07:01- Account Service","Auto closed")),row.names = c(NA,-11L),class = c("tbl_df","tbl","data.frame"))
# A tibble: 6 x 3
I.NO `Ticket No:` Worknotes
<dbl> <dbl> <chr>
1 0 198822 2015-06-19 01:57:11 -Account Service
2 1 198822 Event closed
3 2 198822 Acknowledged
4 3 198822 2015-06-19 01:58:33- Lawrence David
5 4 198822 Data unavialable and hence ticket closed
6 5 198824 2015-06-19 02:07:01- Account Service
**Expected Output**
**Ticket No:** **Worknotes**
198822 2015-06-19 01:57:11 -Account Service
Event closed
Acknowledged
198822 2015-06-19 01:58:33- Lawrence David
Data unavailable and hence ticket closed
198824 2015-06-19 02:07:01- Account Service
User requested for database information
198824 2015-06-19 02:07:34- Cecilia Trandau
Backup in progress. Under discusion
198824 2015-06-20 02:07:01- Account Service
Auto closed
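Solution
Here is an approach that builds a grouping column with `cumsum()` and `str_detect()`: every row whose note starts with a date begins a new group.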
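library(tidyverse)
data %>%
  # str_detect() is TRUE for notes starting with 10 digits/hyphens (a date such as 2015-06-19);
  # cumsum() turns those TRUE values into a running group id
  mutate(grouper = cumsum(str_detect(Worknotes, "^[0-9\\-]{10}")))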
# A tibble: 11 x 4
I.NO `Ticket No:` Worknotes grouper
<dbl> <dbl> <chr> <int>
1 0 198822 2015-06-19 01:57:11 -Account Service 1
2 1 198822 Event closed 1
3 2 198822 Acknowledged 1
4 3 198822 2015-06-19 01:58:33- Lawrence David 2
5 4 198822 Data unavialable and hence ticket closed 2
6 5 198824 2015-06-19 02:07:01- Account Service 3
7 6 198824 User requested for database information 3
8 7 198824 2015-06-19 02:07:34- Cecilia Trandau 4
9 8 198824 Backup in progress. Under discusion 4
10 9 198824 2015-06-20 02:07:01- Account Service 5
11 10 198824 Auto closed 5
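The `grouper` column works because `cumsum()` on a logical vector counts `TRUE` as 1 and `FALSE` as 0, so the running total increases only at header lines and every continuation line inherits the id of the header above it. Below is a minimal base-R sketch of that idea; the vector `notes` is hypothetical sample data copied from the question, and the stricter full-timestamp pattern is an assumption, not the answer's regex.

```r
# Toy work-note lines (hypothetical sample taken from the question's data)
notes <- c(
  "2015-06-19 01:57:11 -Account Service",
  "Event closed",
  "Acknowledged",
  "2015-06-19 01:58:33- Lawrence David",
  "Data unavialable and hence ticket closed"
)

# TRUE where a line begins with a "YYYY-MM-DD HH:MM:SS" timestamp (assumed format)
is_header <- grepl("^[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2}", notes)

# Running count of headers: continuation lines share the id of the header above
grouper <- cumsum(is_header)
print(grouper)
# [1] 1 1 1 2 2
```

Matching the whole timestamp rather than any 10 leading digits/hyphens makes it less likely that a note which merely begins with a number is mistaken for a header.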
From here, we can `group_by()` the ticket number and group id, then `summarise()` each group by `paste()`-ing its notes together:
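result <- data %>%
  # Mark each timestamped note as the start of a new group
  mutate(grouper = cumsum(str_detect(Worknotes, "^[0-9\\-]{10}"))) %>%
  group_by(`Ticket No:`, grouper) %>%
  # Join each group's notes into a single string, one note per line
  summarise(Worknotes = paste(Worknotes, collapse = "\n"), .groups = "drop") %>%
  select(-grouper)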
result
`Ticket No:` Worknotes
<dbl> <chr>
1 198822 "2015-06-19 01:57:11 -Account Service\nEvent closed\nAcknowledged"
2 198822 "2015-06-19 01:58:33- Lawrence David\nData unavialable and hence ticket closed"
3 198824 "2015-06-19 02:07:01- Account Service\nUser requested for database information"
4 198824 "2015-06-19 02:07:34- Cecilia Trandau\nBackup in progress. Under discusion"
5 198824 "2015-06-20 02:07:01- Account Service\nAuto closed"
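For readers who would rather not depend on the tidyverse, the same group-and-paste step can be sketched in base R. This is a swapped-in technique, not the answer's code: `grepl()` stands in for `str_detect()` and `tapply()` for `group_by()`/`summarise()`. The data frame `df` and its simplified `ticket` column are hypothetical stand-ins for the question's data. Unlike the answer, it does not also group by ticket, so it assumes each ticket's notes begin with a timestamped line.

```r
# Trimmed, hypothetical copy of the question's data (column name simplified)
df <- data.frame(
  ticket = c(198822, 198822, 198822, 198824, 198824),
  Worknotes = c(
    "2015-06-19 01:57:11 -Account Service",
    "Event closed",
    "Acknowledged",
    "2015-06-19 02:07:01- Account Service",
    "User requested for database information"
  ),
  stringsAsFactors = FALSE
)

# Group id: increments on every line that starts with a YYYY-MM-DD date
grouper <- cumsum(grepl("^[0-9]{4}-[0-9]{2}-[0-9]{2}", df$Worknotes))

# Collapse each group's lines into one string separated by newlines
notes <- tapply(df$Worknotes, grouper, paste, collapse = "\n")

# Ticket number of each group (assumes a group never spans two tickets)
tickets <- tapply(df$ticket, grouper, function(x) x[1])

result_base <- data.frame(
  ticket = as.vector(tickets),
  Worknotes = as.vector(notes),
  stringsAsFactors = FALSE
)
print(result_base)
```

`tapply()` orders its output by the sorted group ids, which here coincides with the original row order because `cumsum()` only ever increases.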
Note that `print()` displays `\n` as a literal escape sequence, whereas `cat()` renders it as a line break:
cat(result$Worknotes[1])
2015-06-19 01:57:11 -Account Service
Event closed
Acknowledged
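To see the difference side by side, and to show every collapsed note rather than only the first, here is a small self-contained sketch. The vector `collapsed` is a hypothetical stand-in for `result$Worknotes`; `writeLines()` (base R) also honours embedded `\n`, and `strsplit()` recovers the individual notes if they are needed again.

```r
# Hypothetical stand-in for result$Worknotes
collapsed <- c(
  "2015-06-19 01:57:11 -Account Service\nEvent closed\nAcknowledged",
  "2015-06-20 02:07:01- Account Service\nAuto closed"
)

print(collapsed[1])      # shows the string with the \n escapes visible
cat(collapsed[1], "\n")  # interprets \n as real line breaks

# Print every entry, with a blank line between tickets' note groups
writeLines(collapsed, sep = "\n\n")

# Split the collapsed strings back into individual notes
note_lines <- strsplit(collapsed, "\n", fixed = TRUE)
```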