How to pivot NoSQL data wider in R
I am working with NoSQL data that I need to pivot in R.
Sample data:
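structure(list(timestamp = structure(c(1595709882,1595709882,1595709931,1595709931,1595710021,1595710023,1595710023,1595710027,1595710157,1595710157,1595710277,1595710277,1595710337,1595710337,1595710397,1595710397,1595710457,1595710457,1595710517,1595710517
),class = c("POSIXct","POSIXt"),tzone = "UTC"),value = c("3000","160","160","3000","6000","6000","160","6000","6000","160","160","6000","6000","160","6000","160","6000","160","6000","160"),variable = c("ENGINE_RPM","VEHICLE_SPEED","VEHICLE_SPEED","ENGINE_RPM","ENGINE_RPM","ENGINE_RPM","VEHICLE_SPEED","ENGINE_RPM","ENGINE_RPM","VEHICLE_SPEED","VEHICLE_SPEED","ENGINE_RPM","ENGINE_RPM","VEHICLE_SPEED","ENGINE_RPM","VEHICLE_SPEED","ENGINE_RPM","VEHICLE_SPEED","ENGINE_RPM","VEHICLE_SPEED")),row.names = c(NA,-20L),class = c("tbl_df","tbl","data.frame"))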
timestamp value variable
7/25/2020 20:44:42 3000 ENGINE_RPM
7/25/2020 20:44:42 160 VEHICLE_SPEED
7/25/2020 20:45:31 160 VEHICLE_SPEED
7/25/2020 20:45:31 3000 ENGINE_RPM
7/25/2020 20:47:01 6000 ENGINE_RPM
7/25/2020 20:47:03 6000 ENGINE_RPM
7/25/2020 20:47:03 160 VEHICLE_SPEED
7/25/2020 20:47:07 6000 ENGINE_RPM
7/25/2020 20:49:17 6000 ENGINE_RPM
7/25/2020 20:49:17 160 VEHICLE_SPEED
7/25/2020 20:51:17 160 VEHICLE_SPEED
7/25/2020 20:51:17 6000 ENGINE_RPM
7/25/2020 20:52:17 6000 ENGINE_RPM
7/25/2020 20:52:17 160 VEHICLE_SPEED
7/25/2020 20:53:17 6000 ENGINE_RPM
7/25/2020 20:53:17 160 VEHICLE_SPEED
7/25/2020 20:54:17 6000 ENGINE_RPM
7/25/2020 20:54:17 160 VEHICLE_SPEED
7/25/2020 20:55:17 6000 ENGINE_RPM
7/25/2020 20:55:17 160 VEHICLE_SPEED
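Looking at the sample data, some timestamps have both RPM and SPEED, while a few have only one of the two.
I need only the timestamps that have two rows, i.e. both vehicle speed and RPM, so that I can later look up the vehicle speed and engine RPM at a specific time.
The output I am looking for is: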
timestamp ENGINE_RPM VEHICLE_SPEED
7/25/2020 20:44:42 3000 160
7/25/2020 20:45:31 3000 160
7/25/2020 20:47:03 6000 160
7/25/2020 20:49:17 6000 160
7/25/2020 20:51:17 6000 160
7/25/2020 20:52:17 6000 160
7/25/2020 20:53:17 6000 160
7/25/2020 20:54:17 6000 160
7/25/2020 20:55:17 6000 160
The code I used is:
data %>% group_by(timestamp,variable,value) %>%
mutate(row = row_number()) %>% filter(n() == 2) %>%
pivot_wider(names_from = variable,values_from = value) %>% select(-row)
The output I get is:
# A tibble: 8 x 3
# Groups: timestamp [4]
timestamp VEHICLE_SPEED ENGINE_RPM
<dttm> <chr> <chr>
1 2020-08-05 16:09:02 5 NA
2 2020-08-05 16:09:02 5 NA
3 2020-08-06 18:32:33 15 NA
4 2020-08-06 18:32:33 15 NA
5 2020-08-06 18:32:52 25 NA
6 2020-08-06 18:32:52 25 NA
7 2020-08-07 12:03:53 NA 1500
8 2020-08-07 12:03:53 NA 1500
Can someone tell me how to get the desired output?
Solution
You can use the pivot_wider function to make the data wider, then na.omit to drop the timestamps that are missing either value:
library(dplyr)
library(tidyr)
dat %>%
  pivot_wider(names_from = variable,values_from = value) %>%
  na.omit()
timestamp ENGINE_RPM VEHICLE_SPEED
<dttm> <chr> <chr>
1 2020-07-25 20:44:42 3000 160
2 2020-07-25 20:45:31 3000 160
3 2020-07-25 20:47:03 6000 160
4 2020-07-25 20:49:17 6000 160
5 2020-07-25 20:51:17 6000 160
6 2020-07-25 20:52:17 6000 160
7 2020-07-25 20:53:17 6000 160
8 2020-07-25 20:54:17 6000 160
9 2020-07-25 20:55:17 6000 160
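The attempt in the question groups by timestamp, variable and value together, so filter(n() == 2) only keeps rows that are exact duplicates of each other, which would explain the NA columns. As a minimal sketch of the same filter-first idea, grouping by timestamp alone and keeping groups that contain both variables should give the wanted rows. The small `dat` sample and the name `both` below are illustrative, not part of the original post.

```r
library(dplyr)
library(tidyr)

# Small illustrative sample in the same long format as the question's data.
dat <- tibble(
  timestamp = as.POSIXct(
    c("2020-07-25 20:44:42", "2020-07-25 20:44:42",
      "2020-07-25 20:47:01",
      "2020-07-25 20:47:03", "2020-07-25 20:47:03"),
    tz = "UTC"
  ),
  value    = c("3000", "160", "6000", "6000", "160"),
  variable = c("ENGINE_RPM", "VEHICLE_SPEED", "ENGINE_RPM",
               "ENGINE_RPM", "VEHICLE_SPEED")
)

both <- dat %>%
  group_by(timestamp) %>%                 # group by timestamp only
  filter(n_distinct(variable) == 2) %>%   # keep timestamps that have both variables
  ungroup() %>%
  pivot_wider(names_from = variable, values_from = value)

print(both)
```

Filtering before pivoting avoids creating the NA cells in the first place. Pivoting first and then dropping the incomplete rows, as above, should give the same rows.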
Alternatively, you can try:
library(tidyr)
library(dplyr)
df2 <- df %>%
distinct(.) %>%
pivot_wider(names_from = variable,values_from = value) %>%
filter(!is.na(VEHICLE_SPEED))
or
df2 <- df %>%
distinct(.) %>%
spread(variable,value) %>%
filter(!is.na(VEHICLE_SPEED))
# timestamp ENGINE_RPM VEHICLE_SPEED
# <dttm> <chr> <chr>
# 1 2020-07-25 20:44:42 3000 160
# 2 2020-07-25 20:45:31 3000 160
# 3 2020-07-25 20:47:03 6000 160
# 4 2020-07-25 20:49:17 6000 160
# 5 2020-07-25 20:51:17 6000 160
# 6 2020-07-25 20:52:17 6000 160
# 7 2020-07-25 20:53:17 6000 160
# 8 2020-07-25 20:54:17 6000 160
# 9 2020-07-25 20:55:17 6000 160
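Filtering on VEHICLE_SPEED alone works for the sample because the incomplete timestamps there happen to have only ENGINE_RPM. If a timestamp could also have speed without RPM, a slightly more general sketch would drop rows missing either column. The version below also converts value from character to numeric, on the assumption that numeric columns are wanted downstream. The sample `dat` (including the speed-only row at 20:48:00) and the name `wide` are hypothetical.

```r
library(dplyr)
library(tidyr)

# Illustrative sample: one complete timestamp pair, one RPM-only
# timestamp, one (hypothetical) speed-only timestamp, one more pair.
dat <- tibble(
  timestamp = as.POSIXct(
    c("2020-07-25 20:44:42", "2020-07-25 20:44:42",
      "2020-07-25 20:47:01",
      "2020-07-25 20:48:00",
      "2020-07-25 20:49:17", "2020-07-25 20:49:17"),
    tz = "UTC"
  ),
  value    = c("3000", "160", "6000", "160", "6000", "160"),
  variable = c("ENGINE_RPM", "VEHICLE_SPEED", "ENGINE_RPM",
               "VEHICLE_SPEED", "ENGINE_RPM", "VEHICLE_SPEED")
)

wide <- dat %>%
  distinct() %>%                             # remove exact duplicate rows
  mutate(value = as.numeric(value)) %>%      # character -> numeric
  pivot_wider(names_from = variable, values_from = value) %>%
  drop_na(ENGINE_RPM, VEHICLE_SPEED)         # require both measurements

print(wide)
```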