How to remove rows from a data.frame based on a rule in R
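In a data.frame, I want to automatically remove rows that have NA in Column_E whenever another row carries the same information in all of the other columns. For example: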
Column_A Column_B Column_C Column_D Column_E
A121 NAME1 A321 2019-01-01 NA
A121 NAME1 A321 2019-01-01 2020-02-01
A123 NAME2 A322 2019-01-01 2020-01-01
A123 NAME2 A322 2019-01-01 NA
A124 NAME3 A323 2019-01-01 2019-01-01
A124 NAME4 A324 2019-01-01 NA
The output should be:
Column_A Column_B Column_C Column_D Column_E
A121 NAME1 A321 2019-01-01 2020-02-01
A123 NAME2 A322 2019-01-01 2020-01-01
A124 NAME3 A323 2019-01-01 2019-01-01
A124 NAME4 A324 2019-01-01 NA
Any ideas?
Solution
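Group the rows by Column_A through Column_D, then keep a row if its Column_E is not NA or if it is the only row in its group.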
library(dplyr)
df %>%
  group_by(across(Column_A:Column_D)) %>%
  filter(!is.na(Column_E) | n() == 1)
# Column_A Column_B Column_C Column_D Column_E
# <chr> <chr> <chr> <chr> <chr>
#1 A121 NAME1 A321 2019-01-01 2020-02-01
#2 A123 NAME2 A322 2019-01-01 2020-01-01
#3 A124 NAME3 A323 2019-01-01 2019-01-01
#4 A124 NAME4 A324 2019-01-01 NA
The same logic in data.table:
library(data.table)
setDT(df)
df[, .SD[!is.na(Column_E) | .N == 1], .(Column_A, Column_B, Column_C, Column_D)]
And in base R, grouping by all four key columns:
subset(df, ave(!is.na(Column_E), Column_A, Column_B, Column_C, Column_D, FUN = function(x) x | length(x) == 1))
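All three variants apply the same rule, and it can help to see the row-by-row mask spelled out. The following is a minimal base R sketch, not part of the original answer. The names `df_demo`, `key_cols`, `group_id`, `group_size` and `keep` are chosen here for illustration, and the data is re-created from the table in the question.

```r
# Example data re-created from the table in the question
df_demo <- data.frame(
  Column_A = c("A121", "A121", "A123", "A123", "A124", "A124"),
  Column_B = c("NAME1", "NAME1", "NAME2", "NAME2", "NAME3", "NAME4"),
  Column_C = c("A321", "A321", "A322", "A322", "A323", "A324"),
  Column_D = rep("2019-01-01", 6),
  Column_E = c(NA, "2020-02-01", "2020-01-01", NA, "2019-01-01", NA),
  stringsAsFactors = FALSE
)

# Columns that define a group
key_cols <- c("Column_A", "Column_B", "Column_C", "Column_D")

# One label per row; rows with identical key columns share a label
group_id <- interaction(df_demo[key_cols], drop = TRUE)

# Number of rows in each row's group
group_size <- ave(seq_len(nrow(df_demo)), group_id, FUN = length)

# Keep a row if Column_E is present, or if the row is alone in its group
keep <- !is.na(df_demo$Column_E) | group_size == 1

result <- df_demo[keep, ]
print(result)
```

With this data, `keep` is FALSE only for rows 1 and 4, the two NA rows that have a complete twin. Row 6 is also NA but forms its own group, so it stays.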
Data
df <- structure(list(
  Column_A = c("A121", "A121", "A123", "A123", "A124", "A124"),
  Column_B = c("NAME1", "NAME1", "NAME2", "NAME2", "NAME3", "NAME4"),
  Column_C = c("A321", "A321", "A322", "A322", "A323", "A324"),
  Column_D = c("2019-01-01", "2019-01-01", "2019-01-01", "2019-01-01", "2019-01-01", "2019-01-01"),
  Column_E = c(NA, "2020-02-01", "2020-01-01", NA, "2019-01-01", NA)),
  class = "data.frame", row.names = c(NA, -6L))
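The example data does not cover one case: a group with several rows in which Column_E is NA every time. Under the rule above such a group is removed completely, because no row is non-NA and the group size is not 1. If one row per group should survive instead, a possible variant is sketched below in base R. This reflects an assumption about the desired behaviour, not something stated in the question, and `df_edge` is hypothetical data made up for the illustration.

```r
# Hypothetical data: the first two rows form a group where Column_E is always NA
df_edge <- data.frame(
  Column_A = c("A125", "A125", "A126"),
  Column_B = c("NAME5", "NAME5", "NAME6"),
  Column_C = c("A325", "A325", "A326"),
  Column_D = rep("2019-01-01", 3),
  Column_E = c(NA, NA, "2020-03-01"),
  stringsAsFactors = FALSE
)

key_cols <- c("Column_A", "Column_B", "Column_C", "Column_D")
group_id <- interaction(df_edge[key_cols], drop = TRUE)

# Original rule: non-NA rows, plus rows that are alone in their group
group_size <- ave(seq_len(nrow(df_edge)), group_id, FUN = length)
keep_original <- !is.na(df_edge$Column_E) | group_size == 1

# Variant: non-NA rows, plus the first row of any group that is entirely NA
all_na <- ave(is.na(df_edge$Column_E), group_id, FUN = all)
first_in_group <- !duplicated(group_id)
keep_variant <- !is.na(df_edge$Column_E) | (all_na & first_in_group)

print(df_edge[keep_original, ])  # the A125 group disappears completely
print(df_edge[keep_variant, ])   # one A125 row is retained
```

On data like the question's, where every all-NA group has a single row, both masks select the same rows.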