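How to check each row for NA values and collect the names of the missing columns
I am trying to find out, row by row, which variables have values and which are NA. Below is a sample of the data and its output.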
head(big_test)
# A tibble: 3 x 19
id ctr_n ctr yr mn nvvi ENP_nat ENP_avg ENP_wght inflation1 inflation2 inflation3 inflation4 PSNS PSNS_s PSNS_w
<dbl> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr> <chr> <chr> <chr> <chr> <dbl> <chr>
1 1854 Isra~ 376 2019 4 1 3.50 3.50 3.50 NA NA NA NA NA 0.962 NA
2 1855 Isra~ 376 2019 9 1 2.51 2.51 2.51 NA NA NA NA NA 0.992 NA
3 1856 Isra~ 376 2020 3 1 3.78 3.78 3.78 NA NA NA NA NA 0.999 NA
# ... with 3 more variables: PSNS_sw <chr>,local_E <dbl>,cst_tot <dbl>
dput(big_test)
structure(list(id = c(1854,1855,1856),ctr_n = c("Israel","Israel","Israel"),ctr = c(376,376,376),yr = c(2019,2019,2020),mn = c(4,9,3),nvvi = c(1,1,1),ENP_nat = c(3.50348063163162,2.51319610127466,3.78468892335972),ENP_avg = c(3.50348063163162,ENP_wght = c(3.50348063163162,inflation1 = c("NA","NA","NA"),inflation2 = c("NA",inflation3 = c("NA",inflation4 = c("NA",PSNS = c("NA",PSNS_s = c(0.961748183147869,0.992275075925835,0.998547438416594),PSNS_w = c("NA",PSNS_sw = c("NA",local_E = c(1,cst_tot = c(1,1)),row.names = c(NA,-3L),class = c("tbl_df","tbl","data.frame"))
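Edit: the NAs here are in quotes, which is wrong. I think the problem came from writing the .xlsx file. The corrected version, with unquoted NAs, is in the dput output below.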
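If the data have already been read in with literal "NA" strings, they can be turned into real missing values before any NA check. A minimal sketch, assuming dplyr >= 1.0.0 (for across() and where()); the tibble raw is a hypothetical stand-in for big_test, not the asker's data:

```r
library(dplyr)

# Hypothetical stand-in for big_test: character columns hold the string "NA"
raw <- tibble(
  id = c(1854, 1855, 1856),
  inflation1 = c("NA", "NA", "NA"),
  PSNS_w = c("NA", "0.5", "NA"),
  PSNS_s = c(0.96, 0.99, 1.00)
)

# Replace the literal string "NA" with a real NA in every character column
clean <- raw %>%
  mutate(across(where(is.character), ~ na_if(.x, "NA")))

# is.na() now detects the missing cells
colSums(is.na(clean))  # id 0, inflation1 3, PSNS_w 2, PSNS_s 0
```

The columns stay of type character after this step; only the "NA" strings change.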
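As you can see, the data are split by national election, and each row here should be unique (e.g., Israel, 2019, month 4). I want to create a character column that lists the variables that are missing in each row. Here is an example of the desired column: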
desired_output <- tibble(missing_vars=paste("inflation1","inflation2","inflation3","inflation4","etc",sep=";"))
head(desired_output)
# A tibble: 1 x 1
missing_vars
<chr>
1 inflation1;inflation2;inflation3;inflation4;etc
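So I am wondering whether there is some kind of loop that can take each unique election, check which columns are missing, and collect those column names into a vector. This is essential for automation, because some variables may be missing for a country while other variables for the same country are present. I tried counting the NAs, but could not figure out how to list the column names as a character column.
Thanks for your help!
Solution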
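Here is an option with the tidyverse: reshape to 'long' format with pivot_longer, group by the row number (row_number()), and paste together the column names whose entries in 'value' are missing.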
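library(dplyr)
library(tidyr)
library(stringr)
big_test %>%
  select(starts_with('inflation')) %>%   # keep only the inflation columns
  mutate(rn = row_number()) %>%          # row index to regroup by after reshaping
  pivot_longer(cols = -rn) %>%           # one row per (rn, column): columns "name" and "value"
  group_by(rn) %>%
  # paste the names whose value is NA (this needs real NA, not the string "NA")
  summarise(missing_vars = str_c(name[is.na(value)], collapse = ";"), .groups = 'drop') %>%
  select(-rn)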
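The code above only inspects the inflation columns and drops id. One possible extension to every column, sketched under the assumptions that id is unique per row (as the question states) and that the NAs are real NAs: turn each column into a logical "is missing" flag with across() before reshaping, so pivot_longer() never has to combine character and numeric columns. The tibble toy is hypothetical, and base paste() stands in for str_c():

```r
library(dplyr)
library(tidyr)

# Hypothetical data with real NA values in mixed-type columns
toy <- tibble(
  id = c(1854, 1855, 1856),
  ctr_n = c("Israel", "Israel", NA),
  inflation1 = c(NA, 2.1, NA),
  PSNS = c(NA_character_, "a", "b")
)

res <- toy %>%
  # Replace every column except id with TRUE/FALSE missing flags
  mutate(across(-id, is.na)) %>%
  # Long format: one row per (id, variable), all values logical
  pivot_longer(-id, names_to = "var", values_to = "is_missing") %>%
  group_by(id) %>%
  # Paste the names of the flagged variables; "" when nothing is missing
  summarise(missing_vars = paste(var[is_missing], collapse = ";"), .groups = "drop")

res
```

Because the flags are computed first, the reshaped value column is always logical regardless of the original column types.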
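Without reshaping, an option is rowwise/c_across:
big_test %>%
  rowwise() %>%
  # for each row, keep the names of the inflation columns whose value equals the string 'NA'
  # (cur_data() is deprecated in dplyr >= 1.1.0 in favour of pick())
  transmute(missing_vars = str_c(names(select(cur_data(), starts_with('inflation')))[which(c_across(starts_with('inflation')) == 'NA')], collapse = ";"))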
Here it compares (==) against the string "NA". If the values are real NAs, use is.na instead of ==.
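A sketch of the same rowwise()/c_across() idea with real NAs, using is.na() as suggested. It assumes the inflation columns share one type (numeric here), since c_across() combines them into a single vector; the tibble toy and the helper vector infl_cols are hypothetical:

```r
library(dplyr)

# Hypothetical data with real NA values
toy <- tibble(
  id = c(1854, 1855, 1856),
  inflation1 = c(NA, 2.1, NA),
  inflation2 = c(NA, NA, 3.4),
  inflation3 = c(1.0, 2.0, 3.0)
)

# Names of the columns to check, computed once outside the row-wise step
infl_cols <- grep("^inflation", names(toy), value = TRUE)

res <- toy %>%
  rowwise() %>%
  # c_across() returns the current row's values for infl_cols, in that order
  mutate(missing_vars = paste(infl_cols[is.na(c_across(all_of(infl_cols)))],
                              collapse = ";")) %>%
  ungroup()

res$missing_vars
```

Computing infl_cols up front avoids calling cur_data() or pick() on every row.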
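As @akrun pointed out, you are using the string "NA" rather than NA. After fixing that, you can define a function that finds a row's missing columns and apply it to each row to create a new variable: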
big_test <- structure(list(id = c(1854,1855,1856),ctr_n = c("Israel","Israel","Israel"),ctr = c(376,376,376),yr = c(2019,2019,2020),mn = c(4,9,3),nvvi = c(1,1,1),ENP_nat = c(3.50348063163162,2.51319610127466,3.78468892335972),ENP_avg = c(3.50348063163162,ENP_wght = c(3.50348063163162,inflation1 = c(NA,NA,NA),inflation2 = c(NA,inflation3 = c(NA,inflation4 = c(NA,PSNS = c(NA,PSNS_s = c(0.961748183147869,0.992275075925835,0.998547438416594),PSNS_w = c(NA,PSNS_sw = c(NA,local_E = c(1,cst_tot = c(1,1)),row.names = c(NA,-3L),class = c("tbl_df","tbl","data.frame"))
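# Return the names of the columns that are NA in one row, comma-separated
# (named missing_cols rather than missing to avoid masking base::missing())
missing_cols <- function(x) {
  idx <- is.na(unlist(x))
  paste(colnames(big_test)[idx], collapse = ",")
}
# MARGIN = 1 applies the function to each row
big_test$missing <- apply(big_test, 1, missing_cols)
big_test$missing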
#> [1] "inflation1,inflation2,inflation3,inflation4,PSNS,PSNS_w,PSNS_sw"
#> [2] "inflation1,PSNS_sw"
#> [3] "inflation1,PSNS_sw"
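A base-R variant of the same idea builds the whole missing-value matrix once with is.na() on the data frame, so the helper no longer has to read the global big_test. A minimal sketch; toy is hypothetical data, and ";" is used as the separator to match the desired output in the question:

```r
# Hypothetical data with real NA values
toy <- data.frame(
  id = c(1854, 1855, 1856),
  inflation1 = c(NA, 2.1, NA),
  PSNS_sw = c(NA, NA, "x"),
  stringsAsFactors = FALSE
)

# Logical matrix: TRUE where a cell is NA, one column per variable
na_mat <- is.na(toy)

# For each row (MARGIN = 1), paste the names of the columns flagged TRUE
toy$missing_vars <- apply(na_mat, 1, function(row) {
  paste(colnames(na_mat)[row], collapse = ";")
})

toy$missing_vars
```

Rows with no missing values get an empty string, because paste() with collapse on a zero-length vector returns "".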