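Replace NA with the row-wise minimum of selected columns

How to replace NA with the row-wise minimum of selected columns

Suppose I have a data frame with columns of several types (character, numeric, ID, time, and so on). Here is a simple example: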

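# Two label columns plus five numeric columns, so that the seven names below fit
m <- data.frame(LETTERS[1:10], LETTERS[15:24], runif(10), runif(10), runif(10), runif(10), runif(10))
x <- c("Col1", "Col2", "Col3", "Col4", "Col5", "Col6", "Col7")
colnames(m) <- x
# Randomly replace roughly 25% of the entries in every column with NA
m <- as.data.frame(lapply(m, function(x) x[sample(c(TRUE, NA), prob = c(0.75, 0.25), size = length(x), replace = TRUE)]))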

> m
   Col1 Col2       Col3       Col4       Col5       Col6       Col7
1     A    O 0.09929126 0.40435352 0.15360830 0.03830400 0.80157985
2     B    P 0.50314123 0.81725456         NA 0.07054851 0.65521042
3     C <NA> 0.75798665         NA 0.04483692 0.54671014         NA
4     D    R 0.96825047 0.01875140 0.07383107         NA 0.04498563
5  <NA>    S 0.47079716 0.04181401 0.21423046         NA 0.55493444
6     F <NA>         NA         NA         NA 0.33702657 0.54989260
7     G    U 0.71947656         NA         NA 0.99142181 0.69548691
8  <NA> <NA> 0.90518907 0.20661633 0.65788523 0.05534330 0.78420756
9     I    W 0.79208514 0.63233902         NA 0.72085080         NA
10    J    X 0.39093317 0.97107464         NA 0.86417719 0.39890170

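For Col3-Col7, if a row has fewer than 3 NAs, I want to replace them with that row's minimum across Col3-Col7; otherwise the NAs should stay as they are. So I would like the dataset to look like this: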

> m
   Col1 Col2       Col3       Col4       Col5       Col6       Col7
1     A    O 0.09929126 0.40435352 0.15360830 0.03830400 0.80157985
2     B    P 0.50314123 0.81725456 0.07054851 0.07054851 0.65521042
3     C <NA> 0.75798665 0.04483692 0.04483692 0.54671014 0.04483692
4     D    R 0.96825047 0.01875140 0.07383107 0.01875140 0.04498563
5  <NA>    S 0.47079716 0.04181401 0.21423046 0.04181401 0.55493444
6     F <NA>         NA         NA         NA 0.33702657 0.54989260
7     G    U 0.71947656 0.69548691 0.69548691 0.99142181 0.69548691
8  <NA> <NA> 0.90518907 0.20661633 0.65788523 0.05534330 0.78420756
9     I    W 0.79208514 0.63233902 0.63233902 0.72085080 0.63233902
10    J    X 0.39093317 0.97107464 0.39093317 0.86417719 0.39890170

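So in every row except row 6, the NAs are imputed with that row's minimum across columns 3-7.

In my actual dataset, for each row across columns 18:27, if there are fewer than 4 NAs, they should be replaced with the row minimum of columns 18:27; otherwise all NAs should be kept.

I tried a dplyr pipe with mutate/replace, but I am not sure how to operate on a range of columns (my impression is that mutate/replace can only target a single column). Some of the logic I tried, inside an if statement, included: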

rowSums(is.na(.[18:27]))<4 & rowSums(is.na(.[18:27]))>0
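For illustration, here is a minimal sketch of what a condition like this evaluates to. The data frame `toy` and its columns are made up for the example (columns 2:4 stand in for columns 18:27), and the threshold is 3 because there are only three value columns. The result is one TRUE/FALSE per row, so it works as a mask for `ifelse()` rather than as the single value that `if` expects.

```r
# Hypothetical toy data; columns 2:4 stand in for columns 18:27 of the real data.
toy <- data.frame(
  id = c("r1", "r2", "r3"),
  a  = c(1, NA, NA),
  b  = c(2, 5, NA),
  c  = c(3, NA, NA)
)

# One NA count per row, computed directly on the data frame columns.
n_na <- unname(rowSums(is.na(toy[2:4])))

# Vectorised mask: TRUE for rows with at least one NA but fewer than 3.
needs_fill <- n_na > 0 & n_na < 3
print(needs_fill)
```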

I have seen the rowMins function in the matrixStats package, but I am wondering whether this can be done with dplyr on a data frame rather than on a matrix.

Solution

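I suggest a tidyverse approach: reshape the data to long format, group by Col1 and Col2, and then rebuild the wide layout. Inside the pipeline, mutate() creates two helper variables per group, MinVal (the minimum of the non-missing values) and Flag (the number of NAs), and the required condition is then evaluated against them. Here is the code: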

library(tidyverse)
#Data
# Missing labels in Col1/Col2 are stored as the literal string "<NA>"; numeric gaps are real NA
m <- structure(list(Col1 = c("A","B","C","D","<NA>","F","G","<NA>","I","J"),Col2 = c("O","P","<NA>","R","S","<NA>","U","<NA>","W","X"),Col3 = c(0.09929126,0.50314123,0.75798665,0.96825047,0.47079716,NA,0.71947656,0.90518907,0.79208514,0.39093317),Col4 = c(0.40435352,0.81725456,NA,0.0187514,0.04181401,NA,NA,0.20661633,0.63233902,0.97107464),Col5 = c(0.1536083,NA,0.04483692,0.07383107,0.21423046,NA,NA,0.65788523,NA,NA),Col6 = c(0.038304,0.07054851,0.54671014,NA,NA,0.33702657,0.99142181,0.0553433,0.7208508,0.86417719),Col7 = c(0.80157985,0.65521042,NA,0.04498563,0.55493444,0.5498926,0.69548691,0.78420756,NA,0.3989017)),class = "data.frame",row.names = c("1","2","3","4","5","6","7","8","9","10"))

Code:

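#Reshape to long format, fill per (Col1, Col2) group, then reshape back to wide
m %>% pivot_longer(cols = -c(Col1,Col2)) %>%
  group_by(Col1,Col2) %>%
  # MinVal: minimum of the non-missing values; Flag: number of NAs in the group
  mutate(MinVal=min(value,na.rm=TRUE),Flag=sum(is.na(value))) %>% ungroup() %>%
  # Fill NAs only where the group has fewer than 3 of them
  mutate(value=ifelse(is.na(value) & Flag<3,MinVal,value)) %>%
  select(-c(MinVal,Flag)) %>%
  pivot_wider(names_from = name,values_from=value)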

Output:

# A tibble: 10 x 7
   Col1  Col2     Col3    Col4    Col5   Col6   Col7
   <chr> <chr>   <dbl>   <dbl>   <dbl>  <dbl>  <dbl>
 1 A     O      0.0993  0.404   0.154  0.0383 0.802 
 2 B     P      0.503   0.817   0.0705 0.0705 0.655 
 3 C     <NA>   0.758   0.0448  0.0448 0.547  0.0448
 4 D     R      0.968   0.0188  0.0738 0.0188 0.0450
 5 <NA>  S      0.471   0.0418  0.214  0.0418 0.555 
 6 F     <NA>  NA      NA      NA      0.337  0.550 
 7 G     U      0.719   0.695   0.695  0.991  0.695 
 8 <NA>  <NA>   0.905   0.207   0.658  0.0553 0.784 
 9 I     W      0.792   0.632   0.632  0.721  0.632 
10 J     X      0.391   0.971   0.391  0.864  0.399 
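
The reshaping approach relies on Col1 and Col2 together identifying each row uniquely. As an alternative that does not need an identifier, here is a minimal base R sketch that works on the data frame column-wise. The helper name `fill_row_min`, its arguments, and the `toy` data are hypothetical, introduced only for illustration, and this is a swapped-in technique (`pmin()` across columns) rather than the method used in the answer above.

```r
# Hypothetical helper, not from the answer above: fill NAs in `cols` with the
# row minimum of `cols`, but only in rows with fewer than `max_na` NAs.
fill_row_min <- function(df, cols, max_na) {
  vals <- df[cols]
  n_na <- unname(rowSums(is.na(vals)))
  # pmin() across the selected columns gives the row-wise minimum;
  # rows where every selected value is NA yield NA.
  row_min <- do.call(pmin, c(unname(as.list(vals)), na.rm = TRUE))
  fill <- n_na > 0 & n_na < max_na
  df[cols] <- lapply(vals, function(v) ifelse(is.na(v) & fill, row_min, v))
  df
}

# Small made-up example; columns 2:4 play the role of Col3-Col7.
toy <- data.frame(
  id = c("r1", "r2", "r3"),
  a  = c(0.9, NA, NA),
  b  = c(0.2, 0.5, NA),
  c  = c(0.4, 0.7, NA)
)
filled <- fill_row_min(toy, cols = 2:4, max_na = 3)
print(filled)
```

Under the same assumptions, the real dataset described in the question would be handled with something like `fill_row_min(df, cols = 18:27, max_na = 4)`, where `df` is a placeholder name for that data frame.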
