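Finding the variables with the greatest variance between groups: what is the best way to identify variables with (1) low variance within the same group and (2) a clear, relatively large difference between groups?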

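I am trying to run a regression model in which I want to find the best predictors. However, the data contain more than 100,000 variables (this is a microarray-based experiment) and 2 outcomes (tumor and non-tumor). Here is a snapshot of the data: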

>dt
       Tumor cg15560884   cg15979415 cg21482377 cg27346986 cg13565718  cg04359978 cg00328058 cg07787977 cg02632261
    1     No -0.2480779 -3.541635298  0.1930965 -0.2855506  1.7873570 -0.05663302 -0.3248885 -2.9448065  0.6228754
    2     No  0.9172439  0.055514083  0.4655855  0.3226286  2.0404916  1.93954213  1.0556121  0.1188842  0.6394047
    3     No  0.4115322 -2.688796456 -0.3414734 -0.5690240  1.4191325  0.23146577 -0.3843809 -2.3456532  2.1169214
    4     No  0.9564983 -0.284362579  0.9074372  0.3841181  1.9482238  2.30368166  1.6791506  0.1672436  1.8456432
    5     No  0.4373796 -0.847716026 -0.3654539  0.6407850  0.4981430  1.41772759  0.1211819 -0.3048053  0.9258699
    6     No  1.1842945  0.184265452 -0.5769016  0.2349631  0.4897802  1.96881485  2.1653989 -0.2689008 -0.7468990
    7     No  1.2802583  1.510712751 -0.4231337 -0.2016259  2.7457824  2.78922437  2.1925534  0.5288933  1.0394935
    8    Yes  1.1831400  0.103893327  0.3585823  0.5774059  2.9961775  3.00736681  1.9211571  1.9990507  2.5718125
    9    Yes  1.1419304  0.009477014  1.7524348  0.6827657  3.1542609  3.11282241  2.3964859  1.2965353  2.9299558
    10   Yes  1.5811014  2.612363269  4.0609050  2.5058440  3.4295390  2.74999398  3.5159891  4.0156051  4.1311138
    11   Yes  0.4145909 -0.375025614  0.2912988  0.3032374  3.1445856  2.20233921  0.8775737  0.8418369  1.9903667
    12   Yes  0.9668263 -0.272698105 -0.1731778  0.2230170  2.5546191  1.91083215  2.3383876  2.3296599  2.1821964
    13   Yes -0.1230484 -0.625187944  0.2620956 -0.0419292  2.9346895  2.45153644  1.9039218  1.5932535  2.4690055
    14   Yes  0.8659252  0.175015222  1.0062097  0.3605752  2.0769247  1.52875829  1.5361073  0.8493504  2.3467234

I have run a series of logit analyses that seem to perform well. In short, this is the pipeline:

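  1. Linear models for a series of arrays (lmFit()) with an empirical Bayes approach (eBayes()):
#Designing the contrast model (one indicator column per group; rows 1-7 are non-tumor, rows 8-14 tumor):
library(limma)
library(magrittr)  # provides %>%
design <- cbind(Tumor=c(rep(0,7),rep(1,7)),Benign=c(rep(1,7),rep(0,7)))
contr <- makeContrasts(Tumor-Benign,levels=design)

#Running linear regression with empirical Bayes approach (lmFit() expects CpGs in rows, samples in columns):
lmFit(t(dt[,-1]),design) %>% contrasts.fit(contr) %>% eBayes() -> fit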
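  1. Univariate analysis (glm()) of the candidates found (P …):
#Univariate logit model fitted separately for each candidate CpG (here, all CpG columns of dt):
uni <- lapply(names(dt)[-1],function(v) glm(reformulate(v,response="Tumor"),family=binomial(link='logit'),data=dt))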
  1. Multivariable analysis of the candidates found (P … 1,000 candidates) using a lasso penalty and cross-validation (cv.glmnet()):
library(glmnet)
#This is just an example (i.e. these are not necessarily the candidates).
#model.matrix() adds an intercept column; glmnet fits its own intercept, so that column is dropped with [,-1]:
x <- model.matrix(~cg15560884 + cg15979415 + cg21482377 + cg27346986 + cg13565718 + cg04359978 + cg00328058 + cg07787977 + cg02632261,data=dt)[,-1]
y <- dt$Tumor

cv.lasso <- cv.glmnet(x,y,family="binomial",standardize=TRUE,alpha=1,nfolds=10)
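For step 2, the per-candidate fits can be collected into a single table ranked by p-value. The base-R sketch below is only illustrative: it uses two columns copied from the mock data at the end of the post, the object names `cpgs` and `uni` are hypothetical, and it reports the Wald p-value from `summary()` of each `glm` fit without applying any P threshold.

```r
# Two CpG columns copied from the mock data; rows 1-7 are "No", rows 8-14 "Yes"
dt <- data.frame(
  Tumor = factor(rep(c("No", "Yes"), each = 7)),
  cg15560884 = c(-0.248077910261345, 0.917243931527906, 0.411532204288758, 0.956498270834689,
                 0.437379596251596, 1.18429454839675, 1.28025825354934, 1.18313995574906,
                 1.14193044361971, 1.58110142968133, 0.414590861304658, 0.966826317765609,
                 -0.123048367553966, 0.865925151474944),
  cg13565718 = c(1.78735699619159, 2.04049160554018, 1.41913246191327, 1.94822376321837,
                 0.498142972793614, 0.489780156080168, 2.74578238467836, 2.99617752225596,
                 3.15426093568808, 3.42953897985088, 3.14458563100296, 2.55461911595142,
                 2.93468952565516, 2.07692465631039)
)

# One univariate logit model per CpG; keep the slope and its Wald p-value
cpgs <- setdiff(names(dt), "Tumor")
uni <- t(sapply(cpgs, function(v) {
  fit <- glm(reformulate(v, response = "Tumor"),
             family = binomial(link = "logit"), data = dt)
  summary(fit)$coefficients[v, c("Estimate", "Pr(>|z|)")]
}))

# Rank candidates: smallest p-value first
uni <- uni[order(uni[, "Pr(>|z|)"]), , drop = FALSE]
uni
```

On the full data, `cpgs` would be the CpGs retained after step 1 rather than every column.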

At this step, about 25 candidates passed all of the analyses.

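When I plotted those potential candidates, I found that for some of them the difference between the two groups (tumor vs. non-tumor) is not very clear. Since the aim here is to find candidates with potential clinical relevance, my question is: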

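What is the best way to identify variables with (1) low variance within the same group and (2) a clear, relatively large difference between groups?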
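One common way to score criteria (1) and (2) together is a standardized mean difference such as Cohen's d: the difference between the group means divided by the pooled within-group standard deviation, so it grows when the groups are far apart and shrinks when they are noisy. A minimal vectorized base-R sketch, assuming samples in rows and CpGs in columns as in `dt` (only two mock-data columns are used, and `d`, `X1`, `X0` and the other object names are illustrative):

```r
# Two CpG columns copied from the mock data; rows 1-7 are "No", rows 8-14 "Yes"
dt <- data.frame(
  Tumor = factor(rep(c("No", "Yes"), each = 7)),
  cg15560884 = c(-0.248077910261345, 0.917243931527906, 0.411532204288758, 0.956498270834689,
                 0.437379596251596, 1.18429454839675, 1.28025825354934, 1.18313995574906,
                 1.14193044361971, 1.58110142968133, 0.414590861304658, 0.966826317765609,
                 -0.123048367553966, 0.865925151474944),
  cg13565718 = c(1.78735699619159, 2.04049160554018, 1.41913246191327, 1.94822376321837,
                 0.498142972793614, 0.489780156080168, 2.74578238467836, 2.99617752225596,
                 3.15426093568808, 3.42953897985088, 3.14458563100296, 2.55461911595142,
                 2.93468952565516, 2.07692465631039)
)

X   <- as.matrix(dt[, -1])                 # samples in rows, CpGs in columns
yes <- dt$Tumor == "Yes"
X1  <- X[yes, , drop = FALSE]
X0  <- X[!yes, , drop = FALSE]
n1  <- nrow(X1); n0 <- nrow(X0)
m1  <- colMeans(X1); m0 <- colMeans(X0)

# Pooled within-group SD for every CpG at once (no loop over the columns)
ss_within <- colSums(sweep(X1, 2, m1)^2) + colSums(sweep(X0, 2, m0)^2)
sp <- sqrt(ss_within / (n1 + n0 - 2))

# Cohen's d: a large |d| needs a big gap between group means (criterion 2)
# relative to a small within-group spread (criterion 1)
d <- (m1 - m0) / sp
sort(abs(d), decreasing = TRUE)
```

With two groups, d is proportional to the pooled two-sample t-statistic, t = d / sqrt(1/n1 + 1/n0). The moderated t reported by limma's `eBayes()` is a related statistic that additionally shrinks the per-CpG variances.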

I have considered starting the pipeline with the sum of squares between groups or a similar approach, but I am not sure which approach is best.
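The between-group sum of squares idea amounts to a one-way ANOVA per variable: SS_total = SS_between + SS_within, the ratio of the corresponding mean squares is the F statistic, and eta² = SS_between / SS_total is the share of a CpG's variance explained by group membership. A base-R sketch under the same assumptions as the snapshot (samples in rows, two mock-data columns, illustrative object names such as `F_stat` and `eta2`), written so that it also works with more than two groups:

```r
# Two CpG columns copied from the mock data; rows 1-7 are "No", rows 8-14 "Yes"
dt <- data.frame(
  Tumor = factor(rep(c("No", "Yes"), each = 7)),
  cg15560884 = c(-0.248077910261345, 0.917243931527906, 0.411532204288758, 0.956498270834689,
                 0.437379596251596, 1.18429454839675, 1.28025825354934, 1.18313995574906,
                 1.14193044361971, 1.58110142968133, 0.414590861304658, 0.966826317765609,
                 -0.123048367553966, 0.865925151474944),
  cg13565718 = c(1.78735699619159, 2.04049160554018, 1.41913246191327, 1.94822376321837,
                 0.498142972793614, 0.489780156080168, 2.74578238467836, 2.99617752225596,
                 3.15426093568808, 3.42953897985088, 3.14458563100296, 2.55461911595142,
                 2.93468952565516, 2.07692465631039)
)

X    <- as.matrix(dt[, -1])                  # samples in rows, CpGs in columns
g    <- dt$Tumor
n_g  <- as.vector(table(g))                  # group sizes, in level order (No, Yes)
mu_g <- rowsum(X, g) / n_g                   # group means: one row per group
mu   <- colMeans(X)                          # grand mean of each CpG

sst <- colSums(sweep(X, 2, mu)^2)            # total sum of squares
ssb <- colSums(n_g * sweep(mu_g, 2, mu)^2)   # between-group sum of squares
ssw <- sst - ssb                             # within-group sum of squares

k <- nlevels(g); n <- nrow(X)
F_stat <- (ssb / (k - 1)) / (ssw / (n - k))  # one-way ANOVA F for each CpG
eta2   <- ssb / sst                          # proportion of variance explained by group

sort(F_stat, decreasing = TRUE)
```

For two groups, F equals the square of the pooled two-sample t-statistic, so ranking by F or by eta² gives the same order as ranking by the absolute t-statistic.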

Any help would be greatly appreciated.

PS: Here is the mock data shown above:

> dput(dt)
structure(list(Tumor = structure(c(1L,1L,1L,1L,1L,1L,1L,2L,2L,2L,2L,2L,2L,2L),.Label = c("No","Yes"),class = "factor"),cg15560884 = c(-0.248077910261345,0.917243931527906,0.411532204288758,0.956498270834689,0.437379596251596,1.18429454839675,1.28025825354934,1.18313995574906,1.14193044361971,1.58110142968133,0.414590861304658,0.966826317765609,-0.123048367553966,0.865925151474944),cg15979415 = c(-3.54163529762454,0.0555140832614802,-2.68879645560804,-0.284362579485303,-0.847716026488968,0.184265451680517,1.51071275115752,0.103893326861259,0.00947701421375391,2.61236326867269,-0.375025613783568,-0.272698105021754,-0.62518794357036,0.175015221908592),cg21482377 = c(0.19309650443158,0.465585470446969,-0.341473357667879,0.907437241260519,-0.36545393340153,-0.576901593963131,-0.423133690212283,0.358582311092238,1.75243481045574,4.06090501077815,0.291298796940103,-0.173177826492712,0.262095579024894,1.00620967854383),cg27346986 = c(-0.285550598277455,0.322628631286669,-0.569023984702458,0.384118141327422,0.640785025494895,0.234963083143655,-0.201625866484334,0.577405892650868,0.682765746636268,2.50584400408156,0.303237442139183,0.223016985745948,-0.0419291977006327,0.360575219255877),cg13565718 = c(1.78735699619159,2.04049160554018,1.41913246191327,1.94822376321837,0.498142972793614,0.489780156080168,2.74578238467836,2.99617752225596,3.15426093568808,3.42953897985088,3.14458563100296,2.55461911595142,2.93468952565516,2.07692465631039
    ),cg04359978 = c(-0.056633023447488,1.93954212965755,0.231465769910894,2.30368165948499,1.41772758917327,1.96881485075863,2.78922437354038,3.00736681386976,3.11282240855254,2.74999398469954,2.2023392119177,1.91083214812378,2.45153643876517,1.52875829347186),cg00328058 = c(-0.324888517874121,1.05561214264499,-0.384380901476841,1.67915058188375,0.121181903594745,2.16539894626557,2.19255343382776,1.9211570874648,2.39648589262158,3.51598910725581,0.877573664935301,2.33838763841662,1.90392175186133,1.53610733484261),cg07787977 = c(-2.94480653748164,0.118884228237407,-2.34565316735368,0.167243632510614,-0.304805256674929,-0.268900821807674,0.528893346493377,1.99905071007526,1.29653531766456,4.01560505952943,0.841836891217517,2.32965985284038,1.5932534862773,0.849350444895923),cg02632261 = c(0.622875369361909,0.63940465080861,2.11692137500146,1.84564321288801,0.925869893712036,-0.746898982059526,1.03949352765268,2.57181250869727,2.9299557731721,4.13111383622201,1.9903666807218,2.18219637136446,2.46900554766421,2.34672341454479)),row.names = c(NA,14L),class = "data.frame")
