从 svyglm() 模型获得协变量调整的患病率估计值

如何解决从 svyglm() 模型获得协变量调整的患病率估计值

所以我对使用调查数据计算协变量调整的流行率很感兴趣。我的问题与特定的专有数据集有关,但以下是使用 data(nhanes) 数据的示例。对于帖子的长度,我提前道歉......

第 1 步

我拟合了两个回归模型:(1) 一个原始模型和 (2) 一个调整后的模型

#Setup
if(!require (survey)) install.packages("survey")
library(survey)

#Read in data
data("nhanes")

#Create survey design
d.obj = svydesign(id = ~SDMVPSU,strata = ~SDMVSTRA,weights = ~WTMEC2YR,nest = TRUE,data = nhanes)

#Regression models
mod.1 = svyglm(formula = HI_CHOL ~ as.factor(agecat),design = d.obj,family = quasibinomial(link = "log"))

mod.2 = svyglm(formula = HI_CHOL ~ as.factor(agecat) + as.factor(race) + as.factor(RIAGENDR),family = quasibinomial(link = "log"))

第 2 步

我对系数求幂以获得流行率和 95% CI。下面的摘要显示了“NA”线上方的原始模型的 PR 和该线下方的调整模型的 PR。

#Regression Summary
mod.sum = rbind(cbind("Estimate" = exp(coef(mod.1)),exp(confint(mod.1))),c(NA,NA,NA),cbind("Estimate" = exp(coef(mod.2)),exp(confint(mod.2))))

> mod.sum
                              Estimate        2.5 %      97.5 %
(Intercept)                0.008660272  0.004735950  0.01583638
as.factor(agecat)(19,39]   9.109573963  4.818424917 17.22229550
as.factor(agecat)(39,59]  20.610647336 10.441625642 40.68320376
as.factor(agecat)(59,Inf] 17.932147454  9.227318351 34.84890193
                                    NA           NA          NA
(Intercept)                0.008597533  0.004623207  0.01598838
as.factor(agecat)(19,39]   9.051050139  4.816409068 17.00883531
as.factor(agecat)(39,59]  20.472879940 10.415264252 40.24274401
as.factor(agecat)(59,Inf] 17.641599353  9.050357148 34.38825923
as.factor(race)2           0.939905542  0.824583833  1.07135550
as.factor(race)3           0.693581683  0.536698646  0.89632339
as.factor(race)4           0.876065546  0.494248225  1.55284491
as.factor(RIAGENDR)2       1.210035838  1.048810926  1.39604451

第 3 步

按年龄类别对 HI_CHOL 的流行率估计相当简单...

#Calculting prevalence of HI_CHOL by agecat
prv = svyby(formula = ~HI_CHOL,by = ~agecat,FUN = svymean,vartype = "ci",na.rm = TRUE)

> prv
           agecat     HI_CHOL        ci_l       ci_u
(0,19]     (0,19] 0.008660267 0.003433241 0.01388729
(19,39]   (19,39] 0.078891392 0.061116023 0.09666676
(39,59]   (39,59] 0.178493821 0.156964219 0.20002342
(59,Inf] (59,Inf] 0.155297283 0.130664250 0.17993032

使用 prv 的点估计值,如果我使用 mod.1 作为参考组,我可以手动计算 agecat == (0,19] 的点估计值。但是,我希望计算每个年龄类别的协变量调整流行率(这样,如果我再次使用 mod.2 作为参考组,我可以类似地手动计算 agecat == (0,19] 的点估计值)。我不知道该怎么做。

在 Stata 中,我可以运行 margins 命令来产生这样的估计(或者我相信,但是当我在我的专有数据上运行该命令时,置信区间的下限是负的,我不是确定如何解释它们)。

. svyset SDMVPSU [pweight = WTMEC2YR],strata(SDMVSTRA) singleunit(centered)

      pweight: WTMEC2YR
          VCE: linearized
  Single unit: centered
     Strata 1: SDMVSTRA
         SU 1: SDMVPSU
        FPC 1: <zero>

. svy: glm HI_CHOL i.agecat i.race i.RIAGENDR,family(binomial) link(log) eform allbase
(running glm on estimation sample)

Survey: Generalized linear models

Number of strata   =        15                Number of obs     =        7,846
Number of PSUs     =        31                Population size   =  255,345,910
                                              Design df         =           16

------------------------------------------------------------------------------
             |             Linearized
     HI_CHOL |     exp(b)   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      agecat |
     (0,19]  |          1  (base)
    (19,39]  |   9.051065   2.917803     6.83   0.000     4.569875    17.92648
    (39,59]  |   20.47291   7.054992     8.76   0.000      9.86093    42.50513
   (59,Inf]  |   17.64163   5.994874     8.45   0.000     8.583895    36.25709
             |
        race |
          1  |          1  (base)
          2  |   .9399055   .0616344    -0.95   0.359     .8179215    1.080082
          3  |   .6935817   .0908676    -2.79   0.013     .5253873    .9156208
          4  |   .8760655   .2516647    -0.46   0.651     .4764973    1.610693
             |
    RIAGENDR |
          1  |          1  (base)
          2  |   1.210036   .0872042     2.65   0.018       1.0386    1.409769
             |
       _cons |   .0085975   .0027195   -15.04   0.000     .0043969    .0168111
------------------------------------------------------------------------------

. margins agecat,expression(exp(predict(xb))) vce(unconditional)

Predictive margins

Number of strata   =        15                Number of obs     =        7,910
                                              Design df         =           16

Expression   : exp(predict(xb))

------------------------------------------------------------------------------
             |             Linearized
             |     Margin   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      agecat |
     (0,19]  |   .0087239    .002677     3.26   0.005     .0030489    .0143989
    (19,39]  |   .0789607    .009204     8.58   0.000      .059449    .0984724
    (39,59]  |   .1786039   .0107848    16.56   0.000     .1557412    .2014666
   (59,Inf]  |    .153904   .0129656    11.87   0.000     .1264183    .1813898
------------------------------------------------------------------------------

附加上下文

在我上面的 nhanes 示例中,原始模型和调整模型之间的点估计值没有显着差异。然而,对于我的专有数据,类似的“原始”模型更像是一个“最小调整模型”,它采用 outcome ~ exposure + covar 的形式,其中 covar 是一个结构协变量,在设置调查设计。如果我按传统方式按暴露组计算结果患病率,而不考虑 covar,那么得出的估计值将没有意义。鉴于此,我知道如何计算患病率的唯一方法是基于回归模型(就像我在 Stata 中所做的那样),但我不知道如何在 R 中执行此操作,并且非常想学习。

更新 3/21/2021

所以我相信我已经通过代码解决了这个问题,但是解释结果让我摸不着头脑。简而言之,我可以按曝光分组,并对协变量的每个排列的指数模型拟合值求和。我能够使用 nhanes 执行此操作,如下所示:

#Setup
library(survey)
library(magrittr)
library(dplyr)
library(tidyverse)
library(broom)

#Read in data
data("nhanes")

#Factoring covariates
nhanes$race %<>% factor()
nhanes$RIAGENDR %<>% factor()

#Create survey design
d.obj = svydesign(id = ~SDMVPSU,data = nhanes)

#Regression models
mod.1 = svyglm(formula = HI_CHOL ~ agecat,family = quasibinomial(link = "log"))

mod.2 = svyglm(formula = HI_CHOL ~ agecat + race + RIAGENDR,family = quasibinomial(link = "log"))

#Regression Summary
mod.sum =
  list(Model.1 = data.frame(tidy(mod.1,exponentiate = TRUE,conf.int = TRUE)),Model.2 = data.frame(tidy(mod.2,conf.int = TRUE)))

#Appending regression fitted values to data frame
df.1 = augment(mod.1)
df.2 = augment(mod.2)

#Estimating probabilities for each permutation of regression formula
x.prob.1 = aggregate(cbind(est= exp(.fitted)) ~ agecat,data = df.1,FUN = mean)
x.prob.2 = aggregate(cbind(est= exp(.fitted)) ~ race + RIAGENDR + agecat,data = df.2,FUN = mean)

#Estimating probability of outcome given exposure,independant of covars
tab.x.1 = aggregate(est~ agecat,data = x.prob.1,FUN = sum)
tab.x.1 %<>% mutate(rr = est/est[1])

tab.x.2 = aggregate(est~ agecat,data = x.prob.2,FUN = sum)
tab.x.2 %<>% mutate(rr = est/est[1])

#Summarizing tab.x
tab.x.sum = list(Model.1 = tab.x.1,Model.2 = tab.x.2)

#Estimating prevalence through traditional method
prev = svyby(formula = ~HI_CHOL,na.rm = TRUE)

汇总对象的结果如下...

> mod.sum
$Model.1
            term     estimate std.error  statistic      p.value    conf.low   conf.high
1    (Intercept)  0.008660272 0.3079463 -15.421547 9.802627e-10  0.00473595  0.01583638
2  agecat(19,39]  9.109573963 0.3249442   6.799094 1.264216e-05  4.81842492 17.22229550
3  agecat(39,59] 20.610647336 0.3469490   8.721189 8.585730e-07 10.44162564 40.68320376
4 agecat(59,Inf] 17.932147454 0.3389994   8.515045 1.122087e-06  9.22731835 34.84890193

$Model.2
            term     estimate  std.error   statistic      p.value     conf.low   conf.high
1    (Intercept)  0.008597533 0.31652962 -15.0263347 1.111045e-07  0.004623207  0.01598838
2  agecat(19,39]  9.051050139 0.32186925   6.8440237 7.523606e-05  4.816409068 17.00883531
3  agecat(39,59] 20.472879940 0.34481686   8.7556654 1.068803e-05 10.415264252 40.24274401
4 agecat(59,Inf] 17.641599353 0.34054477   8.4284357 1.455588e-05  9.050357148 34.38825923
5          race2  0.939905542 0.06678723  -0.9279602 3.776433e-01  0.824583833  1.07135550
6          race3  0.693581683 0.13083519  -2.7965432 2.082994e-02  0.536698646  0.89632339
7          race4  0.876065546 0.29204774  -0.4530573 6.612323e-01  0.494248225  1.55284491
8      RIAGENDR2  1.210035838 0.07295691   2.6131862 2.812343e-02  1.048810926  1.39604451

> tab.x.sum
$Model.1
    agecat         est        rr
1   (0,19] 0.008660272  1.000000
2  (19,39] 0.078891392  9.109574
3  (39,59] 0.178493821 20.610647
4 (59,Inf] 0.155297283 17.932147

$Model.2
    agecat       est       rr
1   (0,19] 0.0666845  1.00000
2  (19,39] 0.6035648  9.05105
3  (39,59] 1.3652238 20.47288
4 (59,Inf] 1.1764213 17.64160

> prev
           agecat     HI_CHOL          se
(0,19] 0.008660267 0.002666899
(19,39] 0.078891392 0.009069233
(39,59] 0.178493821 0.010984693
(59,Inf] 0.155297283 0.012568105

这种方法存在一些(如果不是很多)问题。首先,来自 R 的对数二项式点估计与来自 STATA 的不一致,尽管这可能归因于潜在的程序差异。其次,对于 R,虽然流行率估计值及其相应的流行率比率相加,但调整后模型的估计值没有意义。例如,tab.x.sum$Model.2 的流行率估计值高于 1.00!这让我相信,虽然我已经在数学/代码方面解决了我的问题,但我产生的估计量与我想象的完全不同。

对我实际计算的任何见解都会有所帮助...!

版权声明:本文内容由互联网用户自发贡献,该文观点与技术仅代表作者本人。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如发现本站有涉嫌侵权/违法违规的内容, 请发送邮件至 dio@foxmail.com 举报,一经查实,本站将立刻删除。

相关推荐


依赖报错 idea导入项目后依赖报错,解决方案:https://blog.csdn.net/weixin_42420249/article/details/81191861 依赖版本报错:更换其他版本 无法下载依赖可参考:https://blog.csdn.net/weixin_42628809/a
错误1:代码生成器依赖和mybatis依赖冲突 启动项目时报错如下 2021-12-03 13:33:33.927 ERROR 7228 [ main] o.s.b.d.LoggingFailureAnalysisReporter : *************************** APPL
错误1:gradle项目控制台输出为乱码 # 解决方案:https://blog.csdn.net/weixin_43501566/article/details/112482302 # 在gradle-wrapper.properties 添加以下内容 org.gradle.jvmargs=-Df
错误还原:在查询的过程中,传入的workType为0时,该条件不起作用 &lt;select id=&quot;xxx&quot;&gt; SELECT di.id, di.name, di.work_type, di.updated... &lt;where&gt; &lt;if test=&qu
报错如下,gcc版本太低 ^ server.c:5346:31: 错误:‘struct redisServer’没有名为‘server_cpulist’的成员 redisSetCpuAffinity(server.server_cpulist); ^ server.c: 在函数‘hasActiveC
解决方案1 1、改项目中.idea/workspace.xml配置文件,增加dynamic.classpath参数 2、搜索PropertiesComponent,添加如下 &lt;property name=&quot;dynamic.classpath&quot; value=&quot;tru
删除根组件app.vue中的默认代码后报错:Module Error (from ./node_modules/eslint-loader/index.js): 解决方案:关闭ESlint代码检测,在项目根目录创建vue.config.js,在文件中添加 module.exports = { lin
查看spark默认的python版本 [root@master day27]# pyspark /home/software/spark-2.3.4-bin-hadoop2.7/conf/spark-env.sh: line 2: /usr/local/hadoop/bin/hadoop: No s
使用本地python环境可以成功执行 import pandas as pd import matplotlib.pyplot as plt # 设置字体 plt.rcParams[&#39;font.sans-serif&#39;] = [&#39;SimHei&#39;] # 能正确显示负号 p
错误1:Request method ‘DELETE‘ not supported 错误还原:controller层有一个接口,访问该接口时报错:Request method ‘DELETE‘ not supported 错误原因:没有接收到前端传入的参数,修改为如下 参考 错误2:cannot r
错误1:启动docker镜像时报错:Error response from daemon: driver failed programming external connectivity on endpoint quirky_allen 解决方法:重启docker -&gt; systemctl r
错误1:private field ‘xxx‘ is never assigned 按Altʾnter快捷键,选择第2项 参考:https://blog.csdn.net/shi_hong_fei_hei/article/details/88814070 错误2:启动时报错,不能找到主启动类 #
报错如下,通过源不能下载,最后警告pip需升级版本 Requirement already satisfied: pip in c:\users\ychen\appdata\local\programs\python\python310\lib\site-packages (22.0.4) Coll
错误1:maven打包报错 错误还原:使用maven打包项目时报错如下 [ERROR] Failed to execute goal org.apache.maven.plugins:maven-resources-plugin:3.2.0:resources (default-resources)
错误1:服务调用时报错 服务消费者模块assess通过openFeign调用服务提供者模块hires 如下为服务提供者模块hires的控制层接口 @RestController @RequestMapping(&quot;/hires&quot;) public class FeignControl
错误1:运行项目后报如下错误 解决方案 报错2:Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.8.1:compile (default-compile) on project sb 解决方案:在pom.
参考 错误原因 过滤器或拦截器在生效时,redisTemplate还没有注入 解决方案:在注入容器时就生效 @Component //项目运行时就注入Spring容器 public class RedisBean { @Resource private RedisTemplate&lt;String
使用vite构建项目报错 C:\Users\ychen\work&gt;npm init @vitejs/app @vitejs/create-app is deprecated, use npm init vite instead C:\Users\ychen\AppData\Local\npm-