如何解决使用 purrr::map() 将模型拟合到新的列表列时,如果拟合错误,则返回指示性字符串
我有要使用 purrr::map()
为每个组建模的数据。有时,对一个(或多个)子组拟合模型失败。例如,错误可能是 contrasts can be applied only to factors with 2 or more levels
或警告,例如 glm.fit: algorithm did not converge
或其他。
因为错误使整个代码失败,所以我想创建一个条件:如果子组存在拟合错误,则为该子组返回 "string-of-choice"
;但是对于使用模型产生结果的子组,它们返回模型对象。即使有关于收敛的警告,我也更喜欢使用 "string-of-choice-2"
而不是非收敛模型。
虽然我的问题很笼统,但我提供了一些玩具数据示例以供演示。
示例
这是一个生成数据的函数。在此数据中,3 列对应于人们回答的 3 个问题:
- 他们是否喜欢椰子(是或否,编码为
0
/1
。) - 他们是否喜欢茄子
- 他们是否喜欢西红柿
此外,我们还有一个 id
列和 gender
列。
在此数据的以下两个版本中,有关爱西红柿的列可以全部为NA
,也可以全部为0
。
generate_data <- function(x) {
data.frame(id = 1:2000,do_u_love_coconut = sample(c(0,1,NA),2000,replace = TRUE,prob = c(0.2,0.4,0.4)),do_u_love_eggplant = sample(c(0,prob = c(0.1,0.5,do_u_love_tomatoes = rep(x,2000),gender = sample(c("male","female"),replace = TRUE))
}
## generate the data
set.seed(2021)
df_tomatoes_is_NA <- generate_data(NA)
df_tomatoes_is_zero <- generate_data(0)
## preview the data
library(tibble)
as_tibble(df_tomatoes_is_NA)
## # A tibble: 2,000 x 5
## id do_u_love_coconut do_u_love_eggplant do_u_love_tomatoes gender
## <int> <dbl> <dbl> <lgl> <chr>
## 1 1 NA NA NA male
## 2 2 NA NA NA male
## 3 3 NA NA NA male
## 4 4 1 1 NA female
## 5 5 NA 1 NA female
## 6 6 NA NA NA male
## 7 7 NA NA NA female
## 8 8 1 1 NA male
## 9 9 0 1 NA female
## 10 10 0 1 NA female
## # ... with 1,990 more rows
as_tibble(df_tomatoes_is_zero)
## # A tibble: 2,000 x 5
## id do_u_love_coconut do_u_love_eggplant do_u_love_tomatoes gender
## <int> <dbl> <dbl> <dbl> <chr>
## 1 1 NA 0 0 male
## 2 2 NA NA 0 male
## 3 3 1 NA 0 female
## 4 4 0 1 0 female
## 5 5 1 0 0 male
## 6 6 NA 0 0 female
## 7 7 1 1 0 male
## 8 8 1 NA 0 male
## 9 9 1 NA 0 male
## 10 10 0 1 0 female
## # ... with 1,990 more rows
拟合模型
所以现在我想按性别拟合每个椰子/茄子/西红柿的模型。
- 数据的第 1 版
library(tidyr)
library(purrr)
library(dplyr)
df_tomatoes_is_NA %>%
pivot_longer(starts_with("do_u")) %>%
group_by(name) %>%
nest() %>%
mutate(fit_and_predict = map(data,~ glm(formula = value ~ gender,data = .x,family = "binomial")))
错误:mutate()
输入 fit_and_predict
有问题。
x 对比只能应用于具有 2 个或更多水平的因子
i 输入 fit_and_predict
是 map(data,~glm(formula = value ~ gender,family = "binomial"))
。
i 错误发生在第 3 组:name = "do_u_love_tomatoes"。
- 数据的第 2 版
df_tomatoes_is_zero %>%
pivot_longer(starts_with("do_u")) %>%
group_by(name) %>%
nest() %>%
mutate(fit_and_predict = map(data,family = "binomial")))
# A tibble: 3 x 3
# Groups: name [3]
name data fit_and_predict
<chr> <list> <list>
1 do_u_love_coconut <tibble [2,000 x 3]> <glm>
2 do_u_love_eggplant <tibble [2,000 x 3]> <glm>
3 do_u_love_tomatoes <tibble [2,000 x 3]> <glm>
警告信息:
mutate()
输入 fit_and_predict
有问题。
我 glm.fit:算法没有收敛
i 输入 fit_and_predict
是 map(data,family = "binomial"))
。
i 错误发生在第 3 组:name = "do_u_love_tomatoes"。
我的问题
我想考虑潜在的拟合错误并决定在此类失败中应该返回什么值。例如,对于诸如 contrasts can be applied only to factors with 2 or more levels
之类的错误,我希望将 "contrasts_error"
作为返回值。预期输出例如:
df_tomatoes_is_NA %>%
pivot_longer(starts_with("do_u")) %>%
group_by(name) %>%
nest() %>%
mutate(fit_and_predict = map(data,family = "binomial")))
## # A tibble: 3 x 3
## # Groups: name [3]
## name data fit_and_predict
## <chr> <list> <list>
## 1 do_u_love_coconut <tibble [2,000 x 3]> <glm>
## 2 do_u_love_eggplant <tibble [2,000 x 3]> <glm>
## 3 do_u_love_tomatoes <tibble [2,000 x 3]> <chr[1]> <-- "contrasts_error"
如果有收敛警告,例如 glm.fit: algorithm did not converge
,我希望:
df_tomatoes_is_zero %>%
pivot_longer(starts_with("do_u")) %>%
group_by(name) %>%
nest() %>%
mutate(fit_and_predict = map(data,000 x 3]> <chr[1]> <-- "convergence_warning"
编辑
澄清一下,由于存在许多潜在的拟合错误和警告,解决方案总是要求我在代码中指定每个潜在的错误/警告及其各自的字符串。我上面举了两个例子(对比错误和收敛警告)。
解决方法
您可以使用 tryCatch
处理此问题并捕获所有警告和错误并为它们返回相应的输出。
apply_glm <- function(data,formula) {
tryCatch(glm(formula = formula,data = data,family = "binomial"),error = function(e) {
if(e$message == "contrasts can be applied only to factors with 2 or more levels")
return('contrasts error')
},warning = function(w) {
if(w$message == "glm.fit: algorithm did not converge")
return('convergence warning')
})
}
您可以使用 if
/else if
或 case_when
语句扩展对错误和警告消息的处理。
将函数应用于数据集 df_tomatoes_is_NA
:
library(tidyverse)
df_tomatoes_is_NA %>%
pivot_longer(starts_with("do_u")) %>%
group_by(name) %>%
nest() %>%
mutate(fit_and_predict = map(data,apply_glm,value ~ gender)) -> result1
result1
# name data fit_and_predict
# <chr> <list> <list>
#1 do_u_love_coconut <tibble [2,000 × 3]> <glm>
#2 do_u_love_eggplant <tibble [2,000 × 3]> <glm>
#3 do_u_love_tomatoes <tibble [2,000 × 3]> <chr [1]>
result1$fit_and_predict
[[1]]
Call: glm(formula = value ~ gender,family = "binomial",data = data)
Coefficients:
(Intercept) gendermale
0.68837 -0.08838
Degrees of Freedom: 1214 Total (i.e. Null); 1213 Residual
(785 observations deleted due to missingness)
Null Deviance: 1564
Residual Deviance: 1564 AIC: 1568
[[2]]
Call: glm(formula = value ~ gender,data = data)
Coefficients:
(Intercept) gendermale
1.61343 -0.01754
Degrees of Freedom: 1218 Total (i.e. Null); 1217 Residual
(781 observations deleted due to missingness)
Null Deviance: 1101
Residual Deviance: 1101 AIC: 1105
[[3]]
[1] "contrasts error"
对于数据集 df_tomatoes_is_zero
:
df_tomatoes_is_zero %>%
pivot_longer(starts_with("do_u")) %>%
group_by(name) %>%
nest() %>%
mutate(fit_and_predict = map(data,apply_glm)) -> result2
result2
# name data fit_and_predict
# <chr> <list> <list>
#1 do_u_love_coconut <tibble [2,000 × 3]> <chr [1]>
result2$fit_and_predict
[[1]]
Call: glm(formula = value ~ gender,data = data)
Coefficients:
(Intercept) gendermale
0.49372 0.07442
Degrees of Freedom: 1190 Total (i.e. Null); 1189 Residual
(809 observations deleted due to missingness)
Null Deviance: 1570
Residual Deviance: 1570 AIC: 1574
[[2]]
Call: glm(formula = value ~ gender,data = data)
Coefficients:
(Intercept) gendermale
1.60539 -0.03636
Degrees of Freedom: 1177 Total (i.e. Null); 1176 Residual
(822 observations deleted due to missingness)
Null Deviance: 1073
Residual Deviance: 1073 AIC: 1077
[[3]]
[1] "convergence warning"
版权声明:本文内容由互联网用户自发贡献,该文观点与技术仅代表作者本人。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如发现本站有涉嫌侵权/违法违规的内容, 请发送邮件至 dio@foxmail.com 举报,一经查实,本站将立刻删除。