How to fix a tidymodels error when using step_ns()
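I am trying to write a function that fits resamples using a recipe that includes step_ns(). For some reason, I get the error message:
Fold01: recipe: Error: Not all variables in the recipe are present in the supplied training set
and the same for every other fold, followed by:
Warning message:
All models failed in [fit_resamples()]. See the .notes column.
Here is my code: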
compare_basis_exp_to_base_mod <- function(data, outcome, metric, ...) {
  outcome <- rlang::enquo(outcome)
  metric <- rlang::enquo(metric)
  pred_list <- colnames(data)
  outcome_str <- substring(deparse(substitute(outcome)), 2)
  outcome_str_id <- which(colnames(data) %in% outcome_str)
  predictor <- pred_list[-outcome_str_id]
  data <- data %>%
    rename(prediction = !!outcome)
  res <- tibble(splits = list(), id = character(), .metrics = list(), .notes = list(), .predictions = list(), pred = character())
  rec_without_splines <- recipe(prediction ~ ., data = data) %>%
    prep()
  rec_with_splines <- recipe(prediction ~ ., data = data) %>%
    step_ns(all_predictors(), ...) %>%
    prep()
  folds_without_splines <- vfold_cv(juice(rec_without_splines), strata = prediction)
  folds_with_splines <- vfold_cv(juice(rec_with_splines), strata = prediction)
  mod <- linear_reg() %>%
    set_engine("lm")
  mod_without_splines <- fit_resamples(mod, rec_without_splines, folds_without_splines, metrics = metric_set(!!metric), control = control_resamples(save_pred = TRUE)) %>%
    mutate(pred = "no_splines")
  mod_with_splines <- fit_resamples(mod, rec_with_splines, folds_with_splines, control = control_resamples(save_pred = TRUE)) %>%
    mutate(pred = "with_splines")
  res <- mod_without_splines %>%
    bind_rows(mod_with_splines)
  return(res)
}
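Basically, the argument data is a two-column table and outcome is the name of the outcome column. Leaving aside whether this function is useful (I am just getting started with tidymodels), I only want to understand what causes this error and how to fix it. The error occurs when computing mod_with_splines.
A similar problem was encountered here, but I don't know whether it is related to mine. I can't prep the recipe before passing it to fit_resamples (or so I think).
Any help would be appreciated. Thanks.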
Solution
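Your problem comes from trying to apply a recipe to a data set that has already been run through that same recipe.
If we assume the predictors are X1 and X2, then rec_with_splines expects those variables as input. However, since folds_with_splines is built from the juiced result of rec_with_splines, it actually contains X1_ns_1, X1_ns_2, X2_ns_1 and X2_ns_2, not X1 and X2. When fit_resamples() applies the recipe to each fold, the original columns are missing, hence the error.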
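The renaming described above can be seen directly with a small sketch. It assumes a hypothetical two-predictor subset of mtcars (disp and hp standing in for X1 and X2) and deg_free = 2; neither choice comes from the original question.

```r
library(recipes)

# Hypothetical data: one outcome (mpg) and two predictors (disp, hp)
dat <- mtcars[, c("mpg", "disp", "hp")]

rec <- recipe(mpg ~ ., data = dat) %>%
  step_ns(all_predictors(), deg_free = 2)

# Columns the recipe expects as input
input_cols <- names(dat)

# Columns present after the recipe has been prepped and juiced
juiced_cols <- names(juice(prep(rec)))

# Original predictors that no longer exist in the juiced data
missing_after_juice <- setdiff(input_cols, juiced_cols)

print(input_cols)
print(juiced_cols)
print(missing_after_juice)
```

Resampling the juiced data and then handing the same recipe to fit_resamples() asks the recipe to find disp and hp in folds that only hold the spline columns, which is the situation the error message reports.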
I recommend using workflows to combine the preprocessing and modeling steps, and passing the raw data to vfold_cv().
library(tidymodels)

compare_basis_exp_to_base_mod <- function(data, outcome, metric, ...) {
  outcome <- rlang::enquo(outcome)
  metric <- rlang::enquo(metric)

  pred_list <- colnames(data)
  outcome_str <- substring(deparse(substitute(outcome)), 2)
  outcome_str_id <- which(colnames(data) %in% outcome_str)
  predictor <- pred_list[-outcome_str_id]

  data <- data %>%
    rename(prediction = !!outcome)

  # Leave both recipes unprepped; the workflow preps them inside each resample.
  rec_without_splines <- recipe(prediction ~ ., data = data)

  rec_with_splines <- recipe(prediction ~ ., data = data) %>%
    step_ns(all_predictors(), ...)

  mod <- linear_reg() %>%
    set_engine("lm")

  wf_without_splines <- workflow() %>%
    add_recipe(rec_without_splines) %>%
    add_model(mod)

  wf_with_splines <- workflow() %>%
    add_recipe(rec_with_splines) %>%
    add_model(mod)

  # Resample the raw data, not the juiced output of a recipe.
  data_folds <- vfold_cv(data, strata = prediction)

  mod_without_splines <- fit_resamples(
    wf_without_splines,
    data_folds,
    metrics = metric_set(!!metric),
    control = control_resamples(save_pred = TRUE)
  ) %>%
    mutate(pred = "no_splines")

  # The same folds are passed here so both workflows see identical splits.
  mod_with_splines <- fit_resamples(
    wf_with_splines,
    data_folds,
    control = control_resamples(save_pred = TRUE)
  ) %>%
    mutate(pred = "with_splines")

  res <- mod_without_splines %>%
    bind_rows(mod_with_splines)
  return(res)
}

res <- compare_basis_exp_to_base_mod(mtcars, mpg, rmse)