R xgboost xgb.cv predictions: best iteration or final iteration?
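I am using the xgb.cv function in the R implementation of xgboost to grid-search for the best hyperparameters. When prediction is set to TRUE, it returns predictions for the out-of-fold observations. Assuming you are using early stopping, do these predictions correspond to the best iteration or to the final iteration?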
Solution
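The CV predictions correspond to the best iteration. You can see this by using a "strict" early_stopping_rounds value and then comparing the CV predictions with those of models trained for the "best" and for the "final" number of iterations. Expect only an approximate match: each CV prediction comes from a fold model trained on about 4/5 of the data, whereas the comparison models below are trained on the full training set.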
# Load minimum reproducible example
library(xgboost)
data(agaricus.train,package='xgboost')
data(agaricus.test,package='xgboost')
train <- agaricus.train
dtrain <- xgb.DMatrix(train$data,label=train$label)
test <- agaricus.test
dtest <- xgb.DMatrix(test$data,label=test$label)
# Perform cross validation with a 'strict' early_stopping
cv <- xgb.cv(data = train$data,label = train$label,nfold = 5,max_depth = 2,eta = 1,nthread = 4,nrounds = 10,objective = "binary:logistic",prediction = TRUE,early_stopping_rounds = 1)
# Check which round was the best iteration (the round with the best CV score;
# with early_stopping_rounds = 1, training stopped one round after it)
print(cv$best_iteration)
#> [1] 3
# Get the predictions
head(cv$pred)
#> [1] 0.84574515 0.15447612 0.15390711 0.84502697 0.09661318 0.15447612
# Train a model using 3 rounds (corresponds to best iteration)
trained_model <- xgb.train(data = dtrain, nrounds = 3, watchlist = list(train = dtrain, eval = dtest), objective = "binary:logistic")
# Get predictions
head(predict(trained_model, dtrain))
#> [1] 0.84625006 0.15353635 0.15353635 0.84625006 0.09530514 0.15353635
# Train a model using all nrounds = 10 rounds (i.e. without early stopping)
trained_model <- xgb.train(data = dtrain, nrounds = 10, objective = "binary:logistic")
head(predict(trained_model, dtrain))
#> [1] 0.9884467125 0.0123147098 0.0050151693 0.9884467125 0.0008781737 0.0123147098
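So the CV predictions closely match those of a model trained for the "best" number of iterations, not the "final" one. They are not identical, because the CV predictions come from fold models trained on subsets of the data.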
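The comparison above is indirect, since it uses full-data models rather than the fold models that actually produced cv$pred. A more direct check is to keep the fold models and re-predict each held-out fold yourself, once with only the first best_iteration rounds and once with every round that was run. The sketch below assumes the xgboost R package 1.4 to 1.7 API (cb.cv.predict(save_models = TRUE), cv$models, cv$folds, cv$niter and predict(..., iterationrange = )); these names changed in xgboost 2.x and later. The variable names p_best and p_final are illustrative. In the 1.x sources, the cb.cv.predict callback that prediction = TRUE installs appears to predict with best_iteration when early stopping recorded one, and with the last completed iteration otherwise.

```r
# Sketch assuming the xgboost R package 1.4-1.7 API; names differ in >= 2.0.
library(xgboost)
library(Matrix)  # row subsetting of the sparse dgCMatrix feature matrix

data(agaricus.train, package = "xgboost")
train <- agaricus.train

set.seed(1)  # make the random fold assignment reproducible
cv <- xgb.cv(data = train$data, label = train$label, nfold = 5,
             max_depth = 2, eta = 1, nthread = 2, nrounds = 10,
             objective = "binary:logistic", prediction = TRUE,
             early_stopping_rounds = 1, verbose = FALSE,
             # keep the per-fold boosters so we can re-predict with them
             callbacks = list(cb.cv.predict(save_models = TRUE)))

n <- nrow(train$data)
p_best  <- rep(NA_real_, n)  # out-of-fold preds from the first best_iteration rounds
p_final <- rep(NA_real_, n)  # out-of-fold preds from every round that was run

for (k in seq_along(cv$folds)) {
  idx <- cv$folds[[k]]                      # rows held out in fold k
  x_out <- train$data[idx, , drop = FALSE]
  # iterationrange is 1-based and half-open: c(1, b + 1) uses rounds 1..b
  p_best[idx] <- predict(cv$models[[k]], x_out,
                         iterationrange = c(1, cv$best_iteration + 1))
  p_final[idx] <- predict(cv$models[[k]], x_out,
                          iterationrange = c(1, cv$niter + 1))
}

cat("best_iteration:", cv$best_iteration, " rounds run:", cv$niter, "\n")
cat("max |cv$pred - p_best| :", max(abs(cv$pred - p_best)), "\n")
cat("max |cv$pred - p_final|:", max(abs(cv$pred - p_final)), "\n")
```

If the CV predictions are computed at the best iteration, the first difference should be essentially zero, while the second is generally non-zero whenever early stopping ran rounds past best_iteration.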