How to modify an object in R so that it is compatible with plotting
I found a program in R that plots the observations in a data set.
I would like to modify the plots this program generates so that they show only the first observation. This is the code I tried to modify:
# source: https://cran.r-project.org/web/packages/xgboost/xgboost.pdf
library(xgboost)
data(agaricus.train, package = "xgboost")
data(agaricus.test, package = "xgboost")
# tree_method (rather than method) is the parameter that selects the "hist" algorithm
bst <- xgboost(agaricus.train$data, agaricus.train$label, nrounds = 50,
               eta = 0.1, max_depth = 3, subsample = 0.5, tree_method = "hist",
               objective = "binary:logistic", nthread = 2, verbose = 0)
xgb.plot.shap(agaricus.test$data, model = bst, features = "odor=none")
contr <- predict(bst, agaricus.test$data, predcontrib = TRUE)
# model = bst is passed so the top_n features can be ranked by importance
xgb.plot.shap(agaricus.test$data, contr, model = bst, top_n = 12, n_col = 3)
Can someone tell me what I am doing wrong? Or is this simply not possible? Thanks
Solution
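As far as I can tell, this doesn't really work. Here is an example. Suppose a binary variable has a score of 0 or 1: observations with a score of 0 have SHAP values between 0.2 and 0.5, and observations with a score of 1 have SHAP values between 1.2 and 1.5. That is what the plot illustrates: the difference in the variable's SHAP value between a score of 0 and a score of 1. The "first observation" could have a score of either 0 or 1, so its SHAP value on its own tells you little about the variable. This is why the SHAP plot needs a matrix containing more than one observation (and why your approach doesn't work).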
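To make the argument concrete, here is a minimal sketch with simulated numbers. The values are made up to match the ranges in the example above and do not come from a fitted model; the names `score`, `shap`, `group_means` and `first_obs` are purely illustrative.

```r
# Simulated illustration only: the SHAP values are drawn at random from the
# ranges used in the example (0.2-0.5 for score 0, 1.2-1.5 for score 1).
set.seed(1)
score <- rep(c(0, 1), each = 50)
shap <- ifelse(score == 0,
               runif(100, min = 0.2, max = 0.5),  # range when score is 0
               runif(100, min = 1.2, max = 1.5))  # range when score is 1

# With many observations, the contrast between the two groups is visible.
group_means <- tapply(shap, score, mean)
print(group_means)

# A single observation is one (score, SHAP) pair, so there is no contrast to show.
first_obs <- data.frame(score = score[1], shap = shap[1])
print(first_obs)
```

The group means differ by roughly one unit, which is the pattern a dependence plot conveys; the single row carries no such comparison.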
That said, if you want, you can pull the SHAP values for the first n observations and then plot the first observation yourself in ggplot or base R, e.g.
library(tidyverse)
library(xgboost)
data(agaricus.train, package = "xgboost")
data(agaricus.test, package = "xgboost")
# tree_method (rather than method) is the parameter that selects the "hist" algorithm
bst <- xgboost(agaricus.train$data, agaricus.train$label, nrounds = 50,
               eta = 0.1, max_depth = 3, subsample = 0.5, tree_method = "hist",
               objective = "binary:logistic", nthread = 2, verbose = 0)
xgb.plot.shap(agaricus.test$data, model = bst, features = "odor=none")
contr <- predict(bst, agaricus.test$data, predcontrib = TRUE)
## Use "plot = FALSE" to return the data to "mat", instead of the rendered plot
mat <- xgb.plot.shap(agaricus.test$data[1:2, ], contr[1:2, ], model = bst,
                     top_n = 12, n_col = 3, plot = FALSE)
## Format the data: one row per variable, one column per observation
mat$shap_contrib %>%
  t() %>%
  as.data.frame() %>%
  rownames_to_column() %>%
  set_names(c("Variable", "SHAP", "second_observation")) %>%
  ## Then plot however you want
  ggplot(aes(y = SHAP, x = "")) +
  geom_point(pch = 3) +
  theme_bw() +
  theme(axis.ticks.x = element_blank(), axis.title.x = element_blank()) +
  facet_wrap(facets = vars(Variable))
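The same idea can be sketched in base R. The block below is only an illustration: `shap_contrib` is a hypothetical stand-in for `mat$shap_contrib`, and its values and the column names `feature_b` and `feature_c` are made up so that the example runs without fitting a model.

```r
# Hypothetical stand-in for mat$shap_contrib: 2 observations x 3 variables.
# The numbers and the names "feature_b" and "feature_c" are invented.
shap_contrib <- matrix(
  c( 0.8, -0.3, 0.1,
    -1.1,  0.4, 0.2),
  nrow = 2, byrow = TRUE,
  dimnames = list(NULL, c("odor=none", "feature_b", "feature_c"))
)

# Keep only the first observation, reshaped to one row per variable.
first_obs <- data.frame(
  Variable = colnames(shap_contrib),
  SHAP = unname(shap_contrib[1, ]),
  row.names = NULL
)

# One point per variable, with a dashed reference line at SHAP = 0.
dotchart(first_obs$SHAP, labels = first_obs$Variable,
         pch = 3, xlab = "SHAP value (first observation)")
abline(v = 0, lty = 2, col = "grey75")
```

With real data, replacing the stand-in matrix with `mat$shap_contrib` should give one point for each of the selected features.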
Updated per the comments:
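library(tidyverse)
library(xgboost)
data(agaricus.train, package = "xgboost")
data(agaricus.test, package = "xgboost")
bst <- xgboost(agaricus.train$data, agaricus.train$label, nrounds = 50,
               eta = 0.1, max_depth = 3, subsample = 0.5, tree_method = "hist",
               objective = "binary:logistic", nthread = 2, verbose = 0)
contr <- predict(bst, agaricus.test$data, predcontrib = TRUE, approxcontrib = FALSE)
pred <- predict(bst, agaricus.test$data)
## Use "plot = FALSE" to return the data to "mat", instead of the rendered plot
mat <- xgb.plot.shap(agaricus.test$data[1:2, ], contr[1:2, ], model = bst,
                     top_n = 12, n_col = 3, plot = FALSE)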
## Format the data: SHAP values and feature values for the first observation
SHAP <- as.matrix(mat$shap_contrib[1, ]) %>%
  as.data.frame() %>%
  rownames_to_column() %>%
  set_names(c("Variable", "SHAP"))
Score <- as.matrix(mat$data[1, ]) %>%
  as.data.frame() %>%
  rownames_to_column() %>%
  set_names(c("Variable", "Score"))
## Predicted class for the first observation: 0 if the probability is <= 0.5, otherwise 1
Pred <- ifelse(pred[1] <= 0.5, 0, 1)
SHAP_Score <- left_join(SHAP, Score, by = "Variable")
SHAP_Score_Pred <- cbind(SHAP_Score, Pred)
ggplot(SHAP_Score_Pred, aes(y = SHAP, x = Score)) +
  geom_hline(yintercept = 0, lty = 2, col = "grey75") +
  geom_point(pch = 3, cex = 3, col = "red") +
  ggtitle(label = paste("Prediction for this observation =", Pred, sep = " ")) +
  theme_bw(base_size = 12) +
  theme(axis.text = element_text(size = 14), axis.title = element_text(size = 16)) +
  scale_x_continuous(breaks = c(0, 1)) +
  facet_wrap(facets = vars(Variable))