How to call a SageMaker XGBoost endpoint after model creation?

I've been following this very helpful XGBoost tutorial on Medium (the code I used is towards the bottom of the article): https://medium.com/analytics-vidhya/random-forest-and-xgboost-on-amazon-sagemaker-and-aws-lambda-29abd9467795

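So far I've been able to get the data into the appropriate format for ML, create a model based on the training data, and then feed test data through the model to get useful results.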

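Whenever I leave and come back to work more on the model or to feed in new test data, I find I need to re-run all of the model creation steps before I can make further predictions. Instead, I would like to just call my already-created model endpoint, based on the Image_URI, and feed in new data.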

Current steps performed:

Model training

xgb = sagemaker.estimator.Estimator(containers[my_region], role, train_instance_count=1, train_instance_type='ml.m4.xlarge', output_path='s3://{}/{}/output'.format(bucket_name, prefix), sagemaker_session=sess)
xgb.set_hyperparameters(eta=0.06, alpha=0.8, lambda_bias=0.8, gamma=50, min_child_weight=6, subsample=0.5, silent=0, early_stopping_rounds=5, objective='reg:linear', num_round=1000)

xgb.fit({'train': s3_input_train})

xgb_predictor = xgb.deploy(initial_instance_count=1, instance_type='ml.m4.xlarge')

Evaluation

test_data_array = test_data.drop(['price', 'id', 'sqft_above', 'date'], axis=1).values # load the data into an array

xgb_predictor.serializer = csv_serializer # set the serializer type

predictions = xgb_predictor.predict(test_data_array).decode('utf-8') # predict!
predictions_array = np.fromstring(predictions[1:], sep=',') # and turn the prediction into an array
print(predictions_array.shape)

from sklearn.metrics import r2_score
print("R2 score : %.2f" % r2_score(test_data['price'], predictions_array))

It seems this line:

predictions = xgb_predictor.predict(test_data_array).decode('utf-8') # predict!

needs to be rewritten so that it references the model location rather than xgb_predictor.

I have tried the following:

trained_model = sagemaker.model.Model(
    model_data='s3://{}/{}/output/xgboost-2020-11-10-00-00/output/model.tar.gz'.format(bucket_name, prefix),
    image_uri='XXXXXXXXXX.dkr.ecr.us-east-1.amazonaws.com/xgboost:latest',
    role=role)  # your role here; could be different name

trained_model.deploy(initial_instance_count=1, instance_type='ml.m4.xlarge')

and then replacing

xgb_predictor.serializer = csv_serializer # set the serializer type
predictions = xgb_predictor.predict(test_data_array).decode('utf-8') # predict!

with

trained_model.serializer = csv_serializer # set the serializer type
predictions = trained_model.predict(test_data_array).decode('utf-8') # predict!

But this raises an error.

Solution

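That's a good question :) I agree, many of the official tutorials tend to show the full train-to-invoke pipeline and don't emphasize enough that each step can be done separately. In your specific case, where you want to invoke an already-deployed endpoint, you can either (A) use the invoke API call in one of the numerous SDKs (for example the CLI or boto3), or (B) instantiate a predictor with the high-level Python SDK, using either the generic predictor class (sagemaker.predictor.Predictor in SDK v2) or its XGBoost-specific subclass sagemaker.xgboost.model.XGBoostPredictor, as illustrated below: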

from sagemaker.xgboost.model import XGBoostPredictor
    
predictor = XGBoostPredictor(endpoint_name='your-endpoint')
predictor.predict('<payload>')
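
Option (A) could look roughly like the sketch below, which calls the endpoint through boto3's sagemaker-runtime client. It assumes the endpoint accepts headerless text/csv input and returns predictions separated by commas or newlines; the helper names rows_to_csv, parse_predictions and invoke_xgboost_endpoint are made up for illustration and are not part of any SDK.

```python
from typing import Iterable, List, Sequence


def rows_to_csv(rows: Iterable[Sequence[float]]) -> str:
    """Serialize feature rows as headerless CSV, one row per line."""
    return "\n".join(",".join(str(value) for value in row) for row in rows)


def parse_predictions(body: str) -> List[float]:
    """Parse a response whose values are separated by commas or newlines."""
    tokens = body.replace("\n", ",").split(",")
    return [float(token) for token in tokens if token.strip()]


def invoke_xgboost_endpoint(endpoint_name: str, rows) -> List[float]:
    """Send rows to an already-deployed endpoint and return its predictions."""
    import boto3  # imported lazily so the helpers above work without AWS access

    runtime = boto3.client("sagemaker-runtime")
    response = runtime.invoke_endpoint(
        EndpointName=endpoint_name,  # assumption: name of your existing endpoint
        ContentType="text/csv",
        Body=rows_to_csv(rows),
    )
    return parse_predictions(response["Body"].read().decode("utf-8"))
```

With AWS credentials configured, a call such as invoke_xgboost_endpoint('your-endpoint', test_data_array) would send the test rows to the running endpoint without re-running any training step.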

Similar question: How to use a pretrained model from s3 to predict some data?

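Notes:

If you want the model.deploy() call to return a predictor, the model must be instantiated with a predictor_cls. This is optional; you can also deploy the model first and then invoke it as a separate step using the technique above.
Endpoints incur charges even when you don't invoke them; they are billed by uptime. So if you don't need an always-on endpoint, don't hesitate to shut it down to minimize costs.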
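
The two notes above can be sketched as follows, assuming version 2 of the SageMaker Python SDK. The function names model_artifact_uri, deploy_with_predictor and shut_down are hypothetical helpers for illustration; the bucket, prefix, training job name, image URI and role are whatever your own setup uses.

```python
def model_artifact_uri(bucket: str, prefix: str, job_name: str) -> str:
    """Build the S3 URI of the model.tar.gz written by a training job."""
    return "s3://{}/{}/output/{}/output/model.tar.gz".format(bucket, prefix, job_name)


def deploy_with_predictor(model_data: str, image_uri: str, role: str):
    """Deploy an existing artifact; predictor_cls makes deploy() return a predictor."""
    # Imported lazily so the other helpers work without the sagemaker package.
    from sagemaker.model import Model
    from sagemaker.xgboost.model import XGBoostPredictor

    model = Model(
        model_data=model_data,
        image_uri=image_uri,
        role=role,
        predictor_cls=XGBoostPredictor,  # without this, deploy() returns None
    )
    return model.deploy(initial_instance_count=1, instance_type="ml.m4.xlarge")


def shut_down(predictor) -> None:
    """Delete the endpoint so it stops accruing uptime charges."""
    predictor.delete_endpoint()
```

The predictor returned by deploy_with_predictor can be used for predict() calls and later passed to shut_down once the endpoint is no longer needed. Depending on the SDK version, the predictor's default serializer may not be CSV, so it may need to be set explicitly before sending CSV rows.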
