SageMaker in a local Jupyter notebook: cannot use the AWS-hosted XGBoost container ("KeyError: 'S3DistributionType'" and "Failed to run: ['docker-compose'")


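Running SageMaker in a local Jupyter notebook (using VS Code) works without issue, except that attempting to train an XGBoost model with the AWS-hosted container results in errors (container name: 246618743249.dkr.ecr.us-west-2.amazonaws.com/sagemaker-xgboost:1.0-1-cpu-py3).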

Jupyter Notebook

import os

import sagemaker

session = sagemaker.LocalSession()

# Load and prepare the training and validation data
# (data_dir and prefix are presumably defined in this elided step)
...

# Upload the test, validation, and training data to S3
test_location = session.upload_data(os.path.join(data_dir, 'test.csv'), key_prefix=prefix)
val_location = session.upload_data(os.path.join(data_dir, 'validation.csv'), key_prefix=prefix)
train_location = session.upload_data(os.path.join(data_dir, 'train.csv'), key_prefix=prefix)

# Look up the AWS-hosted XGBoost image for the session's region
region = session.boto_region_name
instance_type = 'ml.m4.xlarge'
container = sagemaker.image_uris.retrieve('xgboost', region, '1.0-1', 'py3', instance_type=instance_type)

role = 'arn:aws:iam::<USER ID #>:role/service-role/AmazonSageMaker-ExecutionRole-<ROLE ID #>'

# Passing the LocalSession here is what makes the job run in a local container
xgb_estimator = sagemaker.estimator.Estimator(
    container, role, train_instance_count=1, train_instance_type=instance_type,
    output_path=f's3://{session.default_bucket()}/{prefix}/output',
    sagemaker_session=session)

xgb_estimator.set_hyperparameters(
    max_depth=5, eta=0.2, gamma=4, min_child_weight=6, subsample=0.8,
    objective='reg:squarederror', early_stopping_rounds=10, num_round=200)

s3_input_train = sagemaker.inputs.TrainingInput(s3_data=train_location, content_type='csv')
s3_input_validation = sagemaker.inputs.TrainingInput(s3_data=val_location, content_type='csv')

xgb_estimator.fit({'train': s3_input_train, 'validation': s3_input_validation})

Docker Container KeyError

algo-1-tfcvc_1  | ERROR:sagemaker-containers:Reporting training FAILURE
algo-1-tfcvc_1  | ERROR:sagemaker-containers:framework error: 
algo-1-tfcvc_1  | Traceback (most recent call last):
algo-1-tfcvc_1  |   File "/miniconda3/lib/python3.6/site-packages/sagemaker_containers/_trainer.py",line 84,in train
algo-1-tfcvc_1  |     entrypoint()
algo-1-tfcvc_1  |   File "/miniconda3/lib/python3.6/site-packages/sagemaker_xgboost_container/training.py",line 94,in main
algo-1-tfcvc_1  |     train(framework.training_env())
algo-1-tfcvc_1  |   File "/miniconda3/lib/python3.6/site-packages/sagemaker_xgboost_container/training.py",line 90,in train
algo-1-tfcvc_1  |     run_algorithm_mode()
algo-1-tfcvc_1  |   File "/miniconda3/lib/python3.6/site-packages/sagemaker_xgboost_container/training.py",line 68,in run_algorithm_mode
algo-1-tfcvc_1  |     checkpoint_config=checkpoint_config
algo-1-tfcvc_1  |   File "/miniconda3/lib/python3.6/site-packages/sagemaker_xgboost_container/algorithm_mode/train.py",line 115,in sagemaker_train
algo-1-tfcvc_1  |     validated_data_config = channels.validate(data_config)
algo-1-tfcvc_1  |   File "/miniconda3/lib/python3.6/site-packages/sagemaker_algorithm_toolkit/channel_validation.py",line 106,in validate
algo-1-tfcvc_1  |     channel_obj.validate(value)
algo-1-tfcvc_1  |   File "/miniconda3/lib/python3.6/site-packages/sagemaker_algorithm_toolkit/channel_validation.py",line 52,in validate
algo-1-tfcvc_1  |     if (value[CONTENT_TYPE],value[TRAINING_INPUT_MODE],value[S3_DIST_TYPE]) not in self.supported:
algo-1-tfcvc_1  | KeyError: 'S3DistributionType'
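
The last frame of the traceback indexes the channel configuration with three keys at once, so a channel that lacks an 'S3DistributionType' entry raises KeyError before the supported-combination check can run. The sketch below reproduces that failure mode in isolation. It is a simplified stand-in, not the container's code: the validate_channel and missing_key functions, the SUPPORTED set, and the example channel dictionaries are hypothetical, and only the 'S3DistributionType' key name is taken from the traceback (the other two key strings are assumed).

```python
# Simplified stand-in for the check shown in the traceback (hypothetical names).
CONTENT_TYPE = "ContentType"               # assumed key name
TRAINING_INPUT_MODE = "TrainingInputMode"  # assumed key name
S3_DIST_TYPE = "S3DistributionType"        # key name taken from the KeyError

# Hypothetical set of (content type, input mode, distribution type) combinations.
SUPPORTED = {("csv", "File", "FullyReplicated")}


def validate_channel(value):
    """Index all three keys directly, as the failing line does."""
    key = (value[CONTENT_TYPE], value[TRAINING_INPUT_MODE], value[S3_DIST_TYPE])
    if key not in SUPPORTED:
        raise ValueError(f"unsupported channel configuration: {key}")
    return key


# A channel description that carries all three keys passes the check.
complete_channel = {
    CONTENT_TYPE: "csv",
    TRAINING_INPUT_MODE: "File",
    S3_DIST_TYPE: "FullyReplicated",
}

# A channel description without the distribution type (a hypothetical example
# of what a locally generated configuration might look like) raises KeyError.
incomplete_channel = {CONTENT_TYPE: "csv", TRAINING_INPUT_MODE: "File"}


def missing_key(value):
    """Return the name of the missing key, or None if validation succeeds."""
    try:
        validate_channel(value)
    except KeyError as exc:
        return exc.args[0]
    return None


print(missing_key(complete_channel))
print(missing_key(incomplete_channel))
```

The point of the sketch is that the error says nothing about the data itself: the lookup fails on the shape of the channel description handed to the container.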

Local PC RuntimeError

RuntimeError: Failed to run: ['docker-compose','-f','/tmp/tmp71tx0fop/docker-compose.yaml','up','--build','--abort-on-container-exit'],Process exited with code: 1
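
This RuntimeError appears to be a wrapper around the container failure rather than a separate problem: docker-compose is started with --abort-on-container-exit, so once the training container exits after the KeyError, docker-compose itself exits with a non-zero code. A minimal sketch of that pattern follows. The run_or_raise helper is hypothetical, and a Python child process that exits with status 1 stands in for docker-compose so the example runs without Docker.

```python
import subprocess
import sys


def run_or_raise(command):
    """Run a command and raise RuntimeError if it exits with a non-zero code."""
    completed = subprocess.run(command)
    if completed.returncode != 0:
        raise RuntimeError(
            f"Failed to run: {command}, Process exited with code: {completed.returncode}"
        )
    return completed.returncode


# Stand-in for a docker-compose process whose container failed: exits with 1.
failing_command = [sys.executable, "-c", "import sys; sys.exit(1)"]

message = ""
try:
    run_or_raise(failing_command)
except RuntimeError as error:
    # The message names the command and exit code, not the underlying cause.
    message = str(error)
    print(message)
```

Because the message only carries the exit code, the container log above it is the place to look for the actual cause.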

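If the Jupyter notebook is run in the Amazon cloud SageMaker environment (instead of on the local PC), there are no errors. Note that when running on the cloud notebook, the session is initialized as: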

session = sagemaker.Session()

There seems to be an issue with how LocalSession() works with the hosted Docker container.

Solution

When SageMaker is run from a local Jupyter notebook with a LocalSession, it expects the Docker container to run on the local machine as well.

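The key to making SageMaker (running in a local notebook) use the AWS-hosted Docker container is to omit the LocalSession object when initializing the Estimator. Without a sagemaker_session argument, the estimator creates a default session that submits the training job to the SageMaker service instead of running the container locally.

Incorrect

xgb_estimator = sagemaker.estimator.Estimator(
    container, role, train_instance_count=1, train_instance_type=instance_type,
    output_path=f's3://{session.default_bucket()}/{prefix}/output',
    sagemaker_session=session)

Correct

xgb_estimator = sagemaker.estimator.Estimator(
    container, role, train_instance_count=1, train_instance_type=instance_type,
    output_path=f's3://{session.default_bucket()}/{prefix}/output')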
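
One way to keep this choice explicit is to assemble the estimator's keyword arguments in one place and include sagemaker_session only when local execution is actually wanted. The helper below is a hypothetical sketch (the estimator_kwargs name and its run_locally flag are not part of the SageMaker SDK). It only builds a dictionary using the argument names from the question, so it runs without AWS credentials or the SDK installed.

```python
def estimator_kwargs(role, instance_type, output_path, local_session=None, run_locally=False):
    """Assemble keyword arguments for sagemaker.estimator.Estimator.

    The session is passed through only when run_locally is True; otherwise the
    key is left out so the estimator falls back to its default session.
    """
    kwargs = {
        "role": role,
        "train_instance_count": 1,
        "train_instance_type": instance_type,
        "output_path": output_path,
    }
    if run_locally:
        if local_session is None:
            raise ValueError("run_locally=True requires a local session object")
        kwargs["sagemaker_session"] = local_session
    return kwargs


# Placeholder values for illustration only.
hosted = estimator_kwargs("example-role", "ml.m4.xlarge", "s3://example-bucket/output")
local = estimator_kwargs(
    "example-role", "ml.m4.xlarge", "s3://example-bucket/output",
    local_session=object(), run_locally=True,
)
print(sorted(hosted))
print(sorted(local))
```

In the notebook, the result would be unpacked into the constructor, for example sagemaker.estimator.Estimator(container, **estimator_kwargs(role, instance_type, output_path)).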

Additional information

The SageMaker Python SDK source code provides the following helpful hints:

File: sagemaker/local/local_session.py

class LocalSagemakerClient(object):
    """A SageMakerClient that implements the API calls locally.

    Used for doing local training and hosting local endpoints. It still needs access to
    a boto client to interact with S3 but it won't perform any SageMaker call.
    ...

File: sagemaker/estimator.py

class EstimatorBase(with_metaclass(ABCMeta,object)):
    """Handle end-to-end Amazon SageMaker training and deployment tasks.

    For introduction to model training and deployment,see
    http://docs.aws.amazon.com/sagemaker/latest/dg/how-it-works-training.html

    Subclasses must define a way to determine what image to use for training,what hyperparameters to use,and how to create an appropriate predictor instance.
    """

    def __init__(self, role, train_instance_count, train_instance_type, train_volume_size=30, train_max_run=24 * 60 * 60, input_mode='File', output_path=None, output_kms_key=None, base_job_name=None, sagemaker_session=None, tags=None):
        """Initialize an ``EstimatorBase`` instance.

        Args:
            role (str): An AWS IAM role (either name or full ARN). ...
            
        ...

            sagemaker_session (sagemaker.session.Session): Session object which manages interactions with
                Amazon SageMaker APIs and any other AWS services needed. If not specified,the estimator creates one
                using the default AWS configuration chain.
        """
