Deploying a custom, trained PyTorch BERT model to run on GPU for inference


Thank you for taking the time to read this question.

I'd like some advice on deploying a custom, trained PyTorch BERT model that runs on GPU for inference (no training needed; the model is already saved as a .pt file).
I've searched through various AWS docs and found links like these:
https://github.com/aws/amazon-sagemaker-examples/tree/master/advanced_functionality/scikit_bring_your_own
https://github.com/aws/amazon-sagemaker-examples/tree/master/advanced_functionality/pytorch_extending_our_containers
https://github.com/aws-samples/amazon-sagemaker-bert-pytorch

First of all, I don't know whether I even need to build a container for a daily automated batch inference job. I included the last link because in that example they don't build a container at all.
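
To make the batch part concrete, this is roughly what I imagine the daily job would look like if the container route is right and I drive it with SageMaker Batch Transform (just a sketch; the job/model names, S3 paths, and instance type are placeholders, and it assumes a SageMaker Model has already been created from my ECR image and .pt artifact):

import boto3

sm = boto3.client('sagemaker')

# All names and paths below are placeholders.
sm.create_transform_job(
    TransformJobName='product-review-batch-2021-01-01',
    ModelName='product-review-model',  # a Model wrapping my ECR image + .pt file
    TransformInput={
        'DataSource': {'S3DataSource': {'S3DataType': 'S3Prefix',
                                        'S3Uri': 's3://my-bucket/reviews/input/'}},
        'ContentType': 'text/csv',
    },
    TransformOutput={'S3OutputPath': 's3://my-bucket/reviews/output/'},
    TransformResources={'InstanceType': 'ml.p3.2xlarge', 'InstanceCount': 1},
)

Is this the right direction, or do people usually schedule invocations against a real-time endpoint instead?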

If so — I've tried to follow the tutorials and build a container directory with the following structure:

container/

Dockerfile
build_and_push.sh
review-classification/

  • predictor.py
  • serve
  • wsgi.py
  • nginx.conf
  • other supporting Python scripts for predictor.py
  • model (a folder holding the saved .pt file)

But right now I'm confused about a few things:

  1. There are many Dockerfile examples online; some start from python, some from ubuntu, and some from the pytorch_training images in AWS's accounts. I picked the one Hugging Face uses for pytorch-gpu: nvidia/cuda:10.1-cudnn7-runtime-ubuntu18.04. Question: does it matter which base image I use? Do I also need to write lines such as ENV SAGEMAKER_SUBMIT_DIRECTORY /opt/program and ENV SAGEMAKER_PROGRAM review-classification/serve, or do these only take effect when using the pytorch_training image? (I plan to sanity-check the GPU side with the script after the Dockerfile below.)
  2. I created an image in my ECR with the build_and_push.sh file, but how do I know whether it is set up correctly? (See the check after the script below.)
  3. Does the serve code matter? For now I took the serve code from the first link. It says you normally don't need to modify anything in serve, but as far as I can tell its parameters are set for CPU (the worker count defaults to the number of CPU cores). Do I need to modify it for GPU, and if so, how? (See the sketch right after this list.)
  4. What should the next steps be?
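
For question 3, what I had in mind is capping the gunicorn worker count by the number of visible GPUs instead of CPU cores, so that several workers don't compete for one device. A minimal sketch of the change I was considering in serve (assuming torch is importable in the serving image):

import multiprocessing
import os

import torch  # assumption: torch is installed in the serving image

cpu_count = multiprocessing.cpu_count()
gpu_count = torch.cuda.device_count()

# One worker per GPU when GPUs are visible; otherwise fall back to CPU cores.
default_workers = gpu_count if gpu_count > 0 else cpu_count
model_server_workers = int(os.environ.get('MODEL_SERVER_WORKERS', default_workers))

Is that reasonable, or does the worker count not actually matter for GPU serving?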

My current Dockerfile looks like this:

# https://hub.docker.com/r/huggingface/transformers-pytorch-gpu/dockerfile
FROM nvidia/cuda:10.1-cudnn7-runtime-ubuntu18.04
# FROM 763104351884.dkr.ecr.us-east-1.amazonaws.com/pytorch-training:1.5.0-gpu-py36-cu101-ubuntu16.04
# FROM 785573368785.dkr.ecr.us-east-1.amazonaws.com/sagemaker-inference-pytorch:1.5.0f-gpu-py3

LABEL maintainer="bqge@amazon.com"
LABEL project="product-review-models"

RUN apt update && \
    apt install -y bash \
                   build-essential \
                   git \
                   curl \
                   wget \
                   nginx \
                   ca-certificates \
                   python3 \
                   python3-pip && \
    rm -rf /var/lib/apt/lists/*

# Here we get all python packages.
RUN python3 -m pip install --no-cache-dir --upgrade pip && \
    python3 -m pip install --no-cache-dir \
        mkl \
        torch==1.5.0 \
        transformers==2.11.0 \
        path \
        scikit-learn \
        xlrd \
        spacy==2.1.0 \
        flask \
        gevent \
        gunicorn \
        pandas \
        ipython \
        neuralcoref==4.0 && \
    python3 -m spacy download en_core_web_md

# RUN rm -f /usr/bin/python && ln -s /usr/bin/python3 /usr/bin/python
# Set some environment variables. PYTHONUNBUFFERED keeps Python from buffering our standard
# output stream, which means that logs can be delivered to the user quickly. PYTHONDONTWRITEBYTECODE
# keeps Python from writing the .pyc files which are unnecessary in this case. We also update
# PATH so that the train and serve programs are found when the container is invoked.

ENV PYTHONUNBUFFERED=TRUE
ENV PYTHONDONTWRITEBYTECODE=TRUE
ENV PATH="/opt/program:${PATH}"

# Set up the program in the image.
# /opt/ml and all its subdirectories are used by SageMaker; we store our user code in /opt/program.
COPY review-classification /opt/program
WORKDIR /opt/program

# this environment variable is used by the SageMaker PyTorch container to determine our user code directory.
ENV SAGEMAKER_SUBMIT_DIRECTORY /opt/program

# this environment variable is used by the SageMaker PyTorch container to determine our program entry point
# for training and serving.
ENV SAGEMAKER_PROGRAM review-classification/serve

ENTRYPOINT ["/usr/bin/python3", "/opt/program/serve"]
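
For question 1, to sanity-check that the base image actually exposes the GPU to PyTorch, I was planning to run a tiny script like this inside the built container (e.g. docker run --gpus all <image> python3 check_gpu.py; the filename is mine):

# check_gpu.py -- quick check that torch sees the GPU inside the container
import torch

print('torch version:', torch.__version__)
print('CUDA available:', torch.cuda.is_available())
print('device count:', torch.cuda.device_count())
if torch.cuda.is_available():
    print('device name:', torch.cuda.get_device_name(0))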

My current build_and_push.sh looks like this:

#!/usr/bin/env bash

# This script shows how to build the Docker image and push it to ECR to be ready for use
# by SageMaker.

# The name of our algorithm
algorithm_name=product-review-repo

# parameters
PY_VERSION="py36"


account=$(aws sts get-caller-identity --query Account --output text)

if [ $? -ne 0 ]
then
    exit 255
fi

cd SageMaker/container

chmod +x review-classification/serve

# Get the region defined in the current configuration (default to us-east-1 if none defined)
region=$(aws configure get region)
region=${region:-us-east-1}

TAG="gpu-${PY_VERSION}"

fullname="${account}.dkr.ecr.${region}.amazonaws.com/${algorithm_name}:${TAG}"

# If the repository doesn't exist in ECR, create it.
aws ecr describe-repositories --repository-names "${algorithm_name}" > /dev/null 2>&1

if [ $? -ne 0 ]
then
    aws ecr create-repository --repository-name "${algorithm_name}" > /dev/null
fi

echo "---> repository done.."
# Get the login password from ECR and log in to the registry host (not the full image name)
aws ecr get-login-password --region ${region} | docker login --username AWS --password-stdin ${account}.dkr.ecr.${region}.amazonaws.com

echo "---> logged in to account ecr.."

echo "Building image with arch=gpu,region=${region}"


# Build the docker image locally with the image name and then push it to ECR
# with the full name.

docker build -t ${algorithm_name} .
docker tag ${algorithm_name} ${fullname}

docker push ${fullname}
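
For question 2, the only check I could come up with after pushing is to list the image back from ECR, e.g. with a small boto3 snippet like this (repository name and tag match the script above; the region is an assumption):

import boto3

ecr = boto3.client('ecr', region_name='us-east-1')  # region assumed to match the script

# Confirm the tag actually landed in the repository.
resp = ecr.describe_images(repositoryName='product-review-repo',
                           imageIds=[{'imageTag': 'gpu-py36'}])
for detail in resp['imageDetails']:
    print(detail['imageTags'], detail.get('imageSizeInBytes'))

Is there anything beyond this (and a local docker run) that tells me the image is wired up correctly for SageMaker?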

Below is my serve code:

#!/usr/bin/env python

# This file implements the scoring service shell. You don't necessarily need to modify it for various
# algorithms. It starts nginx and gunicorn with the correct configurations and then simply waits until
# gunicorn exits.
#
# The flask server is specified to be the app object in wsgi.py
#
# We set the following parameters:
#
# Parameter                Environment Variable              Default Value
# ---------                --------------------              -------------
# number of workers        MODEL_SERVER_WORKERS              the number of CPU cores
# timeout                  MODEL_SERVER_TIMEOUT              60 seconds

from __future__ import print_function
import multiprocessing
import os
import signal
import subprocess
import sys

cpu_count = multiprocessing.cpu_count()

model_server_timeout = os.environ.get('MODEL_SERVER_TIMEOUT', 60)
model_server_workers = int(os.environ.get('MODEL_SERVER_WORKERS', cpu_count))

def sigterm_handler(nginx_pid, gunicorn_pid):
    try:
        os.kill(nginx_pid, signal.SIGQUIT)
    except OSError:
        pass
    try:
        os.kill(gunicorn_pid, signal.SIGTERM)
    except OSError:
        pass

    sys.exit(0)

def start_server():
    print('Starting the inference server with {} workers.'.format(model_server_workers))


    # link the log streams to stdout/err so they will be logged to the container logs
    subprocess.check_call(['ln', '-sf', '/dev/stdout', '/var/log/nginx/access.log'])
    subprocess.check_call(['ln', '-sf', '/dev/stderr', '/var/log/nginx/error.log'])

    nginx = subprocess.Popen(['nginx', '-c', '/opt/program/nginx.conf'])
    gunicorn = subprocess.Popen(['gunicorn',
                                 '--timeout', str(model_server_timeout),
                                 '-k', 'gevent',
                                 '-b', 'unix:/tmp/gunicorn.sock',
                                 '-w', str(model_server_workers),
                                 'wsgi:app'])

    signal.signal(signal.SIGTERM, lambda a, b: sigterm_handler(nginx.pid, gunicorn.pid))

    # If either subprocess exits, so do we.
    pids = set([nginx.pid, gunicorn.pid])
    while True:
        pid, _ = os.wait()
        if pid in pids:
            break

    sigterm_handler(nginx.pid, gunicorn.pid)
    print('Inference server exiting')

# The main routine just invokes the start function.

if __name__ == '__main__':
    start_server()
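
Finally, in case it helps, this is roughly how I intend to load the .pt file onto the GPU inside predictor.py (a sketch with illustrative names; my real file also tokenizes the reviews first):

import torch

# SageMaker mounts model artifacts under /opt/ml/model; the filename is a placeholder.
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = torch.load('/opt/ml/model/model.pt', map_location=device)
model.to(device)
model.eval()

def predict(input_ids, attention_mask):
    # Move inputs to the model's device and run without gradients.
    with torch.no_grad():
        return model(input_ids.to(device), attention_mask=attention_mask.to(device))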

Thank you very much!

