Train an Apache MXNet Model in Amazon SageMaker and Run It on the IEI Tank AIoT Developer Kit
We will train an Apache MXNet Gluon model in Amazon SageMaker to read handwritten digits from the MNIST dataset, then run predictions on ten random handwritten digits on the IEI Tank AIoT Developer Kit.
Introduction
We will train an Apache MXNet* Gluon model in Amazon SageMaker* to read handwritten digits from the MNIST dataset, then run predictions on ten random handwritten digits on the IEI Tank* AIoT Developer Kit. For more information, see the IEI Tank* AIoT Developer Kit documentation.
Amazon SageMaker is a platform that makes it easy to deploy machine learning models, using Jupyter* Notebooks and AWS* S3 object storage. For more information, see the Amazon SageMaker documentation.
Gluon is a newer MXNet library that provides a simple API for prototyping, building, and training deep learning models. To run an MXNet model in Amazon SageMaker, we need the MXNet estimator. For more information, see the MXNet Gluon documentation.
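To give a quick feel for the API, below is a minimal Gluon sketch (illustrative only, not part of this tutorial's code; the layer sizes are arbitrary) that defines a small network and runs a forward pass:

import mxnet as mx
from mxnet import gluon

# A tiny fully connected network; sizes are illustrative
net = gluon.nn.Sequential()
with net.name_scope():
    net.add(gluon.nn.Dense(128, activation='relu'))
    net.add(gluon.nn.Dense(10))

# Initialize the parameters, then run a forward pass on dummy data
net.collect_params().initialize(mx.init.Xavier(), ctx=mx.cpu())
out = net(mx.nd.zeros((1, 784)))
print(out.shape)  # (1, 10)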
Prerequisites
- IEI Tank* AIoT Developer Kit
- Linux* Ubuntu* 16.04 operating system
- Python* 2.7
- An AWS account
Train the Model
Log in to your AWS account and go to Amazon SageMaker.
Create a notebook instance.
Fill in the notebook instance name, add an IAM role, and then select Create notebook instance.
You will see a confirmation at the top of the page along with your new notebook instance.
Create a new notebook file by selecting New > conda_mxnet_p27.
Select Untitled and change the name to training. Select Insert > Insert Cell Below to add cells.
Copy each cell of the Jupyter* Notebook training.ipynb into the cells of your notebook. You can find training.ipynb in the Sample Code section at the end of this tutorial. Go to the notebook instance and add mxnet-mnist.py (also in the Sample Code section) by selecting Upload.
Select Upload.
Return to training.ipynb and run it by selecting Cell > Run All.
Note the information about the S3 bucket and the training job name.
Wait for all the cells to finish running. The output will show the S3 bucket and the training job name.
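You can also read the S3 location of the trained model artifact directly from the notebook; a minimal sketch using the estimator's model_data attribute (the path shown is a placeholder):

print(m.model_data)
# e.g. s3://<bucket>/<training-job-name>/output/model.tar.gz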
Run Predictions
Go to Amazon S3 > S3 bucket > training job > output.
Download model.tar.gz by clicking the check box next to it and selecting Download from the menu on the right.
On the Tank, extract the model.params file from the downloaded archive:
tar -xzvf model.tar.gz
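Alternatively, if the Tank has AWS credentials configured, you can fetch the artifact with boto3 instead of the console; a sketch with placeholder bucket and key values, which you would replace with those from your training job:

import boto3

s3 = boto3.client('s3')
# Placeholders: use the bucket and key from your training job output
s3.download_file('<bucket>', '<training-job-name>/output/model.tar.gz', 'model.tar.gz')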
Install the dependencies, if needed:
sudo pip2 install mxnet matplotlib
Save load.py (found in the Sample Code section) in the same folder as model.params.
Run load.py:
python load.py
You will see images of ten random handwritten digits along with the model's predictions for them.
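Note: load.py displays the images with matplotlib, which needs a display. If you access the Tank over SSH without one, a common workaround (an assumption, not part of the original script) is to switch matplotlib to the Agg backend at the top of load.py and save the figure to a file instead of showing it:

import matplotlib
matplotlib.use('Agg')  # render without a display server; must precede the pyplot import
import matplotlib.pyplot as plt

# ...then, in verify_loaded_model, replace plt.show() with:
plt.savefig('predictions.png')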
Conclusion
We have successfully trained an MXNet Gluon model in Amazon SageMaker to read handwritten digits, and obtained good predictions for the validation set on the Tank.
Sample Code
training.ipynb
from __future__ import print_function

import os
import logging

import boto3
import numpy as np
import mxnet as mx
from mxnet import nd, autograd, gluon
from mxnet.gluon.data.vision import transforms

import sagemaker
from sagemaker import get_execution_role
from sagemaker.mxnet import MXNet

sagemaker_session = sagemaker.Session()
role = get_execution_role()
# Download the MNIST training set to ./data, then upload it to S3
gluon.data.vision.MNIST('./data/train', train=True)
inputs = sagemaker_session.upload_data(path='data', bucket='sagemaker-mxnet-gluon')
!cat 'mxnet-mnist.py'
m = MXNet("mxnet-mnist.py",
          role=role,
          train_instance_count=1,
          train_instance_type="ml.c4.xlarge",
          framework_version="1.2.1",
          hyperparameters={'batch_size': 64,
                           'epochs': 1,
                           'learning_rate': 0.001,
                           'log_interval': 100})
m.fit(inputs)
mxnet-mnist.py
from __future__ import print_function

import logging

import numpy as np
import mxnet as mx
from mxnet import nd, autograd, gluon
from mxnet.gluon.data.vision import transforms

# Classify the images into one of the 10 digits
num_outputs = 10

logging.basicConfig(level=logging.DEBUG)


# Build a simple convolutional network
def build_lenet(net):
    with net.name_scope():
        # First convolution
        net.add(gluon.nn.Conv2D(channels=20, kernel_size=5, activation='relu'))
        net.add(gluon.nn.MaxPool2D(pool_size=2, strides=2))
        # Second convolution
        net.add(gluon.nn.Conv2D(channels=50, kernel_size=5, activation='relu'))
        net.add(gluon.nn.MaxPool2D(pool_size=2, strides=2))
        # Flatten the output before the fully connected layers
        net.add(gluon.nn.Flatten())
        # First fully connected layer with 512 neurons
        net.add(gluon.nn.Dense(512, activation="relu"))
        # Second fully connected layer with as many neurons as the number of classes
        net.add(gluon.nn.Dense(num_outputs))
        return net


# Train a given model using MNIST data
def train(channel_input_dirs, hyperparameters, **kwargs):
    ctx = mx.cpu()

    # Retrieve the hyperparameters we set in the notebook (with some defaults)
    batch_size = hyperparameters.get('batch_size', 64)
    epochs = hyperparameters.get('epochs', 1)
    learning_rate = hyperparameters.get('learning_rate', 0.001)
    log_interval = hyperparameters.get('log_interval', 100)

    # Load the training data.
    # We use the gluon.data.vision.MNIST class because of its built-in MNIST
    # pre-processing logic, but point it at the location where SageMaker placed
    # the data files, so it doesn't download them again.
    training_dir = channel_input_dirs['training']
    train_data = get_train_data(training_dir + '/train', batch_size)

    # Define the network
    net = build_lenet(gluon.nn.Sequential())
    # Initialize the parameters with Xavier initializer
    net.collect_params().initialize(mx.init.Xavier(), ctx=ctx)
    # Use cross entropy loss
    softmax_cross_entropy = gluon.loss.SoftmaxCrossEntropyLoss()
    # Use the Adam optimizer with the learning rate from the hyperparameters
    trainer = gluon.Trainer(net.collect_params(), 'adam',
                            {'learning_rate': learning_rate})

    # Train for the requested number of epochs
    for epoch in range(epochs):
        # Iterate through the images and labels in the training data
        for batch_num, (data, label) in enumerate(train_data):
            # Get the images and labels
            data = data.as_in_context(ctx)
            label = label.as_in_context(ctx)
            # Ask autograd to record the forward pass
            with autograd.record():
                # Run the forward pass
                output = net(data)
                # Compute the loss
                loss = softmax_cross_entropy(output, label)
            # Compute gradients
            loss.backward()
            # Update parameters
            trainer.step(data.shape[0])
            # Print the loss once in a while
            if batch_num % log_interval == 0:
                curr_loss = nd.mean(loss).asscalar()
                print("Epoch: %d; Batch %d; Loss %f" % (epoch, batch_num, curr_loss))
    return net


def save(net, model_dir):
    # Save the model parameters
    net.save_parameters('%s/model.params' % model_dir)


def get_train_data(data_dir, batch_size):
    # Load the training data
    return gluon.data.DataLoader(
        gluon.data.vision.MNIST(data_dir, train=True).transform_first(transforms.ToTensor()),
        batch_size, shuffle=True)
load.py
from __future__ import print_function

import numpy as np
import matplotlib.pyplot as plt
import mxnet as mx
from mxnet import nd, gluon


# Build a simple convolutional network
def build_lenet(net):
    with net.name_scope():
        # First convolution
        net.add(gluon.nn.Conv2D(channels=20, kernel_size=5, activation='relu'))
        net.add(gluon.nn.MaxPool2D(pool_size=2, strides=2))
        # Second convolution
        net.add(gluon.nn.Conv2D(channels=50, kernel_size=5, activation='relu'))
        net.add(gluon.nn.MaxPool2D(pool_size=2, strides=2))
        # Flatten the output before the fully connected layers
        net.add(gluon.nn.Flatten())
        # First fully connected layer with 512 neurons
        net.add(gluon.nn.Dense(512, activation="relu"))
        # Second fully connected layer with as many neurons as the number of classes
        net.add(gluon.nn.Dense(num_outputs))
        return net


def verify_loaded_model(net):
    """Run inference using ten random images.
    Print both the input and the output of the model."""

    def transform(data, label):
        return data.astype(np.float32) / 255, label.astype(np.float32)

    # Load ten random images from the test dataset
    sample_data = mx.gluon.data.DataLoader(
        mx.gluon.data.vision.MNIST(train=False, transform=transform),
        10, shuffle=True)

    for data, label in sample_data:
        # Prepare the images
        img = nd.transpose(data, (1, 0, 2, 3))
        img = nd.reshape(img, (28, 10 * 28, 1))
        imtiles = nd.tile(img, (1, 1, 3))
        plt.imshow(imtiles.asnumpy())

        # Display the predictions
        data = nd.transpose(data, (0, 3, 1, 2))
        out = net(data.as_in_context(ctx))
        predictions = nd.argmax(out, axis=1)
        print('Model predictions: ', predictions.asnumpy())

        # Display the images
        plt.show()
        break


if __name__ == "__main__":
    # Use GPU if one exists, else use CPU
    ctx = mx.gpu() if mx.test_utils.list_gpus() else mx.cpu()
    # Classify the images into one of the 10 digits
    num_outputs = 10
    # Name of the model file
    file_name = "model.params"

    new_net = build_lenet(gluon.nn.Sequential())
    new_net.load_parameters(file_name, ctx=ctx)
    verify_loaded_model(new_net)
About the Author
Rosalia Nyurguhun is a software engineer in the Intel Core and Visual Computing Group, working on scale enabling projects for the Internet of Things.