
Training an Apache MXNet Model in Amazon SageMaker and Running It on the IEI Tank AIoT Developer Kit

December 21, 2018

CPOL

We will train an Apache MXNet Gluon model in Amazon SageMaker to read handwritten digits from the MNIST dataset, and then run predictions on ten random handwritten digits on the IEI Tank AIoT Developer Kit.

Introduction

We will train an Apache MXNet* Gluon model in Amazon SageMaker* to read handwritten digits from the MNIST dataset, and then run predictions on ten random handwritten digits on the IEI Tank* AIoT Developer Kit. Find more information about the IEI Tank* AIoT Developer Kit.

Amazon SageMaker is a platform for easily deploying machine learning models, using Jupyter* Notebooks and AWS* S3 object storage. Find more information about Amazon SageMaker.

Gluon is a new MXNet library that provides a simple API for prototyping, building, and training deep learning models; to run an MXNet model in Amazon SageMaker, we need the MXNet estimator. Find more information about MXNet Gluon models.
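
To give a feel for that API, here is a minimal sketch of our own (not part of the tutorial's sample code) that declares, initializes, and runs a tiny Gluon network on a dummy batch:

import mxnet as mx
from mxnet import gluon, nd

net = gluon.nn.Sequential()
with net.name_scope():
    net.add(gluon.nn.Dense(128, activation='relu'))  # hidden layer
    net.add(gluon.nn.Dense(10))                      # one output per digit class

net.collect_params().initialize(mx.init.Xavier(), ctx=mx.cpu())
out = net(nd.random.uniform(shape=(1, 784)))         # forward pass on a dummy batch
print(out.shape)                                     # (1, 10)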

Prerequisites

Train the Model

Log in to your AWS account and go to Amazon SageMaker.

Create a notebook instance.

Fill in the notebook instance name, add the IAM role, and select Create notebook instance.

You will see a confirmation at the top of the page and the new notebook instance.

Create a new notebook file by selecting New > conda_mxnet_p27.

Select Untitled and change the name to training. Select Insert > Insert Cell Below to add cells.

Copy each cell of the Jupyter* Notebook training.ipynb into the cells of your notebook. Find training.ipynb in the Sample Code section at the end of this tutorial. Go to the notebook instance and add mxnet-mnist.py (also found in the Sample Code section) by selecting Upload.

Select Upload.

Go back to training.ipynb and run it by selecting Cell > Run All.

Get the information about the S3 bucket and the training job name.

Wait for all the cells to finish running. You will see output similar to this.
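
The same information is also available programmatically: the estimator records where it wrote the model artifact. A minimal sketch (run in a new notebook cell after m.fit() finishes; model_data is a standard attribute of SageMaker estimators):

# Print the S3 location of the trained model artifact
print(m.model_data)   # s3://<bucket>/<training-job-name>/output/model.tar.gz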

Run Predictions

Go to Amazon S3 > S3 bucket > training job > output.

Download model.tar.gz by clicking the check box next to it and selecting Download from the menu on the right.
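
Alternatively, if the Tank has AWS credentials configured, the download can be scripted with boto3. A minimal sketch; substitute the bucket name and object key shown in your S3 console:

# Fetch model.tar.gz from S3 directly on the Tank
import boto3

s3 = boto3.client('s3')
s3.download_file('<bucket-name>',                             # bucket from the S3 console
                 '<training-job-name>/output/model.tar.gz',   # key from the S3 console
                 'model.tar.gz')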

On the Tank, extract the model.params file from the downloaded archive.

tar -xzvf model.tar.gz

If needed, install the dependencies (for Python 2, matching the conda_mxnet_p27 environment used for training).

sudo pip2 install mxnet matplotlib

Save load.py (found in the Sample Code section) in the same folder as model.params.

Run load.py, which rebuilds the same network architecture, loads the trained weights from model.params, and runs predictions on ten random test digits.

python load.py

You will see images of ten random handwritten digits along with the model's predictions for them.
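
Note that load.py draws the images with matplotlib. If the Tank is used headless (for example, over SSH without X forwarding), plt.show() has no display to draw on; an optional tweak (a sketch against the load.py below) saves the figure to a file instead:

# Select a non-interactive backend *before* "import matplotlib.pyplot as plt"
# at the top of load.py ...
import matplotlib
matplotlib.use('Agg')

# ... and in verify_loaded_model(), replace plt.show() with:
plt.savefig('predictions.png')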

Conclusion

We have successfully trained an MXNet Gluon model in Amazon SageMaker to read handwritten digits, and obtained good predictions on the validation set on the Tank.

Sample Code

training.ipynb

import sagemaker
from sagemaker import get_execution_role
from sagemaker.mxnet import MXNet
from mxnet import gluon


# Create a SageMaker session and look up the IAM role of this notebook instance
sagemaker_session = sagemaker.Session()
role = get_execution_role()

# Download the MNIST training set locally, then upload it to S3 for the training job
gluon.data.vision.MNIST('./data/train', train=True)
inputs = sagemaker_session.upload_data(path='data', bucket='sagemaker-mxnet-gluon')

# Show the training script that will run inside the SageMaker MXNet container
!cat 'mxnet-mnist.py'

# Configure the MXNet estimator: entry-point script, instance settings,
# and the hyperparameters passed to train() in mxnet-mnist.py
m = MXNet("mxnet-mnist.py",
          role=role,
          train_instance_count=1,
          train_instance_type="ml.c4.xlarge",
          framework_version="1.2.1",
          hyperparameters={'batch_size': 64,
                           'epochs': 1,
                           'learning_rate': 0.001,
                           'log_interval': 100})

# Launch the training job
m.fit(inputs)

mxnet-mnist.py

from __future__ import print_function

import mxnet as mx
from mxnet import nd, autograd, gluon
from mxnet.gluon.data.vision import transforms

import logging


# Classify the images into one of the 10 digits
num_outputs = 10

logging.basicConfig(level=logging.DEBUG)


# Build a simple convolutional network
def build_lenet(net):    
    with net.name_scope():
        # First convolution
        net.add(gluon.nn.Conv2D(channels=20, kernel_size=5, activation='relu'))
        net.add(gluon.nn.MaxPool2D(pool_size=2, strides=2))
        # Second convolution
        net.add(gluon.nn.Conv2D(channels=50, kernel_size=5, activation='relu'))
        net.add(gluon.nn.MaxPool2D(pool_size=2, strides=2))
        # Flatten the output before the fully connected layers
        net.add(gluon.nn.Flatten())
        # First fully connected layers with 512 neurons
        net.add(gluon.nn.Dense(512, activation="relu"))
        # Second fully connected layer with as many neurons as the number of classes
        net.add(gluon.nn.Dense(num_outputs))

        return net


# Train a given model using MNIST data
def train(channel_input_dirs, hyperparameters, **kwargs):
    ctx = mx.cpu()
    # retrieve the hyperparameters we set in notebook (with some defaults)
    batch_size = hyperparameters.get('batch_size', 64)
    epochs = hyperparameters.get('epochs', 1)
    learning_rate = hyperparameters.get('learning_rate', 0.001)
    log_interval = hyperparameters.get('log_interval', 100)

    # load training and validation data
    # we use the gluon.data.vision.MNIST class because of its built in mnist pre-processing logic,
    # but point it at the location where SageMaker placed the data files, so it doesn't download them again.
    training_dir = channel_input_dirs['training']
    train_data = get_train_data(training_dir + '/train', batch_size)

    # Define the network
    net = build_lenet(gluon.nn.Sequential())

    # Initialize the parameters with Xavier initializer
    net.collect_params().initialize(mx.init.Xavier(), ctx=ctx)
    # Use cross entropy loss
    softmax_cross_entropy = gluon.loss.SoftmaxCrossEntropyLoss()
    # Use the Adam optimizer with the configured learning rate
    trainer = gluon.Trainer(net.collect_params(), 'adam', {'learning_rate': learning_rate})

    # Train for the configured number of epochs
    for epoch in range(epochs):
        # Iterate through the images and labels in the training data
        for batch_num, (data, label) in enumerate(train_data):
            # get the images and labels
            data = data.as_in_context(ctx)
            label = label.as_in_context(ctx)
            # Ask autograd to record the forward pass
            with autograd.record():
                # Run the forward pass
                output = net(data)
                # Compute the loss
                loss = softmax_cross_entropy(output, label)
            # Compute gradients
            loss.backward()
            # Update parameters
            trainer.step(data.shape[0])

            # Print the loss every log_interval batches
            if batch_num % log_interval == 0:
                curr_loss = nd.mean(loss).asscalar()
                print("Epoch: %d; Batch %d; Loss %f" % (epoch, batch_num, curr_loss))

    return net


def save(net, model_dir):
    # Save the model
    net.save_parameters('%s/model.params' % model_dir)


def get_train_data(data_dir, batch_size):
    # Load the training data
    return gluon.data.DataLoader(
        gluon.data.vision.MNIST(data_dir, train=True).transform_first(transforms.ToTensor()),
                                   batch_size, shuffle=True)
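
Note that SageMaker does not run mxnet-mnist.py top to bottom: the MXNet container for framework_version 1.2.1 imports the script and calls its train() and save() hooks. Roughly, as a sketch (the /opt/ml paths are SageMaker's standard container locations):

# How the container drives the script, approximately:
# net = train(channel_input_dirs={'training': '/opt/ml/input/data/training'},
#             hyperparameters={'batch_size': 64, 'epochs': 1, ...})
# save(net, '/opt/ml/model')   # /opt/ml/model is packaged into model.tar.gz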

load.py

from __future__ import print_function

import mxnet as mx
import mxnet.ndarray as nd
from mxnet import gluon
import matplotlib.pyplot as plt

import numpy as np


# Build a simple convolutional network
def build_lenet(net):
    with net.name_scope():
        # First convolution
        net.add(gluon.nn.Conv2D(channels=20, kernel_size=5, activation='relu'))
        net.add(gluon.nn.MaxPool2D(pool_size=2, strides=2))
        # Second convolution
        net.add(gluon.nn.Conv2D(channels=50, kernel_size=5, activation='relu'))
        net.add(gluon.nn.MaxPool2D(pool_size=2, strides=2))
        # Flatten the output before the fully connected layers
        net.add(gluon.nn.Flatten())
        # First fully connected layers with 512 neurons
        net.add(gluon.nn.Dense(512, activation="relu"))
        # Second fully connected layer with as many neurons as the number of classes
        net.add(gluon.nn.Dense(num_outputs))

        return net


def verify_loaded_model(net):
    """Run inference using ten random images.
    Print both input and output of the model"""

    def transform(data, label):
        # Scale pixel values to [0, 1] and cast labels to float32
        return data.astype(np.float32)/255, label.astype(np.float32)

    # Load ten random images from the test dataset
    sample_data = mx.gluon.data.DataLoader(
        mx.gluon.data.vision.MNIST(train=False, transform=transform),
        10, shuffle=True)

    for data, label in sample_data:

        # Arrange the ten 28x28 images side by side as one RGB image for display
        img = nd.transpose(data, (1, 0, 2, 3))
        img = nd.reshape(img, (28, 10*28, 1))
        imtiles = nd.tile(img, (1, 1, 3))
        plt.imshow(imtiles.asnumpy())

        # Convert the images from NHWC to NCHW and run them through the network
        data = nd.transpose(data, (0, 3, 1, 2))
        out = net(data.as_in_context(ctx))
        predictions = nd.argmax(out, axis=1)
        print('Model predictions: ', predictions.asnumpy())

        # Display the images
        plt.show()

        break


if __name__ == "__main__":
    # Use GPU if one exists, else use CPU
    ctx = mx.gpu() if mx.test_utils.list_gpus() else mx.cpu()
    # Classify the images into one of the 10 digits
    num_outputs = 10
    # Name of the model file
    file_name = "model.params"

    new_net = build_lenet(gluon.nn.Sequential())
    new_net.load_parameters(file_name, ctx=ctx)

    verify_loaded_model(new_net)

About the Author

Rosalia Nyurguhun is a software engineer in the Intel Core and Visual Computing Group, working on projects that enable the Internet of Things at scale.
