Train an Apache MXNet Model in Amazon SageMaker and Run It on the IEI Tank AIoT Developer Kit
We will train an Apache MXNet Gluon model in Amazon SageMaker to read handwritten digits from the MNIST dataset, then run predictions on ten random handwritten digits on the IEI Tank AIoT Developer Kit.
Introduction
We will train an Apache MXNet* Gluon model in Amazon SageMaker* to read handwritten digits from the MNIST dataset, then run predictions on ten random handwritten digits on the IEI Tank* AIoT Developer Kit. For more information, see the IEI Tank* AIoT Developer Kit documentation.
Amazon SageMaker is a platform that makes it easy to deploy machine learning models, using Jupyter* Notebooks and AWS* S3 object storage. For more information, see the Amazon SageMaker documentation.
Gluon is a newer MXNet library that provides a simple API for prototyping, building, and training deep learning models. To run an MXNet model in Amazon SageMaker, we need the MXNet estimator. For more information, see the MXNet Gluon documentation.
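To give a quick feel for the API, below is a minimal Gluon sketch (illustrative only, not part of this tutorial's code; the layer sizes are arbitrary) that defines a small network and runs a forward pass:

import mxnet as mx
from mxnet import gluon

# A tiny fully connected network; sizes are illustrative
net = gluon.nn.Sequential()
with net.name_scope():
    net.add(gluon.nn.Dense(128, activation='relu'))
    net.add(gluon.nn.Dense(10))

# Initialize the parameters, then run a forward pass on dummy data
net.collect_params().initialize(mx.init.Xavier(), ctx=mx.cpu())
out = net(mx.nd.zeros((1, 784)))
print(out.shape)  # (1, 10)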
Prerequisites
- IEI Tank* AIoT Developer Kit
- Linux* Ubuntu* 16.04 operating system
- Python* 2.7
- An AWS account
Train the Model
Log in to your AWS account and go to Amazon SageMaker.
Create a notebook instance.
Fill in the notebook instance name, add an IAM role, and then select Create notebook instance.
You will see a confirmation at the top of the page along with your new notebook instance.
Create a new notebook file by selecting New > conda_mxnet_p27.
Select Untitled and change the name to training. Select Insert > Insert Cell Below to add cells.
Copy each cell of the Jupyter* Notebook training.ipynb into the cells of your notebook. You can find training.ipynb in the Sample Code section at the end of this tutorial. Go to the notebook instance and add mxnet-mnist.py (also in the Sample Code section) by selecting Upload.
Select Upload.
Return to training.ipynb and run it by selecting Cell > Run All.
Note the information about the S3 bucket and the training job name.
Wait for all the cells to finish running. The output will show the S3 bucket and the training job name.
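You can also read the S3 location of the trained model artifact directly from the notebook; a minimal sketch using the estimator's model_data attribute (the path shown is a placeholder):

print(m.model_data)
# e.g. s3://<bucket>/<training-job-name>/output/model.tar.gz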
Run Predictions
Go to Amazon S3 > S3 bucket > training job > output.
Download model.tar.gz by clicking the check box next to it and selecting Download from the menu on the right.
On the Tank, extract the model.params file from the downloaded archive:
tar -xzvf model.tar.gz
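Alternatively, if the Tank has AWS credentials configured, you can fetch the artifact with boto3 instead of the console; a sketch with placeholder bucket and key values, which you would replace with those from your training job:

import boto3

s3 = boto3.client('s3')
# Placeholders: use the bucket and key from your training job output
s3.download_file('<bucket>', '<training-job-name>/output/model.tar.gz', 'model.tar.gz')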
Install the dependencies, if needed:
sudo pip2 install mxnet matplotlib
Save load.py (found in the Sample Code section) in the same folder as model.params.
Run load.py:
python load.py
You will see images of ten random handwritten digits along with the model's predictions for them.
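Note: load.py displays the images with matplotlib, which needs a display. If you access the Tank over SSH without one, a common workaround (an assumption, not part of the original script) is to switch matplotlib to the Agg backend at the top of load.py and save the figure to a file instead of showing it:

import matplotlib
matplotlib.use('Agg')  # render without a display server; must precede the pyplot import
import matplotlib.pyplot as plt

# ...then, in verify_loaded_model, replace plt.show() with:
plt.savefig('predictions.png')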
Conclusion
We have successfully trained an MXNet Gluon model in Amazon SageMaker to read handwritten digits, and obtained good predictions for the validation set on the Tank.
Sample Code
training.ipynb
from __future__ import print_function

import os
import logging

import boto3
import numpy as np
import mxnet as mx
from mxnet import nd, autograd, gluon
from mxnet.gluon.data.vision import transforms

import sagemaker
from sagemaker import get_execution_role
from sagemaker.mxnet import MXNet

sagemaker_session = sagemaker.Session()
role = get_execution_role()
# Download the MNIST training set to ./data, then upload it to S3
gluon.data.vision.MNIST('./data/train', train=True)
inputs = sagemaker_session.upload_data(path='data', bucket='sagemaker-mxnet-gluon')
!cat 'mxnet-mnist.py'
m = MXNet("mxnet-mnist.py",
          role=role,
          train_instance_count=1,
          train_instance_type="ml.c4.xlarge",
          framework_version="1.2.1",
          hyperparameters={'batch_size': 64,
                           'epochs': 1,
                           'learning_rate': 0.001,
                           'log_interval': 100})
m.fit(inputs)
mxnet-mnist.py
from __future__ import print_function

import logging

import numpy as np
import mxnet as mx
from mxnet import nd, autograd, gluon
from mxnet.gluon.data.vision import transforms

# Classify the images into one of the 10 digits
num_outputs = 10

logging.basicConfig(level=logging.DEBUG)


# Build a simple convolutional network
def build_lenet(net):
    with net.name_scope():
        # First convolution
        net.add(gluon.nn.Conv2D(channels=20, kernel_size=5, activation='relu'))
        net.add(gluon.nn.MaxPool2D(pool_size=2, strides=2))
        # Second convolution
        net.add(gluon.nn.Conv2D(channels=50, kernel_size=5, activation='relu'))
        net.add(gluon.nn.MaxPool2D(pool_size=2, strides=2))
        # Flatten the output before the fully connected layers
        net.add(gluon.nn.Flatten())
        # First fully connected layer with 512 neurons
        net.add(gluon.nn.Dense(512, activation="relu"))
        # Second fully connected layer with as many neurons as the number of classes
        net.add(gluon.nn.Dense(num_outputs))
        return net


# Train a given model using MNIST data
def train(channel_input_dirs, hyperparameters, **kwargs):
    ctx = mx.cpu()

    # Retrieve the hyperparameters we set in the notebook (with some defaults)
    batch_size = hyperparameters.get('batch_size', 64)
    epochs = hyperparameters.get('epochs', 1)
    learning_rate = hyperparameters.get('learning_rate', 0.001)
    log_interval = hyperparameters.get('log_interval', 100)

    # Load the training data.
    # We use the gluon.data.vision.MNIST class because of its built-in MNIST
    # pre-processing logic, but point it at the location where SageMaker placed
    # the data files, so it doesn't download them again.
    training_dir = channel_input_dirs['training']
    train_data = get_train_data(training_dir + '/train', batch_size)

    # Define the network
    net = build_lenet(gluon.nn.Sequential())
    # Initialize the parameters with Xavier initializer
    net.collect_params().initialize(mx.init.Xavier(), ctx=ctx)
    # Use cross entropy loss
    softmax_cross_entropy = gluon.loss.SoftmaxCrossEntropyLoss()
    # Use the Adam optimizer with the learning rate from the hyperparameters
    trainer = gluon.Trainer(net.collect_params(), 'adam',
                            {'learning_rate': learning_rate})

    # Train for the requested number of epochs
    for epoch in range(epochs):
        # Iterate through the images and labels in the training data
        for batch_num, (data, label) in enumerate(train_data):
            # Get the images and labels
            data = data.as_in_context(ctx)
            label = label.as_in_context(ctx)
            # Ask autograd to record the forward pass
            with autograd.record():
                # Run the forward pass
                output = net(data)
                # Compute the loss
                loss = softmax_cross_entropy(output, label)
            # Compute gradients
            loss.backward()
            # Update parameters
            trainer.step(data.shape[0])
            # Print the loss once in a while
            if batch_num % log_interval == 0:
                curr_loss = nd.mean(loss).asscalar()
                print("Epoch: %d; Batch %d; Loss %f" % (epoch, batch_num, curr_loss))
    return net


def save(net, model_dir):
    # Save the model parameters
    net.save_parameters('%s/model.params' % model_dir)


def get_train_data(data_dir, batch_size):
    # Load the training data
    return gluon.data.DataLoader(
        gluon.data.vision.MNIST(data_dir, train=True).transform_first(transforms.ToTensor()),
        batch_size, shuffle=True)
load.py
from __future__ import print_function

import numpy as np
import matplotlib.pyplot as plt
import mxnet as mx
from mxnet import nd, gluon


# Build a simple convolutional network
def build_lenet(net):
    with net.name_scope():
        # First convolution
        net.add(gluon.nn.Conv2D(channels=20, kernel_size=5, activation='relu'))
        net.add(gluon.nn.MaxPool2D(pool_size=2, strides=2))
        # Second convolution
        net.add(gluon.nn.Conv2D(channels=50, kernel_size=5, activation='relu'))
        net.add(gluon.nn.MaxPool2D(pool_size=2, strides=2))
        # Flatten the output before the fully connected layers
        net.add(gluon.nn.Flatten())
        # First fully connected layer with 512 neurons
        net.add(gluon.nn.Dense(512, activation="relu"))
        # Second fully connected layer with as many neurons as the number of classes
        net.add(gluon.nn.Dense(num_outputs))
        return net


def verify_loaded_model(net):
    """Run inference using ten random images.
    Print both the input and the output of the model."""

    def transform(data, label):
        return data.astype(np.float32) / 255, label.astype(np.float32)

    # Load ten random images from the test dataset
    sample_data = mx.gluon.data.DataLoader(
        mx.gluon.data.vision.MNIST(train=False, transform=transform),
        10, shuffle=True)

    for data, label in sample_data:
        # Prepare the images
        img = nd.transpose(data, (1, 0, 2, 3))
        img = nd.reshape(img, (28, 10 * 28, 1))
        imtiles = nd.tile(img, (1, 1, 3))
        plt.imshow(imtiles.asnumpy())

        # Display the predictions
        data = nd.transpose(data, (0, 3, 1, 2))
        out = net(data.as_in_context(ctx))
        predictions = nd.argmax(out, axis=1)
        print('Model predictions: ', predictions.asnumpy())

        # Display the images
        plt.show()
        break


if __name__ == "__main__":
    # Use GPU if one exists, else use CPU
    ctx = mx.gpu() if mx.test_utils.list_gpus() else mx.cpu()
    # Classify the images into one of the 10 digits
    num_outputs = 10
    # Name of the model file
    file_name = "model.params"

    new_net = build_lenet(gluon.nn.Sequential())
    new_net.load_parameters(file_name, ctx=ctx)
    verify_loaded_model(new_net)
About the Author
Rosalia Nyurguhun is a software engineer in the Intel Core and Visual Computing Group, working on scale enabling projects for the Internet of Things.