Basic Deep Learning using Python+Keras

Jesús Utrera

20 May 2018

CPOL

This is the first article in a series introducing deep learning coding in Python with the Keras framework.

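Introduction

Supervised deep learning is widely used in machine learning, for example in computer vision systems. In this article, we will look at some key points of supervised deep learning with the Keras framework.

Keras is a high-level machine learning framework that lets us write code in Python and run it on top of well-known machine learning frameworks such as TensorFlow, CNTK, or Theano. It was designed to make experimentation easy and fast.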

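Background

This article is not an introduction to deep learning. You are expected to know the basics of deep learning and to have some Python coding experience. The main goal is to introduce the basics of the Keras framework and to use it alongside other well-known libraries to run quick experiments and draw first conclusions.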

Using the Code

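In this first article, we will train a simple neural network; in the following articles, we will present some well-known deep learning architectures and make some comparisons.

All experiments are for educational purposes: training will be very quick and the results will not be perfect.

First Step: Loading the Libraries

First, we load the libraries we need: numpy, TensorFlow (the backend we will run Keras on in this experiment), Keras, Scikit Learn, Pandas, and a few more.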

import numpy as np 
from scipy import misc 
from PIL import Image 
import glob 
import matplotlib.pyplot as plt 
import scipy.misc 
from matplotlib.pyplot import imshow 
%matplotlib inline 
from IPython.display import SVG 
import cv2 
import seaborn as sn 
import pandas as pd 
import pickle 
from keras import layers 
from keras.layers import (Flatten, Input, Add, Dense, Activation,
                          ZeroPadding2D, BatchNormalization,
                          Conv2D, AveragePooling2D,
                          MaxPooling2D, GlobalMaxPooling2D, Dropout)
from keras.models import Sequential, Model, load_model 
from keras.preprocessing import image 
from keras.preprocessing.image import load_img 
from keras.preprocessing.image import img_to_array 
from keras.applications.imagenet_utils import decode_predictions 
from keras.utils import layer_utils, np_utils 
from keras.utils.data_utils import get_file 
from keras.applications.imagenet_utils import preprocess_input 
from keras.utils.vis_utils import model_to_dot 
from keras.utils import plot_model 
from keras.initializers import glorot_uniform 
from keras import losses 
import keras.backend as K 
from keras.callbacks import ModelCheckpoint 
from sklearn.metrics import confusion_matrix, classification_report 
import tensorflow as tf

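Setting Up the Dataset

In this exercise, we will use the CIFAR-100 dataset, which has been in use for a long time. It has 100 classes with 600 images each: 500 for training and 100 for validation. The 100 classes are grouped into 20 superclasses. Each image has a "fine" label (its class) and a "coarse" label (its superclass).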
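As a quick sanity check on these figures, the split sizes can be worked out directly. This is only an arithmetic sketch; the variable names are illustrative and are not part of Keras.

```python
# Figures taken from the dataset description above.
num_classes = 100
train_per_class = 500
test_per_class = 100
num_superclasses = 20

train_size = num_classes * train_per_class        # images used for training
test_size = num_classes * test_per_class          # images used for validation
classes_per_superclass = num_classes // num_superclasses

print(train_size, test_size, classes_per_superclass)
```

These totals should match the sample counts that Keras reports when training starts.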

The Keras framework provides a module to download it directly:

from keras.datasets import cifar100 

(x_train_original, y_train_original), (x_test_original, y_test_original) = \
    cifar100.load_data(label_mode='fine')

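With this call we have downloaded both the training and test sets. x_train_original and x_test_original hold the training and test images respectively, while y_train_original and y_test_original hold the labels.

Let's look at y_train_original: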

array([[19], [29], [ 0], ..., [ 3], [ 7], [73]])

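As you can see, it is an array in which each number corresponds to a label. So the first thing to do is convert these arrays to their one-hot encoded version (see Wikipedia).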

y_train = np_utils.to_categorical(y_train_original, 100)

y_test = np_utils.to_categorical(y_test_original, 100)
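To make the encoding concrete, here is a minimal NumPy sketch of what one-hot encoding does to label arrays of this shape. The helper one_hot is a hypothetical stand-in written for illustration; it is not the Keras implementation.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Turn integer labels into rows with a single 1 at the label's index."""
    labels = np.asarray(labels).reshape(-1)           # (n, 1) -> (n,)
    encoded = np.zeros((labels.size, num_classes), dtype=np.float32)
    encoded[np.arange(labels.size), labels] = 1.0     # set one entry per row
    return encoded

# The first three labels shown in y_train_original above.
sample_labels = np.array([[19], [29], [0]])
sample_encoded = one_hot(sample_labels, 100)

print(sample_encoded.shape)           # one row per label, one column per class
print(sample_encoded.argmax(axis=1))  # argmax recovers the original labels
```

Taking the argmax of each row recovers the integer label, which is how the labels are decoded again later for the confusion matrix.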

OK, now let's look at the training set (x_train_original):

array([[[255, 255, 255], 
[255, 255, 255], 
[255, 255, 255], 
..., 
[195, 205, 193], 
[212, 224, 204], 
[182, 194, 167]], 

[[255, 255, 255], 
[254, 254, 254], 
[254, 254, 254], 
..., 
[170, 176, 150], 
[161, 168, 130], 
[146, 154, 113]], 

[[255, 255, 255], 
[254, 254, 254], 
[255, 255, 255], 
..., 
[189, 199, 169], 
[166, 178, 130], 
[121, 133, 87]], 

..., 

[[148, 185, 79], 
[142, 182, 57], 
[140, 179, 60], 
..., 
[ 30, 17, 1], 
[ 65, 62, 15], 
[ 76, 77, 20]], 

[[122, 157, 66], 
[120, 155, 58], 
[126, 160, 71], 
..., 
[ 22, 16, 3], 
[ 97, 112, 56], 
[141, 161, 87]], 

...and more...

], dtype=uint8)

Each image is a 32×32 grid of pixels with 3 RGB channels, and each channel value is an integer from 0 to 255. Want to see one?

imgplot = plt.imshow(x_train_original[3])

plt.show()

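Next, we need to normalize the images, that is, divide every element of the dataset by the maximum pixel value: 255. After this, the array values will range between 0 and 1.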

x_train = x_train_original/255

x_test = x_test_original/255
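The effect of this division can be checked on a tiny array. The sketch below uses a made-up 2×2 image rather than the real dataset, so the values are illustrative only.

```python
import numpy as np

# A hypothetical 2x2 RGB image with uint8 values, like the CIFAR arrays.
pixels = np.array([[[255, 255, 255], [195, 205, 193]],
                   [[30, 17, 1], [0, 0, 0]]], dtype=np.uint8)

scaled = pixels / 255   # true division promotes uint8 to floating point

print(scaled.dtype, scaled.min(), scaled.max())
```

Note that the result is a float64 array, so it takes eight times the memory of the uint8 original. Casting with astype('float32') is a common way to halve that, although the article's code does not do so.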

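Setting Up the Training Environment

Before training, we have to set two parameters in the Keras environment. First, we have to tell Keras where the channels are in the array. In an image array, the channels can be in the last index or in the first one. This is known as channels last or channels first. In this exercise, we will use channels last.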

K.set_image_data_format('channels_last')
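The difference between the two layouts is only the position of the channel axis. The following NumPy sketch, using a dummy batch of four images, shows how one layout maps to the other; it does not involve Keras.

```python
import numpy as np

# Dummy batch in channels-last layout: (batch, height, width, channels).
batch_last = np.zeros((4, 32, 32, 3), dtype=np.uint8)

# Move the channel axis to position 1: (batch, channels, height, width).
batch_first = np.transpose(batch_last, (0, 3, 1, 2))

print(batch_last.shape, batch_first.shape)
```

CIFAR-100 images arrive as (32, 32, 3) arrays, which is why channels last is the natural choice here.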

The second thing is to tell Keras which phase we are in. In our case, it is the learning phase.

K.set_learning_phase(1)

Training a Simple Neural Network

We are going to train a simple neural network, so we write a function that returns a simple neural network model.

def create_simple_nn():
  model = Sequential()
  model.add(Flatten(input_shape=(32, 32, 3), name="Input_layer")) 
  model.add(Dense(1000, activation='relu', name="Hidden_layer_1")) 
  model.add(Dense(500, activation='relu', name="Hidden_layer_2")) 
  model.add(Dense(100, activation='softmax', name="Output_layer")) 

  return model

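Some key points in the code. The Flatten layer converts the input (the image matrix) into a one-dimensional array. Next, each Dense layer adds a fully connected layer to the model. The first hidden layer has 1000 nodes, the second hidden layer has 500 nodes, and the third layer (the output layer) has 100 nodes, one per class. The hidden layers use the ReLU activation function and the output layer uses the softmax function.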
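To show what these layers compute, here is a rough NumPy sketch of a forward pass through the same layer sizes. The weights are random and untrained, and the helper names are illustrative; this is not how Keras implements the layers internally.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    # ReLU keeps positive values and zeroes out the rest.
    return np.maximum(z, 0.0)

def softmax(z):
    # Subtract the row maximum for numerical stability before exponentiating.
    z = z - z.max(axis=1, keepdims=True)
    exp_z = np.exp(z)
    return exp_z / exp_z.sum(axis=1, keepdims=True)

# Two fake images with the CIFAR shape, values already in [0, 1).
images = rng.random((2, 32, 32, 3))

# Flatten: (2, 32, 32, 3) -> (2, 3072)
flat = images.reshape(len(images), -1)

# Randomly initialized weights and zero biases for the three Dense layers.
w1, b1 = rng.normal(scale=0.01, size=(3072, 1000)), np.zeros(1000)
w2, b2 = rng.normal(scale=0.01, size=(1000, 500)), np.zeros(500)
w3, b3 = rng.normal(scale=0.01, size=(500, 100)), np.zeros(100)

hidden_1 = relu(flat @ w1 + b1)
hidden_2 = relu(hidden_1 @ w2 + b2)
probs = softmax(hidden_2 @ w3 + b3)

print(flat.shape, probs.shape, probs.sum(axis=1))
```

Each output row is a probability distribution over the 100 classes, which is what makes softmax suitable for the output layer.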

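Once the model is defined, we compile it, specifying the optimizer, the loss function, and the metrics we want to use. We will use exactly the same ones in every article of this series: the stochastic gradient descent optimizer, the categorical cross-entropy loss function, and the accuracy and mse (mean squared error) metrics. All of these come built into Keras.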

snn_model = create_simple_nn() 
snn_model.compile(loss='categorical_crossentropy', optimizer='sgd', metrics=['acc', 'mse'])
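For intuition about the two quantities being tracked, the sketch below computes categorical cross-entropy and mean squared error by hand for a single one-hot label. The functions are simplified illustrations, not the Keras implementations, and the uniform prediction is a hypothetical baseline.

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    # Clip to avoid log(0); eps is an illustrative choice.
    y_pred = np.clip(y_pred, eps, 1.0)
    return float(-np.sum(y_true * np.log(y_pred)))

def mean_squared_error(y_true, y_pred):
    return float(np.mean((y_true - y_pred) ** 2))

num_classes = 100
y_true = np.zeros(num_classes)
y_true[19] = 1.0                                   # one-hot label for class 19

# A model that knows nothing assigns the same probability to every class.
uniform_pred = np.full(num_classes, 1.0 / num_classes)

uniform_loss = categorical_crossentropy(y_true, uniform_pred)
uniform_mse = mean_squared_error(y_true, uniform_pred)

print(uniform_loss, uniform_mse)
```

Uniform guessing gives a loss of ln(100), roughly 4.61, and an mse of 0.0099. These are useful reference points when reading a training log for this 100-class problem.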

Once that is done, let's look at the model summary.

snn_model.summary()

_________________________________________________________________ 
Layer (type) Output Shape Param # 
================================================================= 
Input_layer (Flatten) (None, 3072) 0 
_________________________________________________________________ 
Hidden_layer_1 (Dense) (None, 1000) 3073000 
_________________________________________________________________ 
Hidden_layer_2 (Dense) (None, 500) 500500 
_________________________________________________________________ 
Output_layer (Dense) (None, 100) 50100 
================================================================= 
Total params: 3,623,600 
Trainable params: 3,623,600 
Non-trainable params: 0 
_________________________________________________________________

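As we can see, even though this is a simple neural network model, it has more than 3 million parameters to train. This is a large part of why deep learning needs so much computation: training really complex networks means training a very large number of parameters.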
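The numbers in the summary can be reproduced by hand: a Dense layer has one weight per input-unit pair plus one bias per unit. The helper below is a small illustrative function, not a Keras API.

```python
def dense_params(inputs, units):
    # One weight per (input, unit) pair plus one bias per unit.
    return inputs * units + units

flattened = 32 * 32 * 3            # size of the Flatten output
layer_sizes = [1000, 500, 100]     # Hidden_layer_1, Hidden_layer_2, Output_layer

counts = []
previous = flattened
for units in layer_sizes:
    counts.append(dense_params(previous, units))
    previous = units

total_params = sum(counts)
print(counts, total_params)
```

Almost all of the parameters sit in the first Dense layer, because every one of the 3072 input values connects to each of its 1000 nodes.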

Now we only have to train. Run the following:

snn = snn_model.fit(x=x_train, y=y_train, batch_size=32, 
      epochs=10, verbose=1, validation_data=(x_test, y_test), shuffle=True)

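We tell Keras to train with the normalized training images and the one-hot encoded training labels. We use batches of 32 samples (to reduce memory usage) and run 10 epochs. For validation, we use x_test and y_test. The training result is assigned to the snn variable; from it we will extract the training history for comparison between models.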
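To see what these settings imply, the following sketch splits 50000 shuffled sample indices into batches of 32. It only illustrates the bookkeeping behind batch_size, epochs and shuffle; Keras handles this internally and its exact shuffling differs.

```python
import math
import numpy as np

num_samples, batch_size, epochs = 50000, 32, 10

# Number of weight updates in one pass over the training set.
steps_per_epoch = math.ceil(num_samples / batch_size)

# Shuffle the sample indices, then cut them into consecutive batches.
rng = np.random.default_rng(0)
order = rng.permutation(num_samples)
batches = [order[start:start + batch_size]
           for start in range(0, num_samples, batch_size)]

total_updates = steps_per_epoch * epochs
print(steps_per_epoch, len(batches[-1]), total_updates)
```

Because 50000 is not a multiple of 32, the last batch of each epoch is smaller than the others.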

Train on 50000 samples, validate on 10000 samples 
Epoch 1/10 
50000/50000 [==============================] - 16s 318us/step - loss: 4.1750 - 
acc: 0.0740 - mean_squared_error: 0.0097 - val_loss: 3.9633 - val_acc: 0.1051 - 
val_mean_squared_error: 0.0096 
Epoch 2/10 
50000/50000 [==============================] - 15s 301us/step - loss: 3.7919 - 
acc: 0.1298 - mean_squared_error: 0.0095 - val_loss: 3.7409 - val_acc: 0.1427 - 
val_mean_squared_error: 0.0094 
Epoch 3/10 
50000/50000 [==============================] - 15s 294us/step - loss: 3.6357 - 
acc: 0.1579 - mean_squared_error: 0.0093 - val_loss: 3.6429 - val_acc: 0.1525 - 
val_mean_squared_error: 0.0093 
Epoch 4/10 
50000/50000 [==============================] - 15s 301us/step - loss: 3.5300 - 
acc: 0.1758 - mean_squared_error: 0.0092 - val_loss: 3.6055 - val_acc: 0.1626 - 
val_mean_squared_error: 0.0093 
Epoch 5/10 
50000/50000 [==============================] - 15s 300us/step - loss: 3.4461 - 
acc: 0.1904 - mean_squared_error: 0.0091 - val_loss: 3.5030 - val_acc: 0.1812 - 
val_mean_squared_error: 0.0092 
Epoch 6/10 
50000/50000 [==============================] - 15s 301us/step - loss: 3.3714 - 
acc: 0.2039 - mean_squared_error: 0.0090 - val_loss: 3.4600 - val_acc: 0.1912 - 
val_mean_squared_error: 0.0091 
Epoch 7/10 
50000/50000 [==============================] - 15s 301us/step - loss: 3.3050 - 
acc: 0.2153 - mean_squared_error: 0.0089 - val_loss: 3.4329 - val_acc: 0.1938 - 
val_mean_squared_error: 0.0091 
Epoch 8/10 
50000/50000 [==============================] - 15s 300us/step - loss: 3.2464 - 
acc: 0.2275 - mean_squared_error: 0.0089 - val_loss: 3.3965 - val_acc: 0.2013 - 
val_mean_squared_error: 0.0090 
Epoch 9/10 
50000/50000 [==============================] - 15s 301us/step - loss: 3.1902 - 
acc: 0.2361 - mean_squared_error: 0.0088 - val_loss: 3.3371 - val_acc: 0.2133 - 
val_mean_squared_error: 0.0089 
Epoch 10/10 
50000/50000 [==============================] - 15s 299us/step - loss: 3.1354 - 
acc: 0.2484 - mean_squared_error: 0.0087 - val_loss: 3.3233 - val_acc: 0.2154 - 
val_mean_squared_error: 0.0089

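Although we have been evaluating the model during training, it should ideally be evaluated on a fresh test dataset. Here is how to run an evaluation in Keras. Note that this example reuses x_test and y_test, so the result matches the validation values of the last epoch.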

evaluation = snn_model.evaluate(x=x_test, y=y_test, batch_size=32, verbose=1) 
evaluation 

10000/10000 [==============================] - 1s 127us/step 
[3.323309226989746, 0.2154, 0.008915210169553756]

Let's view the resulting metrics graphically (we will use the matplotlib library).

plt.figure(0) 
plt.plot(snn.history['acc'],'r') 
plt.plot(snn.history['val_acc'],'g') 
plt.xticks(np.arange(0, 11, 2.0)) 
plt.rcParams['figure.figsize'] = (8, 6) 
plt.xlabel("Num of Epochs") 
plt.ylabel("Accuracy") 
plt.title("Training Accuracy vs Validation Accuracy") 
plt.legend(['train','validation']) 

plt.figure(1) 
plt.plot(snn.history['loss'],'r') 
plt.plot(snn.history['val_loss'],'g') 
plt.xticks(np.arange(0, 11, 2.0)) 
plt.rcParams['figure.figsize'] = (8, 6) 
plt.xlabel("Num of Epochs") 
plt.ylabel("Loss") 
plt.title("Training Loss vs Validation Loss") 
plt.legend(['train','validation']) 

plt.show()

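First of all, the model does not generalize well: after 10 epochs, training accuracy (0.2484) is about 3 percentage points above validation accuracy (0.2154).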
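The gap can be read directly from the training log above. The sketch below copies the per-epoch accuracy values from that log and computes the difference; with a real run, the same lists would come from snn.history.

```python
import numpy as np

# Values copied from the training log (acc and val_acc, epochs 1 to 10).
train_acc = np.array([0.0740, 0.1298, 0.1579, 0.1758, 0.1904,
                      0.2039, 0.2153, 0.2275, 0.2361, 0.2484])
val_acc = np.array([0.1051, 0.1427, 0.1525, 0.1626, 0.1812,
                    0.1912, 0.1938, 0.2013, 0.2133, 0.2154])

# Positive gap: the model does better on training data than on unseen data.
gaps = train_acc - val_acc
print(np.round(gaps, 4))
```

In the first two epochs the gap is negative, and from the third epoch onward training accuracy stays ahead of validation accuracy.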

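Confusion Matrix with Scikit Learn

Once the model is trained, we want to look at other metrics before drawing any conclusion about the usability of the model we have created. For this, we will build the confusion matrix and from it look at the precision, recall, and F1-score metrics (see Wikipedia).

To build the confusion matrix, we first need predictions for the test set. For each sample, the index of the highest value in the prediction array is taken as the predicted class. In practice, a common approach is to apply a threshold value to decide whether a predicted value counts as positive.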

snn_pred = snn_model.predict(x_test, batch_size=32, verbose=1) 
snn_predicted = np.argmax(snn_pred, axis=1)
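The two decision rules mentioned above can be compared on a toy example. The probabilities and the 0.5 threshold below are made up for illustration; the article's code uses only the argmax rule.

```python
import numpy as np

# Hypothetical softmax outputs for two samples over three classes.
example_pred = np.array([[0.10, 0.70, 0.20],
                         [0.40, 0.35, 0.25]])

# Rule 1: the class with the highest probability is the prediction.
predicted_class = np.argmax(example_pred, axis=1)

# Rule 2: additionally require the top probability to reach a threshold.
threshold = 0.5   # illustrative value, not from the article
is_confident = example_pred.max(axis=1) >= threshold

print(predicted_class, is_confident)
```

The second sample still gets a predicted class from argmax, but a threshold would flag it as too uncertain to count as a positive prediction.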

The Scikit Learn library provides a function to build the confusion matrix.

# Create the confusion matrix
snn_cm = confusion_matrix(np.argmax(y_test, axis=1), snn_predicted) 

# Visualize the confusion matrix
snn_df_cm = pd.DataFrame(snn_cm, range(100), range(100)) 
plt.figure(figsize = (20,14)) 
sn.set(font_scale=1.4) #for label size 
sn.heatmap(snn_df_cm, annot=True, annot_kws={"size": 12}) # font size 
plt.show()
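For readers who want to see what the matrix contains, here is a small NumPy sketch that counts (true, predicted) pairs by hand on a three-class toy example. The helper confusion_counts is hypothetical; it follows the same convention as Scikit Learn, with true labels on the rows and predicted labels on the columns.

```python
import numpy as np

def confusion_counts(true_labels, predicted_labels, num_classes):
    """Count how often each true class (row) was predicted as each class (column)."""
    matrix = np.zeros((num_classes, num_classes), dtype=int)
    # np.add.at accumulates correctly even when an index pair repeats.
    np.add.at(matrix, (true_labels, predicted_labels), 1)
    return matrix

true_labels = np.array([0, 1, 2, 2, 1])
predicted_labels = np.array([0, 2, 2, 2, 1])

toy_cm = confusion_counts(true_labels, predicted_labels, 3)
print(toy_cm)
```

Correct predictions land on the diagonal, so a good model produces a heatmap with a bright diagonal and dark cells elsewhere.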

Finally, show the metrics:

snn_report = classification_report(np.argmax(y_test, axis=1), snn_predicted)
print(snn_report)

             precision    recall  f1-score   support

          0       0.47      0.32      0.38       100
          1       0.29      0.34      0.31       100
          2       0.24      0.12      0.16       100
          3       0.14      0.10      0.12       100
          4       0.06      0.02      0.03       100
          5       0.14      0.17      0.16       100
          6       0.19      0.13      0.15       100
          7       0.14      0.26      0.19       100
          8       0.22      0.18      0.20       100
          9       0.23      0.39      0.29       100
         10       0.29      0.02      0.04       100
         11       0.27      0.09      0.14       100
         12       0.34      0.23      0.28       100
         13       0.26      0.16      0.20       100
         14       0.19      0.13      0.15       100
         15       0.16      0.14      0.15       100
         16       0.28      0.19      0.23       100
         17       0.32      0.25      0.28       100
         18       0.18      0.26      0.21       100
         19       0.42      0.08      0.13       100
         20       0.35      0.45      0.40       100
         21       0.27      0.43      0.33       100
         22       0.27      0.18      0.22       100
         23       0.30      0.46      0.37       100
         24       0.49      0.31      0.38       100
         25       0.14      0.10      0.11       100
         26       0.17      0.11      0.13       100
         27       0.06      0.29      0.09       100
         28       0.32      0.37      0.34       100
         29       0.12      0.21      0.15       100
         30       0.50      0.13      0.21       100
         31       0.24      0.04      0.07       100
         32       0.29      0.19      0.23       100
         33       0.18      0.28      0.22       100
         34       0.17      0.03      0.05       100
         35       0.17      0.07      0.10       100
         36       0.21      0.19      0.20       100
         37       0.24      0.06      0.10       100
         38       0.17      0.06      0.09       100
         39       0.12      0.07      0.09       100
         40       0.26      0.23      0.24       100
         41       0.62      0.45      0.52       100
         42       0.10      0.05      0.07       100
         43       0.09      0.44      0.16       100
         44       0.10      0.12      0.11       100
         45       0.20      0.03      0.05       100
         46       0.22      0.19      0.20       100
         47       0.37      0.19      0.25       100
         48       0.14      0.48      0.22       100
         49       0.38      0.11      0.17       100
         50       0.14      0.05      0.07       100
         51       0.16      0.15      0.16       100
         52       0.43      0.60      0.50       100
         53       0.27      0.61      0.37       100
         54       0.48      0.26      0.34       100
         55       0.07      0.01      0.02       100
         56       0.45      0.13      0.20       100
         57       0.10      0.42      0.16       100
         58       0.35      0.17      0.23       100
         59       0.13      0.36      0.19       100
         60       0.40      0.65      0.50       100
         61       0.42      0.34      0.38       100
         62       0.25      0.49      0.33       100
         63       0.31      0.21      0.25       100
         64       0.14      0.03      0.05       100
         65       0.13      0.02      0.03       100
         66       0.00      0.00      0.00       100
         67       0.20      0.35      0.25       100
         68       0.24      0.66      0.35       100
         69       0.26      0.30      0.28       100
         70       0.37      0.22      0.28       100
         71       0.37      0.46      0.41       100
         72       0.11      0.01      0.02       100
         73       0.22      0.22      0.22       100
         74       0.09      0.06      0.07       100
         75       0.27      0.28      0.27       100
         76       0.29      0.38      0.33       100
         77       0.20      0.01      0.02       100
         78       0.19      0.03      0.05       100
         79       0.25      0.02      0.04       100
         80       0.14      0.02      0.04       100
         81       0.13      0.02      0.03       100
         82       0.59      0.50      0.54       100
         83       0.14      0.15      0.14       100
         84       0.18      0.06      0.09       100
         85       0.20      0.52      0.28       100
         86       0.31      0.23      0.26       100
         87       0.21      0.27      0.23       100
         88       0.07      0.02      0.03       100
         89       0.16      0.44      0.24       100
         90       0.20      0.03      0.05       100
         91       0.30      0.34      0.32       100
         92       0.20      0.10      0.13       100
         93       0.18      0.17      0.17       100
         94       0.46      0.25      0.32       100
         95       0.23      0.41      0.29       100
         96       0.24      0.17      0.20       100
         97       0.10      0.16      0.12       100
         98       0.09      0.13      0.11       100
         99       0.39      0.15      0.22       100

avg / total       0.24      0.22      0.20     10000
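The three columns of the report follow directly from a confusion matrix. The sketch below derives them for a made-up three-class matrix and then checks the F1 formula against the class 0 row of the report above. It assumes every class is predicted at least once; a class with no predictions would need a guard against division by zero.

```python
import numpy as np

# Toy confusion matrix: rows are true classes, columns are predicted classes.
toy_cm = np.array([[1, 0, 0],
                   [0, 1, 1],
                   [0, 0, 2]])

true_positives = np.diag(toy_cm)
precision = true_positives / toy_cm.sum(axis=0)   # correct / all predicted as the class
recall = true_positives / toy_cm.sum(axis=1)      # correct / all that truly are the class
f1 = 2 * precision * recall / (precision + recall)

# Check against the report: class 0 has precision 0.47 and recall 0.32.
class0_f1 = 2 * 0.47 * 0.32 / (0.47 + 0.32)

print(precision, recall, f1, round(class0_f1, 2))
```

F1 is the harmonic mean of precision and recall, so it stays low whenever either of the two is low.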

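ROC Curve

The ROC curve is a graphical method for evaluating the performance of binary classifiers: it plots the true positive rate against the false positive rate.

We will code the ROC curve for multi-class classification. This code is from DloLogy, but you can also consult the Scikit Learn documentation page.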

from sklearn.datasets import make_classification
from sklearn.preprocessing import label_binarize
from numpy import interp  # scipy.interp was a deprecated alias of numpy.interp
from itertools import cycle

n_classes = 100

from sklearn.metrics import roc_curve, auc

# Plot linewidth.
lw = 2

# Compute ROC curve and ROC area for each class
fpr = dict()
tpr = dict()
roc_auc = dict()
for i in range(n_classes):
    fpr[i], tpr[i], _ = roc_curve(y_test[:, i], snn_pred[:, i])
    roc_auc[i] = auc(fpr[i], tpr[i])

# Compute micro-average ROC curve and ROC area
fpr["micro"], tpr["micro"], _ = roc_curve(y_test.ravel(), snn_pred.ravel())
roc_auc["micro"] = auc(fpr["micro"], tpr["micro"])

# Compute macro-average ROC curve and ROC area

# First aggregate all false positive rates
all_fpr = np.unique(np.concatenate([fpr[i] for i in range(n_classes)]))

# Then interpolate all ROC curves at this points
mean_tpr = np.zeros_like(all_fpr)
for i in range(n_classes):
    mean_tpr += interp(all_fpr, fpr[i], tpr[i])

# Finally average it and compute AUC
mean_tpr /= n_classes

fpr["macro"] = all_fpr
tpr["macro"] = mean_tpr
roc_auc["macro"] = auc(fpr["macro"], tpr["macro"])

# Plot all ROC curves
plt.figure(1)
plt.plot(fpr["micro"], tpr["micro"],
         label='micro-average ROC curve (area = {0:0.2f})'
               ''.format(roc_auc["micro"]),
         color='deeppink', linestyle=':', linewidth=4)

plt.plot(fpr["macro"], tpr["macro"],
         label='macro-average ROC curve (area = {0:0.2f})'
               ''.format(roc_auc["macro"]),
         color='navy', linestyle=':', linewidth=4)

colors = cycle(['aqua', 'darkorange', 'cornflowerblue'])
for i, color in zip(range(n_classes-97), colors):
    plt.plot(fpr[i], tpr[i], color=color, lw=lw,
             label='ROC curve of class {0} (area = {1:0.2f})'
             ''.format(i, roc_auc[i]))

plt.plot([0, 1], [0, 1], 'k--', lw=lw)
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.05])
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('Some extension of Receiver operating characteristic to multi-class')
plt.legend(loc="lower right")
plt.show()

# Zoom in view of the upper left corner.
plt.figure(2)
plt.xlim(0, 0.2)
plt.ylim(0.8, 1)
plt.plot(fpr["micro"], tpr["micro"],
         label='micro-average ROC curve (area = {0:0.2f})'
               ''.format(roc_auc["micro"]),
         color='deeppink', linestyle=':', linewidth=4)

plt.plot(fpr["macro"], tpr["macro"],
         label='macro-average ROC curve (area = {0:0.2f})'
               ''.format(roc_auc["macro"]),
         color='navy', linestyle=':', linewidth=4)

colors = cycle(['aqua', 'darkorange', 'cornflowerblue'])
for i, color in zip(range(3), colors):
    plt.plot(fpr[i], tpr[i], color=color, lw=lw,
             label='ROC curve of class {0} (area = {1:0.2f})'
             ''.format(i, roc_auc[i]))

plt.plot([0, 1], [0, 1], 'k--', lw=lw)
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('Some extension of Receiver operating characteristic to multi-class')
plt.legend(loc="lower right")
plt.show()
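To make clear what each per-class curve represents, the following sketch computes ROC points and the area under the curve by hand for a tiny binary problem. The helper roc_points is a simplified illustration written for this example and is not the Scikit Learn implementation; the labels and scores are made up.

```python
import numpy as np

def roc_points(y_true, scores):
    """Return false and true positive rates as the threshold is lowered."""
    thresholds = np.sort(np.unique(scores))[::-1]    # highest threshold first
    positives = (y_true == 1).sum()
    negatives = (y_true == 0).sum()
    fpr, tpr = [0.0], [0.0]                          # nothing predicted positive yet
    for t in thresholds:
        predicted_positive = scores >= t
        tpr.append((predicted_positive & (y_true == 1)).sum() / positives)
        fpr.append((predicted_positive & (y_true == 0)).sum() / negatives)
    return np.array(fpr), np.array(tpr)

y_true = np.array([0, 0, 1, 1])
scores = np.array([0.1, 0.4, 0.35, 0.8])

toy_fpr, toy_tpr = roc_points(y_true, scores)

# Trapezoidal rule: area under the piecewise-linear ROC curve.
area = float(np.sum(np.diff(toy_fpr) * (toy_tpr[1:] + toy_tpr[:-1]) / 2))
print(toy_fpr, toy_tpr, area)
```

In the multi-class code above, the same computation is applied once per class by treating column i of y_test as the binary label and column i of snn_pred as the score.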

Finally, we will save the training history data.

# Training history (path_base is assumed to be defined earlier as the output directory)
with open(path_base + '/simplenn_history.txt', 'wb') as file_pi:
  pickle.dump(snn.history, file_pi)
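Since the history is saved for later comparison between models, it helps to see how it can be read back. This sketch writes a small stand-in dictionary to a temporary directory and reloads it; the dictionary contents and the temporary path are illustrative, and a real run would use snn.history and its own output directory.

```python
import os
import pickle
import tempfile

# Stand-in for snn.history: a dict mapping metric names to per-epoch values.
history = {'acc': [0.0740, 0.1298], 'val_acc': [0.1051, 0.1427]}

with tempfile.TemporaryDirectory() as tmp_dir:
    history_path = os.path.join(tmp_dir, 'simplenn_history.txt')

    # Pickle is a binary format, so the file is opened in binary mode.
    with open(history_path, 'wb') as file_pi:
        pickle.dump(history, file_pi)

    with open(history_path, 'rb') as file_pi:
        restored = pickle.load(file_pi)

print(restored == history)
```

Despite the .txt extension used in the article, the file holds binary pickle data and is not readable as plain text.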

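Points of Interest

Although 10 epochs of training are enough for this exercise, the accuracy and loss plots suggest that the model will not improve much with more epochs. The ROC curves show a good true positive rate versus false positive rate (meaning that when a class label is predicted, the probability of it being a false positive is low). However, accuracy, recall, and precision are all very low.

In the next chapter, we will train a very simple convolutional neural network on the same dataset, using the same metrics, loss, and optimizer as in this one. See you!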

History

20 May 2018: Initial version