Preparing a Deep Learning Environment for COVID-19 Diagnosis

Abdulkader Helwan

February 16, 2021

CPOL

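In this article, we discuss the materials and methods for this project.

Download source code - 300.4 KB

In this series of articles, we apply a deep learning (DL) network, ResNet50, to diagnose Covid-19 from chest X-ray images. We use Python's TensorFlow library to train the neural network in a Jupyter Notebook.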

The tools and libraries you need for this project are:

IDE

Jupyter Notebook

Libraries

TensorFlow 2.0, NumPy, Matplotlib, and OpenCV

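We assume you are familiar with deep learning using Python and Jupyter Notebook. If you are new to Python, start with this tutorial. If you are not yet familiar with Jupyter, start here.

In the previous article, we introduced transfer learning and ResNet50. In this article, besides installing TensorFlow and the other libraries needed to start training the network, we discuss the dataset used to train ResNet50.

Installing TensorFlow and Other Libraries

In this project, we use Python 3.7 in a Jupyter Notebook, with TensorFlow 2.0 as the DL library for building our model. To install TensorFlow with GPU (CUDA) support, open the Anaconda Prompt and run the following commands: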

conda create -n tf-gpu-cuda8 tensorflow-gpu cudatoolkit=10.0
conda activate tf-gpu-cuda8

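To check that TensorFlow is installed correctly, open a Jupyter Notebook and enter:

import tensorflow as tf

If you do not get any errors, TensorFlow is installed correctly.

Now we need to install some basic libraries, such as NumPy and Matplotlib. Open the Anaconda Prompt and enter the following: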

conda install numpy
conda install -c conda-forge matplotlib

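Open your Jupyter Notebook, add these two statements, and make sure they do not produce any errors.

import numpy as np
import matplotlib.pyplot as plt

Once all the required libraries are installed, we import them together with some additional packages that we use in this project: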

# Import required libraries
import random

import matplotlib.pyplot as plt
import numpy as np
import tensorflow as tf
from tensorflow.compat.v1 import ConfigProto, InteractiveSession
# Import Keras through TensorFlow so that a single Keras implementation is used throughout
from tensorflow.keras.applications import ResNet50
from tensorflow.keras.applications.resnet50 import preprocess_input
from tensorflow.keras.models import Model
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Let TensorFlow allocate GPU memory on demand instead of reserving it all up front
config = ConfigProto()
config.gpu_options.allow_growth = True
session = InteractiveSession(config=config)

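Dataset

Before we start coding our network, we need a set of images to train and test it. In this project, we use a public dataset of Covid-19 chest X-ray images. The dataset contains images of three classes: Covid-19, Normal, and Pneumonia. Our goal is to classify images as Covid-19 "positive" or "negative"; for that, we need only the Covid-19 and Normal classes. So, after downloading the dataset, we removed the Pneumonia class from it. The dataset contains 1,143 COVID-19-positive images and 1,341 normal (coronavirus-negative) images.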
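As a quick sanity check on the class counts quoted above, the following sketch computes the share of each class and the accuracy of a trivial classifier that always predicts the majority class. Only the two counts come from the article; the variable names are illustrative.

```python
# Class counts quoted in the article
counts = {"Covid-19": 1143, "Normal": 1341}

total = sum(counts.values())
# Fraction of the dataset that belongs to each class
shares = {name: n / total for name, n in counts.items()}
# Accuracy of a trivial model that always predicts the most common class
majority_baseline = max(counts.values()) / total

for name, share in shares.items():
    print(f"{name}: {counts[name]} images ({share:.1%})")
print(f"Total: {total} images, majority-class baseline: {majority_baseline:.1%}")
```

The two classes are close to balanced (roughly 46% versus 54%), so a trained model should clearly exceed the roughly 54% majority-class baseline before its accuracy means much.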

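The images should be downloaded and preprocessed to fit the network's input format, that is, resized to 224x224x3. You can use TensorFlow's ImageDataGenerator to load and resize the images.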
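ImageDataGenerator's flow_from_directory expects one subfolder per class inside each of the train and test folders. The sketch below shows one possible way to arrange the two remaining classes into that layout. The helper split_dataset, the 80/20 split, the fixed seed, and the folder names are all assumptions for illustration; the article does not say how its own split was made.

```python
import random
import shutil
from pathlib import Path


def split_dataset(source_dir, target_dir, classes, test_fraction=0.2, seed=42):
    """Copy images into target_dir/train/<class> and target_dir/test/<class>.

    Hypothetical helper: the split ratio and folder names are assumptions.
    Returns a dict mapping (subset, class) to the number of copied files.
    """
    rng = random.Random(seed)  # Fixed seed so the split is reproducible
    copied = {}
    for cls in classes:
        files = sorted(p for p in (Path(source_dir) / cls).iterdir() if p.is_file())
        rng.shuffle(files)
        n_test = int(len(files) * test_fraction)
        subsets = {"test": files[:n_test], "train": files[n_test:]}
        for subset, subset_files in subsets.items():
            out_dir = Path(target_dir) / subset / cls
            out_dir.mkdir(parents=True, exist_ok=True)
            for f in subset_files:
                shutil.copy2(f, out_dir / f.name)
            copied[(subset, cls)] = len(subset_files)
    return copied
```

For example, calling split_dataset("raw", "COVDATA", ["COVID-19", "Normal"]) (both paths hypothetical) would create COVDATA/train and COVDATA/test, each with a COVID-19 and a Normal subfolder.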

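Loading the Pretrained ResNet50 Model

First, we need to load the pretrained model and freeze its weights. In our project, we use ResNet50, one of the predefined network architectures among the Keras built-in models, which include ResNet, Inception (the GoogLeNet family), and others.

Because we want to use transfer learning rather than train from scratch, we ask Keras to load a copy of ResNet50 that has already been trained on ImageNet images. The option include_top=False enables feature extraction by removing the final dense layers. This lets us control the model's output and input.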

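# Full ResNet50, including its original ImageNet classification layers
model = tf.keras.applications.ResNet50(weights='imagenet')
# ResNet50 without the final dense layers, used as the base for transfer learning
base_model = tf.keras.applications.ResNet50(weights='imagenet', include_top=False)
# summary() prints the architecture itself and returns None, so print() is not needed
base_model.summary()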

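Figure 3: Snapshot of the ResNet50 base model

We can then display the names and index numbers of the network layers so that they can easily be set as trainable at a later stage.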

for i, layer in enumerate(base_model.layers):
  print(i, layer.name)
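The article mentions freezing the pretrained weights, but the listings do not show that step. Here is a minimal sketch of how it might look, assuming the standard Keras trainable attribute. In this sketch, weights=None and the explicit input_shape are assumptions made only so that it runs without downloading the ImageNet weights and reports concrete shapes; the article itself loads weights='imagenet'.

```python
import tensorflow as tf

# weights=None avoids the ImageNet download in this sketch (assumption for illustration);
# the article itself uses weights='imagenet'
base_model = tf.keras.applications.ResNet50(
    weights=None, include_top=False, input_shape=(224, 224, 3)
)

# Freeze every layer so the backbone acts as a fixed feature extractor
for layer in base_model.layers:
    layer.trainable = False

print("Layers:", len(base_model.layers))
print("Trainable weight tensors:", len(base_model.trainable_weights))
print("Output shape:", base_model.output_shape)
```

Individual layers can later be unfrozen by index, using the numbers printed by the enumeration loop, by setting their trainable attribute back to True.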

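Loading Data with ImageDataGenerator

TensorFlow and Keras provide an easy way to load data using ImageDataGenerator. This class lets you preprocess your data (resizing, rescaling, and shuffling) all in one operation.

First, we create a generator that applies the preprocessing function of the pretrained ResNet50 model.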

train_datagen = tf.keras.preprocessing.image.ImageDataGenerator(preprocessing_function=tf.keras.applications.resnet50.preprocess_input)
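To make the preprocessing step less opaque, here is a NumPy sketch of what the ResNet50 preprocess_input function does in its default mode: it reorders the channels from RGB to BGR and subtracts the per-channel ImageNet means, without rescaling. The function name caffe_style_preprocess is hypothetical and serves only as an illustration; in practice, the Keras function itself should be used.

```python
import numpy as np

# Per-channel ImageNet means in BGR order, as used by the Keras "caffe" preprocessing mode
IMAGENET_BGR_MEANS = np.array([103.939, 116.779, 123.68], dtype=np.float32)


def caffe_style_preprocess(images):
    """Hypothetical re-implementation for illustration only.

    Expects RGB images with shape (..., 3) and values in [0, 255].
    """
    x = np.asarray(images, dtype=np.float32)
    x = x[..., ::-1]  # RGB -> BGR
    return x - IMAGENET_BGR_MEANS  # Zero-center each channel; no rescaling


# A single pure-red pixel: R=255, G=0, B=0
red_pixel = np.array([[[255.0, 0.0, 0.0]]])
print(caffe_style_preprocess(red_pixel))
```

Because the values are zero-centered rather than scaled to the range 0 to 1, the generator should not also be given a rescale factor when this preprocessing function is used.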

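Next, we collect the training and test images from our project directory in batches and expose them through the train_generator and test_generator iterators, respectively.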

train_datagen = ImageDataGenerator(preprocessing_function = preprocess_input)
test_datagen = ImageDataGenerator(preprocessing_function = preprocess_input)
train_generator = train_datagen.flow_from_directory(r'C:\Users\abdul\Desktop\Research\Covid=19\COVDATA\train', 
                                                   target_size = (224, 224),
                                                   color_mode = 'rgb',
                                                   batch_size = 3,
                                                   class_mode = 'binary',
                                                   shuffle = True)
test_generator = test_datagen.flow_from_directory(r'C:\Users\abdul\Desktop\Research\Covid=19\COVDATA\test', 
                                                   target_size = (224, 224),
                                                   color_mode = 'rgb',
                                                   batch_size = 3,
                                                   class_mode = 'binary',
                                                   shuffle = True)

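Note that, with class_mode = 'binary', the function above encodes each label as a single 0 or 1 value (rather than a one-hot vector) for the two classes in this project: Covid-19 and Normal. To check the labels of the images, type:

train_generator.classes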

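As you can see in the code, we resize the images to 224x224x3 to fit ResNet50's input format. We use the binary class mode because our classification task is binary: it deals with only two classes.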
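How flow_from_directory derives labels can be sketched in plain Python: the class subfolders are sorted alphanumerically and numbered from 0, and the binary class mode then yields a single 0/1 value per image instead of a one-hot vector. The folder names below are assumptions (only a COVID-19 folder appears in the article's paths), and the sample batch is made up for illustration.

```python
# Assumed class subfolder names; only 'COVID-19' appears in the article's paths
class_folders = ["Normal", "COVID-19"]

# Keras sorts the subfolder names and assigns indices in that order
class_indices = {name: idx for idx, name in enumerate(sorted(class_folders))}
print(class_indices)

# Made-up labels for a batch of three images (the article uses batch_size = 3)
batch = ["Normal", "COVID-19", "COVID-19"]

# What class_mode='binary' produces: one 0/1 value per image
binary_labels = [float(class_indices[name]) for name in batch]

# What class_mode='categorical' would produce instead: one-hot vectors
one_hot_labels = [
    [1.0 if class_indices[name] == i else 0.0 for i in range(len(class_indices))]
    for name in batch
]

print(binary_labels)
print(one_hot_labels)
```

Under these assumed folder names, Covid-19 maps to 0 and Normal maps to 1, so a sigmoid output close to 1 would mean "Normal". Inspecting train_generator.class_indices confirms the actual mapping for a given directory.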

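We can then visualize some of the images that will be used to train the network. We can use OpenCV to display the images one by one, as in this example: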

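import os
import cv2

imageformat = ".png"
path = r'C:\Users\abdul\Desktop\ContentLab\ContentLab[Abdulkader_Helwan]\test2\COVID-19'
# Collect the full paths of all PNG files in the folder
imfilelist = [os.path.join(path, f) for f in os.listdir(path) if f.endswith(imageformat)]
for el in imfilelist:
    print(el)
    image = cv2.imread(el, cv2.IMREAD_COLOR)  # Load the image as 3-channel BGR
    cv2.imshow('Image', image)  # Show the image
    cv2.waitKey(1000)  # Keep it on screen for one second
cv2.destroyAllWindows()  # Close the display window when done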

This displays the images one after another, as shown in Figure 4.

Figure 4: Reading and displaying all images with cv2
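The display listing keeps only files whose names end in lowercase .png. If a folder mixes image formats or uses uppercase extensions, some images would be skipped silently. Below is a small sketch of a more tolerant filter; the helper list_images and the set of accepted extensions are hypothetical choices, not part of the original project.

```python
from pathlib import Path

# Extensions to accept; an assumption, adjust to the files actually present
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg"}


def list_images(folder):
    """Return sorted paths of image files in folder, matching extensions case-insensitively."""
    return sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_EXTENSIONS
    )
```

The returned paths can be converted with str() and passed to cv2.imread in place of the entries of imfilelist.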

Next Steps

In the next article, we will work on restructuring ResNet50 to perform the new classification task. Stay tuned!