Cheetah Optimizer - Python Implementation

Ujwal Watgule

October 1, 2023

CPOL

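The Cheetah Optimizer is a nature-inspired metaheuristic algorithm designed to handle complex, multi-dimensional optimization problems.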

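Introduction

The Cheetah Optimizer has been implemented in MATLAB and is available on MATLAB Central File Exchange. It offers a range of features and is designed to handle complex optimization problems across multiple domains.

Recently, while working on a machine learning project, I wanted to improve the performance of an SVM model. (For the basic concepts of SVM, you can read my introductory article, Understanding SVM, on DEV Community.) I was looking for a recent metaheuristic algorithm for this purpose and found the Cheetah Optimizer on MATLAB Central. Since my project was written in Python and no Python version was available, I implemented it in Python myself.

You can refer to my Medium article for the basic concepts of metaheuristic algorithms.

In this article, I will demonstrate an implementation of the Cheetah Optimizer in Python. You can use any dataset you like. For simplicity, I chose a diabetes dataset.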

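Background

In optimization and machine learning, a variety of algorithms and techniques are used to find the best solution to a given problem. Common optimization algorithms include gradient descent, genetic algorithms, and particle swarm optimization. They are applied to tasks such as training machine learning models, finding the best configuration of parameters, and solving complex optimization problems in many domains.

You can read more about the Cheetah Optimizer in the references at the end of this article.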

Using the Code

Below is a step-by-step guide to implementing the Cheetah Optimizer in Python.

1. Import the Required Libraries

import numpy as np
import pandas as pd
from sklearn.svm import SVC  # for Support Vector Classifier
from sklearn.metrics import accuracy_score

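Many of you will already know the libraries imported above. For those who do not, here is a short explanation of each.

import pandas as pd: This line imports the Pandas library under the alias "pd". Pandas is a powerful Python library for data manipulation and analysis, widely used in data science and machine learning to work with structured data.

from sklearn.svm import SVC: Here, we import the SVC (Support Vector Classifier) class from Scikit-Learn. Scikit-Learn is a popular Python machine learning library that provides tools for classification, regression, clustering, and more. SVC is a class that lets you create and train Support Vector Machine (SVM) models for classification.

import numpy as np: This line imports the NumPy library under the alias "np". NumPy is a fundamental Python library for numerical and mathematical operations. It provides support for arrays and matrices, which are commonly used in machine learning for data manipulation and mathematical computation.

from sklearn.metrics import accuracy_score: This line imports the accuracy_score function from Scikit-Learn's metrics module. accuracy_score computes the accuracy of a classification model by comparing its predictions with the true labels. It is a common metric for evaluating classification models.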
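
As a rough illustration of what accuracy_score measures, the sketch below computes the same quantity by hand with NumPy: the fraction of predictions that match the true labels. The label arrays are made-up values for illustration and are not taken from the diabetes dataset.

```python
import numpy as np

# Hypothetical true labels and predictions, for illustration only
y_true = np.array([1, 0, 1, 1, 0, 1, 0, 0])
y_hat = np.array([1, 0, 0, 1, 0, 1, 1, 0])

# Accuracy is the fraction of positions where the prediction equals the true label
manual_accuracy = np.mean(y_true == y_hat)
print(manual_accuracy)  # 6 of 8 predictions match -> 0.75
```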

2. Read the Dataset

data = pd.read_csv("diabetesdataset.csv") # change dataset path with your local path
X = data.iloc[:, 2:-1].values # Features 
y = data.iloc[:, 1].values # Labels

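X = data.iloc[:, 2:-1].values: This line extracts the features from the DataFrame data and stores them in a NumPy array named X. It uses .iloc to select columns by position: all rows (the :), and the columns from the third one (index 2) up to, but not including, the last one (the -1). The selected columns are the features for the machine learning model.

y = data.iloc[:, 1].values: This line extracts the labels from the DataFrame data and stores them in a NumPy array named y. It selects all rows (the :) of the second column (index 1). The selected values are treated as the labels, or target variable, for the model.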
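
The positional slices used with .iloc follow the same rules as NumPy indexing, so the selection can be sketched on a small plain array. The 3x5 table below is hypothetical and only stands in for the real dataset.

```python
import numpy as np

# Hypothetical table: 3 rows, 5 columns (column k holds the value k)
table = np.tile(np.arange(5), (3, 1))

features = table[:, 2:-1]  # all rows, columns 2 and 3 (last column excluded)
labels = table[:, 1]       # all rows, column 1 only

print(features.shape)  # (3, 2)
print(labels)          # [1 1 1]
```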

3. Set the Target Variable

y = data['CLASS']

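In a machine learning task, we typically have a DataFrame with several columns, one of which holds the target variable (what we want to predict) while the others hold the features (the attributes used to make the prediction). Here, y is set to the labels in the "CLASS" column, replacing the y assigned in step 2, and it is used as the target variable when training and evaluating the model (an SVC in this case).

4. Split the Data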

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

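from sklearn.model_selection import train_test_split: This line imports the train_test_split function from Scikit-Learn's model_selection module. The function is commonly used to split a dataset into a training set and a test set so that a model's performance can be evaluated.

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42): This line splits the dataset into four subsets. It takes X, the array of feature data, and y, the array holding the target variable.

train_test_split takes these arrays as input and returns four subsets:

X_train: the subset of features used to train the model.
X_test: the subset of features used to test the model.
y_train: the labels corresponding to X_train, used to train the model.
y_test: the labels corresponding to X_test, used to evaluate the model's performance.

test_size=0.2: This reserves 20% of the data for testing; the remaining 80% is used for training.

random_state=42: This sets a random seed for reproducibility, so the same random split is produced every time the code runs. Any integer can be used as random_state.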
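
To make the idea of a seeded 80/20 split concrete, here is a minimal NumPy-only sketch. It is not how train_test_split is implemented; simple_split and the demo arrays are hypothetical names used only for this illustration.

```python
import numpy as np

def simple_split(X, y, test_size=0.2, seed=42):
    """Shuffle the row indices, then hold out `test_size` of them for testing."""
    rng = np.random.default_rng(seed)
    indices = rng.permutation(len(X))
    n_test = int(round(test_size * len(X)))
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]

# Hypothetical data: 10 samples with 3 features each
X_demo = np.arange(30).reshape(10, 3)
y_demo = np.arange(10) % 2

X_tr, X_te, y_tr, y_te = simple_split(X_demo, y_demo)
print(X_tr.shape, X_te.shape)  # (8, 3) (2, 3)
```

Calling simple_split again with the same seed returns the same rows in each subset, which is the property random_state provides.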

5. Define the Objective Function

def objective_function(params):
    # Ensure params contains two values
    if len(params) != 2:
        raise ValueError("params must contain two values: C and gamma")    
    C, gamma = params    

    # Create and fit an SVM model with the parameters
    model = SVC(C=C, gamma=gamma)
    model.fit(X_train, y_train)
    
    # Predict the test labels and compute the accuracy score
    y_pred = model.predict(X_test)
    score = accuracy_score(y_test, y_pred)
    
    # Return the negative score as we want to minimize it
    return -score

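This objective function is used for hyperparameter tuning. In this example, we want to optimize the SVM hyperparameters C and gamma. The goal is to find the values of C and gamma that maximize accuracy on the test data or, equivalently, minimize the negative accuracy score.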
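
The equivalence between maximizing accuracy and minimizing its negative can be checked on a few numbers. The accuracies below are invented for illustration only.

```python
import numpy as np

# Hypothetical accuracies of three candidate (C, gamma) pairs
accuracies = np.array([0.81, 0.93, 0.88])
scores = -accuracies  # what an objective returning -accuracy would give

# The candidate with the lowest score is the one with the highest accuracy
assert np.argmin(scores) == np.argmax(accuracies)
print(np.argmin(scores), -scores.min())  # 1 0.93
```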

6. Initialize the Cheetahs

# Define the bounds of the parameters as a numpy array
bounds = np.array([[0.01, 100], [0.0001, 10]])
n_cheetahs = 30 
n_iterations = 50
# Initialize the cheetah positions randomly within the bounds
cheetahs = np.random.uniform(bounds[:, 0], bounds[:, 1], size=(n_cheetahs, len(bounds)))
# Initialize the best position and score
best_position = None 
best_score = -np.inf

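cheetahs is a NumPy array of shape (n_cheetahs, len(bounds)). It holds the cheetahs' positions, initialized at random within the specified parameter bounds.

np.random.uniform(bounds[:, 0], bounds[:, 1], size=(n_cheetahs, len(bounds))) generates a random value within the bounds for each coordinate of each cheetah's position. Each row of the array represents one cheetah, and each column represents one parameter.

best_position is initially set to None, meaning that no best position has been found yet.

best_score is initially set to negative infinity (-np.inf) as a placeholder; it is replaced by the best score found in the first iteration.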
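
The sketch below shows how passing arrays as the low and high arguments gives every column its own range. The seed and the population size of 5 are arbitrary choices for this illustration.

```python
import numpy as np

np.random.seed(0)  # fixed seed so the sketch is repeatable

demo_bounds = np.array([[0.01, 100], [0.0001, 10]])  # rows: C, gamma
population = np.random.uniform(demo_bounds[:, 0], demo_bounds[:, 1], size=(5, 2))

print(population.shape)  # (5, 2)
# Column 0 stays within the C range, column 1 within the gamma range
print(population[:, 0].min() >= 0.01, population[:, 0].max() <= 100)
print(population[:, 1].min() >= 0.0001, population[:, 1].max() <= 10)
```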

7. Set the Constants

# The probability of choosing searching or sitting-and-waiting strategy 
alpha = 0.5 
# The probability of choosing attacking strategy 
beta = 0.5 
# The probability of leaving the prey and going back home
delta = 0.5

8. Run the Optimization

# Run the optimization loop
for i in range(n_iterations): 
    # Evaluate the objective function for each cheetah 
    scores = np.array([objective_function(cheetah) for cheetah in cheetahs])

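This loop runs for the given number of iterations (n_iterations) and, in each one, evaluates the objective function for every cheetah. The scores obtained in each iteration guide how the algorithm updates the cheetahs' positions in search of better solutions.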

    # Update the best position and score if needed.
    # Scores are negative accuracies, so the lowest score is the best one.
    best_idx = np.argmin(scores)
    if best_position is None or scores[best_idx] < best_score:
        best_score = scores[best_idx]
        best_position = cheetahs[best_idx]

Because objective_function returns the negative accuracy, the best cheetah in an iteration is the one with the lowest score (np.argmin(scores)). If no best position has been recorded yet, or if this score is lower than the previous best (best_score), best_score and best_position are updated to that cheetah's score and position.

    # Print the current iteration and best score (negated back to an accuracy)
    print(f'Iteration {i+1}: Best score = {-best_score}')

    # Create a new array to store the updated positions
    new_cheetahs = np.zeros_like(cheetahs)

    # Loop through each cheetah
    for j in range(n_cheetahs):
        # Generate a random number between 0 and 1
        r = np.random.rand()

        # If r < alpha, choose searching or sitting-and-waiting strategy
        if r < alpha:
            # Generate another random number between 0 and 1
            s = np.random.rand()

            # If s < 0.5, choose searching strategy
            if s < 0.5:
                # Choose a random function from a pool of functions
                f = np.random.choice([np.sin, np.cos, np.tan])

                # Update the position by following a leader using the function;
                # the leader is the cheetah with the lowest (best) score
                leader = cheetahs[np.argmin(scores)]
                new_cheetahs[j] = leader + f((i+1) / n_iterations) * (leader - cheetahs[j])

            # Else, choose sitting-and-waiting strategy
            else:
                # Update the position by adding a random perturbation
                new_cheetahs[j] = (cheetahs[j] +
                                   np.random.uniform(-0.01, 0.01, size=len(bounds)))

        # Else if r < alpha + beta, choose attacking strategy
        elif r < alpha + beta:
            # Update the position by moving towards the best position with a random factor
            new_cheetahs[j] = (cheetahs[j] + np.random.rand() *
                               (best_position - cheetahs[j]))

        # Else, choose leaving-the-prey-and-going-back-home strategy
        else:
            # Update the position by moving back to the first cheetah's position
            # with some random perturbation
            new_cheetahs[j] = (cheetahs[0] +
                               np.random.uniform(-delta, delta, size=len(bounds)))

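A strategy is chosen at random based on the probabilities alpha and beta: a random number r between 0 and 1 is generated and, depending on its value, one of three strategies is selected:

Searching or sitting-and-waiting: if r < alpha, a second random number s between 0 and 1 is generated. If s < 0.5, the searching strategy is chosen: a function picked at random from sin, cos, and tan is evaluated at the iteration progress (i+1) / n_iterations, and the result scales a step taken from the leader's position. Otherwise, the sitting-and-waiting strategy is chosen, which adds a small random perturbation to the current position.

Attacking: if alpha <= r < alpha + beta, the attacking strategy is chosen. The position is updated by moving towards the best position found so far (best_position) by a random factor.

Leaving the prey and going back home: if r >= alpha + beta, this strategy is chosen. The position is reset to that of the first cheetah in the current population, plus a random perturbation of up to delta in either direction for each parameter. Note that with alpha = 0.5 and beta = 0.5 as set in step 7, alpha + beta equals 1, so r, which is always below 1, never reaches this branch.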
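
The branching described above can be isolated in a small helper that maps the two random draws to a strategy name. choose_strategy is a hypothetical function written for this illustration; it does not appear in the original source code.

```python
def choose_strategy(r, s, alpha=0.5, beta=0.5):
    """Map two random draws in [0, 1) to a strategy name, mirroring the branches above."""
    if r < alpha:
        return "search" if s < 0.5 else "sit-and-wait"
    elif r < alpha + beta:
        return "attack"
    return "leave"

print(choose_strategy(0.2, 0.1))   # search
print(choose_strategy(0.2, 0.9))   # sit-and-wait
print(choose_strategy(0.7, 0.0))   # attack
# With alpha + beta = 1.0 no draw in [0, 1) reaches the last branch;
# smaller (hypothetical) values such as alpha=0.4, beta=0.4 make it reachable
print(choose_strategy(0.9, 0.0, alpha=0.4, beta=0.4))  # leave
```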

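        # Clip the position to respect the bounds (still inside the loop over j)
        new_cheetahs[j] = np.clip(new_cheetahs[j], bounds[:, 0], bounds[:, 1])

After a position is updated, it is clipped so that each parameter stays between its lower bound (bounds[:, 0]) and its upper bound (bounds[:, 1]).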
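
A quick check of how np.clip behaves with per-parameter bounds, using a made-up position that violates both ranges:

```python
import numpy as np

demo_bounds = np.array([[0.01, 100], [0.0001, 10]])
candidate = np.array([150.0, -1.0])  # hypothetical out-of-range position

clipped = np.clip(candidate, demo_bounds[:, 0], demo_bounds[:, 1])
print(clipped)  # C is clipped down to 100.0, gamma is clipped up to 0.0001
```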

    # Replace the old positions with the new ones (end of each iteration)
    cheetahs = new_cheetahs

Finally, the old cheetah positions are replaced with the newly computed positions stored in the new_cheetahs array.
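
Since the loop is presented in fragments, it may help to see the pieces assembled with their nesting. The following is a sketch under stated assumptions: it follows the simplified update rules described in this article rather than the published Cheetah Optimizer, it minimizes the objective, it uses a seeded NumPy generator, and it runs on a toy objective instead of the SVM. The names cheetah_search, toy, and toy_bounds are hypothetical.

```python
import numpy as np

def cheetah_search(objective, bounds, n_cheetahs=20, n_iterations=40,
                   alpha=0.5, beta=0.5, delta=0.5, seed=0):
    """Minimise `objective` inside `bounds` using the update rules from this article."""
    rng = np.random.default_rng(seed)
    low, high = bounds[:, 0], bounds[:, 1]
    cheetahs = rng.uniform(low, high, size=(n_cheetahs, len(bounds)))
    best_position, best_score = None, np.inf

    for i in range(n_iterations):
        # Evaluate every cheetah and remember the best one seen so far
        scores = np.array([objective(c) for c in cheetahs])
        best_idx = np.argmin(scores)
        if scores[best_idx] < best_score:
            best_score, best_position = scores[best_idx], cheetahs[best_idx].copy()

        leader = cheetahs[best_idx]
        new_cheetahs = np.zeros_like(cheetahs)
        for j in range(n_cheetahs):
            r = rng.random()
            if r < alpha:
                if rng.random() < 0.5:  # searching: step relative to the leader
                    f = [np.sin, np.cos, np.tan][rng.integers(3)]
                    new_cheetahs[j] = leader + f((i + 1) / n_iterations) * (leader - cheetahs[j])
                else:  # sitting-and-waiting: small random perturbation
                    new_cheetahs[j] = cheetahs[j] + rng.uniform(-0.01, 0.01, size=len(bounds))
            elif r < alpha + beta:  # attacking: move towards the best position so far
                new_cheetahs[j] = cheetahs[j] + rng.random() * (best_position - cheetahs[j])
            else:  # leaving the prey: restart near the first cheetah
                new_cheetahs[j] = cheetahs[0] + rng.uniform(-delta, delta, size=len(bounds))
            new_cheetahs[j] = np.clip(new_cheetahs[j], low, high)
        cheetahs = new_cheetahs

    return best_position, best_score

# Toy objective (an assumption for this sketch): squared distance from (1, 2)
def toy(p):
    return float(np.sum((p - np.array([1.0, 2.0])) ** 2))

toy_bounds = np.array([[-5.0, 5.0], [-5.0, 5.0]])
position, score = cheetah_search(toy, toy_bounds)
print(position, score)
```

Swapping toy for the objective_function from step 5 and toy_bounds for the bounds from step 6 would run the same search over C and gamma.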

9. Find the Best Position and Score

# Print the final best position and score
print(f'Final best position: C = {best_position[0]}, gamma = {best_position[1]}')
print(f'Final best score: {-best_score}')

cValue = best_position[0] 
gammaValue = best_position[1]

10. Train the SVM Model

# Import the SVC class from sklearn.svm
from sklearn.svm import SVC
# Create an SVM model with the optimal C and gamma values
model = SVC(C = cValue, gamma = gammaValue)

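This line creates an instance of the SVC class configured with specific hyperparameters:

C: the regularization parameter, set to cValue. It controls the trade-off between a wide margin (small C) and few classification errors on the training data (large C).

gamma: the kernel coefficient, set to gammaValue. It shapes the decision boundary: a small gamma gives a smoother, simpler boundary, while a large gamma lets the boundary follow individual training points more closely.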
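
SVC uses the RBF kernel by default, and gamma is the coefficient inside that kernel. The sketch below evaluates the kernel by hand for one pair of points; rbf_kernel_value and the two points are hypothetical and serve only to show the effect of gamma.

```python
import numpy as np

def rbf_kernel_value(x, z, gamma):
    """RBF kernel k(x, z) = exp(-gamma * ||x - z||^2)."""
    return np.exp(-gamma * np.sum((x - z) ** 2))

x = np.array([0.0, 0.0])
z = np.array([1.0, 1.0])  # squared distance from x is 2

for gamma in (0.01, 1.0, 10.0):
    print(gamma, rbf_kernel_value(x, z, gamma))
```

For a fixed pair of points the kernel value shrinks as gamma grows, so each training point influences a smaller neighbourhood around itself.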

# Fit the model on the training data
model.fit(X_train, y_train)

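This line trains (fits) the SVM model on the training data, which consists of the feature vectors stored in X_train and their corresponding labels stored in y_train. The fit method adjusts the model's parameters to learn a decision boundary that best separates the classes in the training data.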

11. Check the Model Accuracy

# Import the accuracy_score function from sklearn.metrics
from sklearn.metrics import accuracy_score

# Predict the labels of the test data
y_pred = model.predict(X_test)

# Calculate the accuracy of the model
accuracy = accuracy_score(y_test, y_pred)

# Print the accuracy
print(f"The accuracy of the model is {accuracy:.2f}")

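Conclusion

We have now implemented the Cheetah Optimizer algorithm in Python. Using it, we searched for the values of the C and gamma parameters that let the SVM model make accurate predictions, and trained the final model with the best values found.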

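References

You can learn more about the Cheetah Optimizer from the following sources:

Cheetah Optimizer (CO) - https://optim-app.com/projects/co
Cheetah Optimizer - Seyedali Mirjalili (2023). Cheetah Optimizer (https://www.mathworks.com/matlabcentral/fileexchange/130404-cheetah-optimizer), MATLAB Central File Exchange. Retrieved September 30, 2023.

You can download the source code here.

History

September 30, 2023: Initial version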