Katana Intel 性能算法 Python

Katana 的高性能图分析库

Gurbinder Gill

5.00/5 (1投票)

2021年11月16日

CPOL

6分钟阅读

6441

本文将介绍该库的功能、获取方式、使用示例以及用于突出性能的基准测试数据。

根据 Gartner, Inc. 的报告，图处理是 2021 年十大数据分析趋势之一。它是一个新兴的应用领域，也是数据科学家处理链接数据集（例如，社交、电信和金融网络；网络流量；以及生化通路）的必备工具。实际应用中的图通常很大，而且还在不断增长。例如，如今的社交网络可能有数十亿个节点和边，因此高性能并行计算至关重要。

为此，Katana Graph 与 Intel 合作，设计了一个高性能、易于使用的图分析 Python 库，该库具有以下特点：(a) 对重要的图分析算法进行了高度优化的并行实现；(b) 提供了一个高级 Python 接口，用于在底层的 C++ 图引擎之上编写自定义并行算法；(c) 与 pandas、scikit-learn 和 Apache Arrow 以及 Intel AI 软件栈中的工具和库互操作；(d) 全面支持从各种格式进行提取、转换和加载 (ETL)；以及 (e) 一个 Metagraph 插件。

本文将介绍该库的功能、获取方式、使用示例以及用于突出性能的基准测试数据。

库中的图分析算法

Katana 库中预装了图处理管道中常用的关键算法。目前可用的算法列表如下：

广度优先搜索：从源节点开始，通过广度优先搜索构建一个有向树。
单源最短路径：计算从源节点到所有节点的最短路径。
连通分量：查找图中内部相互连接但与其他分量不连接的节点组（即分量）。
PageRank：根据传入链接的结构计算图中节点的排名。
介数中心性：根据通过每个节点的最短路径数量计算图中节点的中心性。
三角形计数：计算图中三角形的数量。
Louvain 社群检测：使用 Louvain 启发式算法计算图中最大化模块度的社群。
子图提取：提取图的诱导子图。
Jaccard 相似度：计算给定节点与图中其他所有节点的 Jaccard 系数。
基于标签传播的社群检测：使用标签传播算法计算图中的社群。
局部聚类系数函数：衡量图中节点聚集在一起的程度。
K-Truss：查找图中至少包含三个顶点且每条边都与至少 K - 2 个三角形相邻的最大诱导子图。
K-Core：查找包含度为 K 或更高的节点的子图。

更多算法正在添加到库中，并且用户可以轻松添加自己的算法，如下文所示。

获取 Katana 图库

Katana Graph 的分析库是开源的，并根据 3-Clause BSD 许可证免费提供。可从 GitHub 获取，或从 Anaconda.org 轻松安装。

$ conda install -c katanagraph/label/dev -c conda-forge katana-python

使用 Katana 图库

Katana 的 Python 库支持从各种格式进行 ETL，例如邻接矩阵、pandas DataFrames、NumPy 数组、边列表、GraphML、NetworkX 等。下面显示了几个示例：

import numpy as np
import pandas
from katana.local import Graph
from katana.local.import_data import (
     from_adjacency_matrix,
     from_edge_list_arrays,
     from_edge_list_dataframe,
     from_edge_list_matrix,
     from_graphml)

来自邻接矩阵的输入

katana_graph = from_adjacency_matrix(
                    np.array([[0, 1, 0], [0, 0, 2], [3, 0, 0]]))

来自边列表的输入

katana_graph = from_edge_list_arrays(
                    np.array([0, 1, 10]), np.array([1, 2, 0]),
                    prop = np.array([1, 2, 3]))

来自 Pandas DataFrame 的输入

katana_graph = from_edge_list_dataframe(
                    pandas.DataFrame(dict(source=[0, 1, 10],
                                          destination=[1, 2, 0],
                                     prop = [1, 2, 3])))

来自 GraphML 的输入

katana_graph = from_graphml(input_file)

执行图分析算法

以下示例计算输入图的介数中心性。

import katana.local
from katana.example_utils import get_input
from katana.property_graph import PropertyGraph
from katana.analytics import betweenness_centrality,
                             BetweennessCentralityPlan,
                             BetweennessCentralityStatistics
katana.local.initialize()

property_name = "betweenness_centrality"
betweenness_centrality(katana_graph, property_name, 16,
                       BetweennessCentralityPlan.outer())
stats = BetweennessCentralityStatistics(g, property_name)

print("Min Centrality:", stats.min_centrality)
print("Max Centrality:", stats.max_centrality)
print("Average Centrality:", stats.average_centrality)

Katana 的 Python 库与 pandas、scikit-learn 和 Apache Arrow 互操作。

除了前面列出的预打包例程之外，数据科学家还可以使用简单的 Python 接口编写自己的图算法，该接口暴露了 Katana Graph 优化的 C++ 引擎¹ 及其并发数据结构和并行循环构造。Katana Graph 库已包含广度优先搜索实现，但以下示例说明了使用 API 实现此类算法的简便性。

def bfs(graph: Graph, source):
    """
    Compute the BFS distance to all nodes from source.

    The algorithm in bulk-synchronous level by level.

    :param graph: The input graph.
    :param source: The source node for the traversal.
    :return: An array of distances, indexed by node ID.
    """
    next_level_number = 0

    # The work lists for the current and next levels using a 
    # Katana concurrent data structure.
    curr_level_worklist = InsertBag[np.uint32]()
    next_level_worklist = InsertBag[np.uint32]()

    # Create and initialize the distance array.
    # source is 0, everywhere else is INFINITY
    distance = np.empty((len(graph),), dtype=np.uint32)
    distance[:] = INFINITY
    distance[source] = 0

    # Start processing with just the source node.
    next_level_worklist.push(source)
    
    # Execute until the worklist is empty.
    while not next_level_worklist.empty():
        # Swap the current and next work lists
        curr_level_worklist, next_level_worklist = next_level_worklist,
                                                   curr_level_worklist

        # Clear the worklist for the next level.
        next_level_worklist.clear()
        next_level_number += 1

        # Process the current worklist in parallel by applying
        # bfs_operator for each element of the worklist.
        do_all(
            curr_level_worklist,
            # The call here binds the initial arguments of bfs_operator.
            bfs_operator(graph, next_level_worklist,
                                next_level_number, distance)
        )

    return distance

# This function is marked as a Katana operator, meaning that it will
# be compiled to native code and prepared for use with Katana do_all.
@do_all_operator()
def bfs_operator(graph: Graph, next_level_worklist,
                               next_level_number, distance, node_id):
    """
    The operator called for each node in the work list.

    The initial four arguments are provided by bfs above.
    node_id is taken from the worklist and passed to this
    function by do_all.

    :param next_level_worklist: The work list to add next nodes to.
    :param next_level_number: The level to assign to nodes we find.
    :param distance: The distance array to fill with data.
    :param node_id: The node we are processing.
    :return:
    """
    # Iterate over the out edges of our node
    for edge_id in graph.edges(node_id):
        # Get the destination of the edge
        dst = graph.get_edge_dest(edge_id)
        
        # If the destination has not yet been reached, set its level
        # and add it to the work list so its out edges can be processed
        # in the next level.
        if distance[dst] == INFINITY:
            distance[dst] = next_level_number
            next_level_worklist.push(dst)
        # There is a race here, but it's safe. If multiple calls to
        # operator add the same destination, they will all set the
        # same level. It will create more work because the node will
        # be processed more than once in the next level, but it avoids
        # atomic operations so it can still be a win in low-degree graphs.

Metagraph 支持

Katana Graph 的 Python 分析库将通过 Metagraph 插件提供。Metagraph 为图分析提供了统一的 Python 入口点。用户可以使用标准 API 编写图工作流，然后将其分派给兼容的图库，这些库可以插入 Metagraph。现在，开源图社区将能够直接使用 Katana Graph 的高性能应用程序。Metagraph 插件包含在一个 Anaconda 包中，可以如下安装和调用：

$ conda create -n metagraph-test -c conda-forge \
                                 -c katanagraph/label/dev \
                                 -c metagraph metagraph-katana

import metagraph as mg
bfs = mg.algos.traversal.bfs_iter(katana_graph, <start node>)

Katana 图库的速度有多快？

Katana 库已针对其他图分析框架进行了广泛的基准测试，并在 GAP Benchmark Suite² 上始终表现出同等或更好的性能。**表 1** 显示了 Katana Graph 相对于 GAP 参考实现（针对来自不同领域的各种图）的性能。

表 1. 使用 GAP Benchmark Suite 测量 Katana Graph 性能。数据摘自 Azad 等人（2020）²。系统：双插槽 2.0 GHz Intel® Xeon® Platinum 8153 处理器（64 个逻辑核心）和 384 GB DDR4 内存。有关性能和基准测试结果的更完整信息，请访问 www.intel.com/benchmarks。

Katana Graph 库在处理极大的图（如 Clueweb12³ 和 WDC12⁴，分别拥有 420 亿和 1280 亿条边，是目前最大的公开可用图之一）方面也表现出色，尤其是在使用 Intel® OptaneTM DC 持久内存⁵, ⁶ 等最新的字节可寻址内存技术时（**图 1**）。

图 1. Katana Graph 在极大数据图上的 BFS 性能。将单台 Intel Optane 内存节点与多节点集群进行性能比较。每个 TACC Stampede Skylake 集群节点有两个 2.1 GHz Intel Xeon Platinum 8160 处理器和 192 GB DDR4 内存。Cascade Lake 服务器有两个 2.2 GHz 第二代 Intel Xeon 可扩展处理器，配备 6 TB Intel Optane PMM 和 384 GB DDR4 DRAM。Ice Lake 服务器有两个 2.2 GHz Intel Xeon Platinum 8352Y 处理器，配备 8 TB Intel Optane PMM 和 1 TB DDR4 DRAM。此图摘自参考文献 [5] 和 [6] 的数据。有关性能和基准测试结果的更完整信息，请访问 www.intel.com/benchmarks。

在哪里可以了解更多信息？

希望您已经确信 Katana Graph 库是图分析的一个多功能、高性能的选择。您可以在 GitHub 站点了解有关该库的更多信息、提问、发布功能请求等。

参考文献

Nguyen, D., Lenharth, A., and Pingali, K. (2013). A lightweight infrastructure for graph analytics. Proceedings of the 24th ACM Symposium on Operating Systems Principles (SOSP '13).
Azad, A. et al. (2020). Evaluation of graph analytics frameworks using the GAP Benchmark Suite, IEEE International Symposium on Workload Characterization (IISWC).
ClueWeb12 数据集
Web Data Commons – 超链接图
Gill, G., Dathathri, R., Hoang, L., Peri, R., and Pingali, K. (2020) Single machine graph analytics on massive datasets using Intel Optane DC Persistent Memory. Proceedings of the VLDB Endowment, 13(8), 1304– 1318.
Dathathri, R. et al. (2019). Gluon-Async: A bulk-asynchronous system for distributed and heterogeneous graph analytics. Proceedings of the 28th International Conference on Parallel Architectures and Compilation Techniques (PACT).