dtop: A Tool for Measuring an Application's System Utilization and System Performance

Calinyara

May 5, 2020

CPOL

A new tool for measuring system utilization.

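Introduction

Most system utilization tools (for example, top and htop) measure system load by counting system timer interrupts. This is sometimes inaccurate, as in the following example: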

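Process A is switched out for process B between the first and second interrupts. The two interrupts, however, are charged to process A and process C respectively, so process B is missing from the statistics. For the same reason, process A is missed between the second and third interrupts.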
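
The effect can be reproduced with a small simulation. The sketch below is not part of dtop; the `Slice` type, the function names, and the schedule (a 10 ms tick, with A, B, and C running for 2 ms, 7 ms, and 11 ms) are assumptions chosen only to illustrate how tick counting can miss a process that runs entirely between two ticks.

```rust
/// One stretch of CPU time given to a process, as [start, end) in microseconds.
struct Slice {
    pid: char,
    start: u64,
    end: u64,
}

/// Charge each timer tick to whichever process is running at that instant,
/// the way a tick-counting monitor does. Returns (pid, ticks) in first-seen order.
fn ticks_per_process(schedule: &[Slice], tick_us: u64, total_us: u64) -> Vec<(char, u64)> {
    let mut counts: Vec<(char, u64)> = Vec::new();
    let mut t: u64 = 0;
    while t < total_us {
        if let Some(s) = schedule.iter().find(|s| s.start <= t && t < s.end) {
            match counts.iter_mut().find(|entry| entry.0 == s.pid) {
                Some(entry) => entry.1 += 1,
                None => counts.push((s.pid, 1)),
            }
        }
        t += tick_us;
    }
    counts
}

/// Hypothetical schedule: A runs 0..2 ms, B runs 2..9 ms, C runs 9..20 ms.
fn example_schedule() -> Vec<Slice> {
    vec![
        Slice { pid: 'A', start: 0, end: 2_000 },
        Slice { pid: 'B', start: 2_000, end: 9_000 },
        Slice { pid: 'C', start: 9_000, end: 20_000 },
    ]
}
```

Calling `ticks_per_process(&example_schedule(), 10_000, 20_000)` charges one tick to A and one to C. B held the CPU for 7 of the 20 ms but receives no tick at all.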

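Background

dtop is a tool written in Rust that measures an application's system utilization and system performance. It calculates system load by subtraction. Background "burner" tasks run on every CPU in the system. When a new application takes some amount of the system's computing capacity, the background tasks lose that capacity accordingly. The new application's system utilization can therefore be estimated from that loss.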
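
As a rough illustration of the subtraction idea, the sketch below computes utilization from two scores: one recorded on an idle system and one recorded while an application runs. The function name and the sample numbers are assumptions for illustration; the arithmetic is the formula the article gives for utilization.

```rust
/// Percentage of the calibrated (idle-system) score that the background
/// tasks lost. A negative result means the current score is higher than
/// the calibrated one.
fn utilization_percent(calibrated_score: i64, current_score: i64) -> f64 {
    let calibrated = calibrated_score as f64;
    (calibrated - current_score as f64) / calibrated * 100.0
}
```

With a calibrated score of 1000, a current score of 750 gives a utilization of about 25%, and a current score of 1100 gives about -10%.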

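Using the code

The following snippet creates one burner thread on each CPU in the system. These threads run at the lowest priority, so that the system can take resources away from them whenever a new task needs to run.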

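// Enumerate the cores that threads can be pinned to; one burner thread is created per core.
let core_ids = core_affinity::get_core_ids().unwrap();
let core_num = core_ids.len();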

let mut channels: Vec<(Sender<i32>, Receiver<i32>)> = Vec::with_capacity(core_num);
for _ in 0..core_num {
    channels.push(mpsc::channel());
}

let mut counters: Vec<Arc<Mutex<i64>>> = Vec::with_capacity(core_num);
for _ in 0..core_num {
    counters.push(Arc::new(Mutex::new(0)));
}

let threads_info: Vec<_> = izip!(core_ids.into_iter(),
                                 channels.into_iter(),
                                 counters.into_iter()).collect();

let handles = threads_info.into_iter().map(|info| {
    thread::spawn(move || {
        let (core_id, ch, counter) = (info.0, info.1, info.2);
        core_affinity::set_for_current(core_id);

        match set_current_thread_priority(ThreadPriority::Min) {
            Err(why) => panic!("{:?}", why),
            Ok(_) => do_measure(&counter, ch),
        }
    })
}).collect::<Vec<_>>();

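The real work of these threads is done in the "do_measure" function. It runs an infinite loop that repeatedly tests whether a number is prime, increments the counter, and checks whether an exit signal has arrived.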

fn do_measure(c: &Arc<Mutex<i64>>, ch: (Sender<i32>, Receiver<i32>)) -> bool {
    loop {
        let r: bool = is_prime(PRIME);

        let mut num = c.lock().unwrap();
        *num += 1;

        match (ch.1).try_recv() {
            Ok(_) | Err(TryRecvError::Disconnected) => {
                break r;
            },
            Err(TryRecvError::Empty) => {},
        }
    }
}
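
The same loop shape (do one unit of work, count it, check for a stop request) can be tried with the standard library alone. The sketch below is a simplified stand-in, not dtop's code: it swaps the `Mutex<i64>` counter for an `AtomicU64` and the channel for an `AtomicBool` stop flag, and it does no core pinning or priority change. The name `burn_for` and the test number 7919 are assumptions for illustration.

```rust
use std::hint::black_box;
use std::sync::atomic::{AtomicBool, AtomicU64, Ordering};
use std::sync::Arc;
use std::thread;
use std::time::Duration;

/// Trial division, deliberately slow: one call is one unit of counted work.
fn is_prime(n: u64) -> bool {
    if n < 2 {
        return false;
    }
    for a in 2..n {
        if n % a == 0 {
            return false;
        }
    }
    true
}

/// Run one burner thread for roughly `run_for`, stop it, and return how
/// many primality checks it completed.
fn burn_for(run_for: Duration, n: u64) -> u64 {
    let counter = Arc::new(AtomicU64::new(0));
    let stop = Arc::new(AtomicBool::new(false));

    let handle = {
        let counter = Arc::clone(&counter);
        let stop = Arc::clone(&stop);
        thread::spawn(move || loop {
            // black_box keeps the optimizer from removing the work.
            black_box(is_prime(black_box(n)));
            counter.fetch_add(1, Ordering::Relaxed);
            // Check for the stop request after each unit of work.
            if stop.load(Ordering::Relaxed) {
                break;
            }
        })
    };

    thread::sleep(run_for);
    stop.store(true, Ordering::Relaxed);
    handle.join().expect("burner thread panicked");
    counter.load(Ordering::Relaxed)
}
```

For example, `burn_for(Duration::from_millis(50), 7919)` returns the number of checks completed in about 50 ms. The value depends on the machine and on whatever else is running, which is exactly what the score is meant to capture.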

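The number of times "is_prime" runs can be treated as a system performance score. The score represents the system performance that remains available; if no workload is running on the system, it represents the performance of the whole system. Scores from different systems can be compared, and a higher score means better performance.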

fn is_prime(n: u64) -> bool {
    for a in 2..n {
        if n % a == 0 {
            return false;
        }
    }
    true
}

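The following snippet collects the system performance statistics periodically. The interval is configurable in seconds, and the scores are printed at the end of every interval. System utilization is calculated as "(calibration_scores - total_score as f64) / calibration_scores * 100." The result can be negative, which means the system is carrying less workload than it was when it was last calibrated.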

let when = Instant::now() + Duration::from_secs(parameter.interval as u64);
let task = Interval::new(when, Duration::from_secs(parameter.interval as u64))
    .take(run_times)
    .for_each(move |_| {
        let mut scores: Vec<i64> = vec![0; core_num];
        for i in 0..core_num {
            let mut num = counters_copy[i].lock().unwrap();
            scores[i] = *num;
            *num = 0;
        }

        for i in &mut scores {
            *i /= parameter.interval as i64;
        }

        if !parameter.calibrating {
            match File::open("scores.txt") {
                Err(_) => {
                    let total_score = scores.iter().sum::<i64>();
                    println!("Calibrating...");
                    println!("Scores per CPU: {:?}", scores);
                    println!("Total Calibrated Score: {}\n", total_score);
                    save_calibration(total_score);
                },
                Ok(_) => {
                    let total_score = scores.iter().sum::<i64>();
                    let calibration_scores: f64 = get_calibration() as f64;
                    println!("Scores per CPU: {:?}", scores);
                    match parameter.run_mode {
                        RunMode::AppUtilization => {
                            let rate = (calibration_scores - total_score as f64) / calibration_scores * 100.;
                            println!("Total Score: {}        System Utilization: {:7.3}%\n", total_score, rate);
                        },
                        RunMode::SysPerformance => {
                            let rate = total_score as f64 / calibration_scores * 100.;
                            println!("Total Score: {}        Performance Percentage: {:9.3}%\n", total_score, rate);
                        },
                    }
                }
            }
        } else {
            let total_score = scores.iter().sum::<i64>();
            println!("Calibrating...");
            println!("Scores per CPU: {:?}", scores);
            println!("Total Calibrated Score: {}\n", total_score);
            save_calibration(total_score);
        }
        Ok(())
    })
    .map_err(|e| panic!("interval errored; err={:?}", e));

tokio::run(task);
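
The per-interval bookkeeping in the snippet above (read each counter, reset it, and normalize to a per-second score) can be isolated into a small helper. This is a sketch under the assumption that the counters are shared as `Arc<Mutex<i64>>`, as in the article's code; the name `drain_scores` is hypothetical and does not appear in dtop.

```rust
use std::sync::{Arc, Mutex};

/// Read every per-core counter, reset it to zero, and return the
/// per-second score for each core (integer division by the interval).
fn drain_scores(counters: &[Arc<Mutex<i64>>], interval_secs: i64) -> Vec<i64> {
    counters
        .iter()
        .map(|c| {
            let mut num = c.lock().unwrap();
            let score = *num / interval_secs;
            *num = 0; // start the next interval from zero
            score
        })
        .collect()
}
```

With two counters holding 500 and 1000 after a 5-second interval, `drain_scores` returns `[100, 200]` for a total score of 300 and leaves both counters at zero. Dividing by the interval is what makes scores taken with different intervals comparable.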

Here are some examples of using dtop. The command line is built on clap, a fast, configurable argument parsing library for Rust.

Calibrate the system

dtop -c         // Calibrate the system with interval 1s.
dtop -c -i 5    // Calibrate the system with interval 5s.

Measure an application's system utilization every 1 second

dtop -c     // Calibrate the system.
dtop        // Check the system utilization every 1s.
...         // Run an application on the system.

Measure an application's system utilization every 5 seconds

dtop -c     // Calibrate the system.
dtop -i 5   // Check the system utilization every 5s.
...         // Run an application on the system.

Measure an application's system utilization in step mode

dtop -c     // Calibrate the system.
...         // Run an application on the system.
dtop -s     // Check the system utilization caused by the application.

Measure system performance

dtop -m 1    // Measure the system performance.

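Conclusion

This article introduced dtop, a new system utilization tool that measures an application's system utilization and system performance. Instead of the traditional "additive" method based on system timer interrupts, it uses a "subtractive" method. This avoids the statistical inaccuracy that arises when a scheduling interval is shorter than the interval between system timer interrupts.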

The source code can be downloaded from GitHub: https://github.com/calinyara/dtop.

History

May 1, 2020: Initial version
May 5, 2020: v2