Meaning of the benchmark variables in snakemake

I have included the benchmark directive for some rules in my snakemake workflow, and the resulting files have the following header:

s   h:m:s   max_rss max_vms max_uss max_pss io_in   io_out  mean_load

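The only documentation I've found mentions a "benchmark txt file (which will contain a tab-separated table of run times and memory usage in MiB)".

I would guess that columns 1 and 2 are two different ways of showing the time it took to execute the rule (in seconds, and converted to hours, minutes and seconds).

io_in and io_out are probably related to disk read and write activity, but in what units are they measured?

What are the others? Is this documented somewhere?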

Edit: Looking at the source code

I found the following code in /snakemake/benchmark.py, which is most likely where the benchmark data comes from:

def _update_record(self):
    """Perform the actual measurement"""
    # Memory measurements
    rss, vms, uss, pss = 0, 0, 0, 0
    # I/O measurements
    io_in, io_out = 0, 0
    # CPU seconds
    cpu_seconds = 0
    # Iterate over process and all children
    try:
        main = psutil.Process(self.pid)
        this_time = time.time()
        for proc in chain((main,), main.children(recursive=True)):
            meminfo = proc.memory_full_info()
            rss += meminfo.rss
            vms += meminfo.vms
            uss += meminfo.uss
            pss += meminfo.pss
            ioinfo = proc.io_counters()
            io_in += ioinfo.read_bytes
            io_out += ioinfo.write_bytes
            if self.bench_record.prev_time:
                cpu_seconds += proc.cpu_percent() / 100 * (
                    this_time - self.bench_record.prev_time)
        self.bench_record.prev_time = this_time
        if not self.bench_record.first_time:
            self.bench_record.prev_time = this_time
        rss /= 1024 * 1024
        vms /= 1024 * 1024
        uss /= 1024 * 1024
        pss /= 1024 * 1024
        io_in /= 1024 * 1024
        io_out /= 1024 * 1024
    except psutil.Error as e:
        return
    # Update benchmark record's RSS and VMS
    self.bench_record.max_rss = max(self.bench_record.max_rss or 0, rss)
    self.bench_record.max_vms = max(self.bench_record.max_vms or 0, vms)
    self.bench_record.max_uss = max(self.bench_record.max_uss or 0, uss)
    self.bench_record.max_pss = max(self.bench_record.max_pss or 0, pss)
    self.bench_record.io_in = io_in
    self.bench_record.io_out = io_out
    self.bench_record.cpu_seconds += cpu_seconds

So this seems to come from functionality provided by psutil.
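The last lines of that function treat memory and I/O differently: the memory fields keep a running maximum across samples, while io_in and io_out are simply overwritten with the latest (already cumulative) counter value. A toy sketch of just that bookkeeping, using a hypothetical `ToyRecord` class and made-up sample values rather than anything from snakemake itself:

```python
class ToyRecord:
    """Hypothetical stand-in for the benchmark record; not snakemake's class."""

    def __init__(self):
        self.max_rss = None
        self.io_in = None

    def update(self, rss_mib, io_in_mib):
        # Memory: keep the peak seen so far (same pattern as in the quoted code).
        self.max_rss = max(self.max_rss or 0, rss_mib)
        # I/O: the counter is already cumulative, so the latest value wins.
        self.io_in = io_in_mib


# Made-up samples: (rss in MiB, cumulative bytes read in MiB) at three polls.
samples = [(10.0, 1.0), (50.0, 4.0), (20.0, 9.0)]

record = ToyRecord()
for rss_mib, io_in_mib in samples:
    record.update(rss_mib, io_in_mib)

print(record.max_rss, record.io_in)
```

With these samples the record ends up holding the peak memory value from the second poll and the I/O value from the last poll.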

Benchmarking in snakemake could certainly be better documented, but psutil is documented here:

get_memory_info()
Return a tuple representing RSS (Resident Set Size) and VMS (Virtual Memory Size) in bytes.
On UNIX RSS and VMS are the same values shown by ps. 
On Windows RSS and VMS refer to "Mem Usage" and "VM Size" columns of taskmgr.exe.

psutil.disk_io_counters(perdisk=False)

Return system disk I/O statistics as a namedtuple including the following attributes:
    read_count: number of reads
    write_count: number of writes
    read_bytes: number of bytes read
    write_bytes: number of bytes written
    read_time: time spent reading from disk (in milliseconds)
    write_time: time spent writing to disk (in milliseconds)

The code you found confirms that all memory usage and I/O counts are reported in MB (= bytes / 1024 / 1024, so strictly speaking MiB).
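As a quick check of that conversion, here is a minimal sketch of the same division the quoted code performs. The helper name `bytes_to_mib` is made up for illustration:

```python
BYTES_PER_MIB = 1024 * 1024  # same divisor as in the quoted snakemake code


def bytes_to_mib(n_bytes):
    """Convert a raw byte count (as psutil reports it) to MiB."""
    return n_bytes / BYTES_PER_MIB


# 256 MiB expressed in bytes converts back to 256.0.
print(bytes_to_mib(268435456))

# A decimal megabyte (10**6 bytes) is slightly less than one MiB.
print(bytes_to_mib(1_000_000))
```

The second call shows why the distinction matters when comparing against tools that report decimal megabytes.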

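I'll just leave this here for future reference.

Reading through the source and documentation mentioned above, the columns are: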

colname | type (unit) | description
s | float (seconds) | Running time in seconds
h:m:s | string (-) | Running time in hour, minutes, seconds format
max_rss | float (MB) | Maximum "Resident Set Size", this is the non-swapped physical memory a process has used.
max_vms | float (MB) | Maximum "Virtual Memory Size", this is the total amount of virtual memory used by the process
max_uss | float (MB) | "Unique Set Size", this is the memory which is unique to a process and which would be freed if the process was terminated right now.
max_pss | float (MB) | "Proportional Set Size", is the amount of memory shared with other processes, accounted in a way that the amount is divided evenly between the processes that share it (Linux only)
io_in | float (MB) | the number of MB read (cumulative).
io_out | float (MB) | the number of MB written (cumulative).
mean_load | float (-) | CPU usage over time, divided by the total running time (first row)
cpu_time | float (-) | CPU time summed for user and system
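To work with these columns programmatically, a benchmark file can be read as an ordinary tab-separated table. The sketch below uses only the standard library. The header is the one from the question, but the data row is invented for illustration, and `read_benchmark` is a hypothetical helper name:

```python
import csv
import io

HEADER = ["s", "h:m:s", "max_rss", "max_vms", "max_uss",
          "max_pss", "io_in", "io_out", "mean_load"]
# Invented values, only to have something to parse.
VALUES = ["12.3456", "0:00:12", "105.32", "512.80", "98.10",
          "99.45", "3.20", "1.75", "87.50"]
SAMPLE = "\t".join(HEADER) + "\n" + "\t".join(VALUES) + "\n"


def read_benchmark(text):
    """Parse tab-separated benchmark text into a list of dicts.

    Values that look numeric become floats; anything else
    (such as the h:m:s column) is kept as a string.
    """
    rows = []
    for raw in csv.DictReader(io.StringIO(text), delimiter="\t"):
        parsed = {}
        for name, value in raw.items():
            try:
                parsed[name] = float(value)
            except ValueError:
                parsed[name] = value
        rows.append(parsed)
    return rows


rows = read_benchmark(SAMPLE)
print(rows[0]["s"], rows[0]["h:m:s"], rows[0]["max_rss"])
```

For a real file, the text would come from something like `open(path).read()`. Keeping non-numeric values as strings means the parser does not need to know in advance which columns are present in a given snakemake version.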