Meaning of the benchmark variables in snakemake
I have included a benchmark directive for some of the rules in my snakemake workflow, and the resulting files have the following header:
s h:m:s max_rss max_vms max_uss max_pss io_in io_out mean_load
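The only documentation I've found mentions a "benchmark txt file (which will contain a tab-separated table of run times and memory usage in MiB)".
I guess that columns 1 and 2 are two different ways of displaying the time taken to execute the rule (in seconds, and converted to hours, minutes and seconds).
io_in and io_out are probably related to disk read and write activity, but in what units are they measured?
What are the others? Is this documented somewhere?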
Edit: Looking at the source code
I found the following code in /snakemake/benchmark.py, which is most likely where the benchmark data comes from:
    def _update_record(self):
        """Perform the actual measurement"""
        # Memory measurements
        rss, vms, uss, pss = 0, 0, 0, 0
        # I/O measurements
        io_in, io_out = 0, 0
        # CPU seconds
        cpu_seconds = 0
        # Iterate over process and all children
        try:
            main = psutil.Process(self.pid)
            this_time = time.time()
            for proc in chain((main,), main.children(recursive=True)):
                meminfo = proc.memory_full_info()
                rss += meminfo.rss
                vms += meminfo.vms
                uss += meminfo.uss
                pss += meminfo.pss
                ioinfo = proc.io_counters()
                io_in += ioinfo.read_bytes
                io_out += ioinfo.write_bytes
                if self.bench_record.prev_time:
                    cpu_seconds += proc.cpu_percent() / 100 * (
                        this_time - self.bench_record.prev_time)
            self.bench_record.prev_time = this_time
            if not self.bench_record.first_time:
                self.bench_record.prev_time = this_time
            rss /= 1024 * 1024
            vms /= 1024 * 1024
            uss /= 1024 * 1024
            pss /= 1024 * 1024
            io_in /= 1024 * 1024
            io_out /= 1024 * 1024
        except psutil.Error as e:
            return
        # Update benchmark record's RSS and VMS
        self.bench_record.max_rss = max(self.bench_record.max_rss or 0, rss)
        self.bench_record.max_vms = max(self.bench_record.max_vms or 0, vms)
        self.bench_record.max_uss = max(self.bench_record.max_uss or 0, uss)
        self.bench_record.max_pss = max(self.bench_record.max_pss or 0, pss)
        self.bench_record.io_in = io_in
        self.bench_record.io_out = io_out
        self.bench_record.cpu_seconds += cpu_seconds
So this seems to come from functionality provided by psutil.
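To make the bookkeeping in `_update_record` more concrete, here is a minimal sketch that replays the same kind of aggregation on made-up samples, without psutil. The `Sample` tuple, the `summarize` function and all numbers are hypothetical and only for illustration; the real code gets these values from psutil at each polling step.

```python
from typing import NamedTuple

MIB = 1024 * 1024  # bytes per MiB, the divisor used in the snippet above


class Sample(NamedTuple):
    """One hypothetical polling step, already summed over the process tree."""
    time: float         # wall-clock timestamp in seconds
    rss_bytes: int      # resident memory at this instant
    read_bytes: int     # cumulative bytes read so far
    cpu_percent: float  # CPU utilisation since the previous sample


def summarize(samples):
    """Aggregate samples the way _update_record appears to."""
    max_rss = 0.0
    io_in = 0.0
    cpu_seconds = 0.0
    prev_time = None
    for s in samples:
        # Memory: keep the peak seen so far, converted to MiB.
        max_rss = max(max_rss, s.rss_bytes / MIB)
        # I/O: the counter is cumulative, so the latest value replaces the old one.
        io_in = s.read_bytes / MIB
        # CPU: utilisation fraction multiplied by the elapsed interval.
        if prev_time is not None:
            cpu_seconds += s.cpu_percent / 100 * (s.time - prev_time)
        prev_time = s.time
    return max_rss, io_in, cpu_seconds


samples = [
    Sample(0.0, 100 * MIB, 0, 0.0),
    Sample(1.0, 300 * MIB, 10 * MIB, 200.0),
    Sample(2.0, 200 * MIB, 30 * MIB, 100.0),
]
print(summarize(samples))  # (300.0, 30.0, 3.0)
```

In this toy run the peak memory stays at 300 even though the last sample is lower, which matches the `max_` prefix of the memory columns, while `io_in` simply reflects the last cumulative counter.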
The benchmarking in snakemake could certainly be better documented, but psutil itself is documented. From its documentation:
get_memory_info()
Return a tuple representing RSS (Resident Set Size) and VMS (Virtual Memory Size) in bytes.
On UNIX RSS and VMS are the same values shown by ps.
On Windows RSS and VMS refer to "Mem Usage" and "VM Size" columns of taskmgr.exe.
psutil.disk_io_counters(perdisk=False)
Return system disk I/O statistics as a namedtuple including the following attributes:
read_count: number of reads
write_count: number of writes
read_bytes: number of bytes read
write_bytes: number of bytes written
read_time: time spent reading from disk (in milliseconds)
write_time: time spent writing to disk (in milliseconds)
The code you found confirms that all memory usage and I/O counts are reported in MB (= bytes / (1024 * 1024), i.e. MiB).
I'll just leave this here for future reference.
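Since the snippet divides every byte count by 1024 * 1024, the conversion in both directions can be sketched as follows. The helper names `bytes_to_mib` and `mib_to_bytes` are made up for this example and are not part of snakemake or psutil.

```python
MIB = 1024 * 1024  # bytes per MiB


def bytes_to_mib(n_bytes: int) -> float:
    """Convert a raw byte count into the unit used in the benchmark file."""
    return n_bytes / MIB


def mib_to_bytes(mib: float) -> int:
    """Convert a benchmark value back to (approximate) bytes."""
    return round(mib * MIB)


print(bytes_to_mib(52_428_800))  # 50.0
print(mib_to_bytes(1.5))         # 1572864
```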
After reading through the benchmark module of snakemake >= 6.0.0 and the psutil documentation for memory_info(), memory_full_info(), io_counters() and cpu_times(), the columns are, as mentioned above:
colname | type (unit) | description |
---|---|---|
s | float (seconds) | Running time in seconds |
h:m:s | string (-) | Running time in hours, minutes, seconds format |
max_rss | float (MB) | Maximum "Resident Set Size": the non-swapped physical memory a process has used. |
max_vms | float (MB) | Maximum "Virtual Memory Size": the total amount of virtual memory used by the process. |
max_uss | float (MB) | "Unique Set Size": the memory which is unique to a process and which would be freed if the process was terminated right now. |
max_pss | float (MB) | "Proportional Set Size": the amount of memory shared with other processes, accounted in a way that the amount is divided evenly between the processes that share it (Linux only). |
io_in | float (MB) | The number of MB read (cumulative). |
io_out | float (MB) | The number of MB written (cumulative). |
mean_load | float (-) | CPU usage over time, divided by the total running time (first row) |
cpu_time | float (seconds) | CPU time summed for user and system |
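As a rough illustration of how these columns can be consumed, the sketch below parses a tab-separated benchmark table with the standard library. The file content is made up, `read_benchmark` is a hypothetical helper, and the handling of placeholder values such as "-" or "NA" for missing measurements is an assumption rather than documented behaviour.

```python
import csv
import io

# Made-up example content; real files are written by snakemake.
example = (
    "s\th:m:s\tmax_rss\tmax_vms\tmax_uss\tmax_pss\tio_in\tio_out\tmean_load\n"
    "125.5000\t0:02:05\t512.25\t2048.00\t500.10\t505.30\t12.50\t3.75\t95.20\n"
)

NUMERIC = ("s", "max_rss", "max_vms", "max_uss", "max_pss",
           "io_in", "io_out", "mean_load")


def read_benchmark(handle):
    """Parse a tab-separated benchmark table into a list of dicts."""
    rows = []
    for row in csv.DictReader(handle, delimiter="\t"):
        parsed = dict(row)
        for col in NUMERIC:
            value = row.get(col)
            # Assumption: missing measurements may show up as "", "-" or "NA".
            if value in (None, "", "-", "NA"):
                parsed[col] = None
            else:
                parsed[col] = float(value)
        rows.append(parsed)
    return rows


rows = read_benchmark(io.StringIO(example))
print(rows[0]["s"], rows[0]["h:m:s"], rows[0]["max_rss"])
```

For a real file, the same function should work with `open(path, newline="")` in place of the `io.StringIO` object.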