Determine the number of "logical" bytes read/written in a Linux system
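I want to determine the number of bytes logically read/written by all processes via system calls such as read() and write(). This is different from the number of bytes actually fetched from the storage layer (as shown by tools like iotop), because it includes, for example, reads that hit the page cache. It also differs in when a write is counted: a logical write happens as soon as the write call is issued, while the actual physical IO may happen some time later, depending on various factors (Linux usually buffers writes and performs the physical IO afterwards).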
I know how to do this on a per-process basis (see, for example, this question), but not how to get a system-wide count.
Here is a SystemTap script that traces logical IO. It is based on the script at https://sourceware.org/systemtap/SystemTap_Beginners_Guide/traceiosect.html:
#! /usr/bin/env stap
# traceio.stp
# Copyright (C) 2007 Red Hat, Inc., Eugene Teo <eteo@redhat.com>
# Copyright (C) 2009 Kai Meyer <kai@unixlords.com>
# Fixed a bug that allows this to run longer
# And added the humanreadable function
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License version 2 as
# published by the Free Software Foundation.
#
global reads, writes

probe vfs.read.return {
    # A negative $return is an error code, not a byte count.
    if ($return > 0) {
        reads += $return
    }
}

probe vfs.write.return {
    if ($return > 0) {
        writes += $return
    }
}

# Use >= so that exactly 1024 bytes prints as "1 KiB" rather than "1024 B".
function humanreadable(bytes) {
    if (bytes >= 1024*1024*1024) {
        return sprintf("%d GiB", bytes/1024/1024/1024)
    } else if (bytes >= 1024*1024) {
        return sprintf("%d MiB", bytes/1024/1024)
    } else if (bytes >= 1024) {
        return sprintf("%d KiB", bytes/1024)
    } else {
        return sprintf("%d B", bytes)
    }
}

probe timer.s(1) {
    printf("reads: %12s writes: %12s\n", humanreadable(reads), humanreadable(writes))
    # Note we don't zero out reads and writes,
    # so the values are cumulative since the script started.
}
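If you post-process byte counts outside SystemTap (for example in Python), the script's humanreadable() helper ports over directly. This is a minimal sketch; the name human_readable is hypothetical, and it compares with >= so that exactly 1024 bytes is shown as 1 KiB.

```python
def human_readable(num_bytes: int) -> str:
    """Format a byte count with 1024-based units, like humanreadable() in traceio.stp."""
    # Integer division truncates, mirroring SystemTap's integer arithmetic.
    if num_bytes >= 1024 ** 3:
        return f"{num_bytes // 1024 ** 3} GiB"
    if num_bytes >= 1024 ** 2:
        return f"{num_bytes // 1024 ** 2} MiB"
    if num_bytes >= 1024:
        return f"{num_bytes // 1024} KiB"
    return f"{num_bytes} B"


if __name__ == "__main__":
    for n in (512, 1024, 5 * 1024 ** 2, 111195372883082):
        print(n, "->", human_readable(n))
```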
If you want total counts (rather than per-second counts) using the /proc filesystem, that's easy.
This also works on fairly old kernels (tested on the Debian Squeeze 2.6.32 kernel).
# cat /proc/1979/io
rchar: 111195372883082
wchar: 10424431162257
syscr: 130902776102
syscw: 6236420365
read_bytes: 2839822376960
write_bytes: 803408183296
cancelled_write_bytes: 374812672
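Each line of this file is a `name: value` pair, so it is simple to consume from a script. A minimal Python parser, assuming exactly that format; the function name parse_proc_io is hypothetical. On a live system you would pass it the contents of /proc/<pid>/io (or /proc/self/io for the current process).

```python
from __future__ import annotations

# The sample output shown above, used here as input.
SAMPLE = """\
rchar: 111195372883082
wchar: 10424431162257
syscr: 130902776102
syscw: 6236420365
read_bytes: 2839822376960
write_bytes: 803408183296
cancelled_write_bytes: 374812672
"""


def parse_proc_io(text: str) -> dict[str, int]:
    """Turn the 'name: value' lines of a /proc/<pid>/io file into a dict."""
    counters: dict[str, int] = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if sep:  # ignore blank lines or lines without a colon
            counters[name.strip()] = int(value)  # int() tolerates surrounding spaces
    return counters


if __name__ == "__main__":
    io = parse_proc_io(SAMPLE)
    print(io["rchar"], io["read_bytes"])
```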
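For a system-wide figure, just sum the numbers over all processes. This is only good enough in the short term, though: when a process dies, its statistics are removed from memory. You need to enable process accounting to preserve them.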
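A rough sketch of that summation in Python. It assumes you run it as root (other users' io files are normally not readable) and silently skips processes that exit during the scan. The function names and the proc_root parameter are hypothetical; proc_root exists only so the function can be pointed at a test directory.

```python
from __future__ import annotations

import os
from collections import Counter


def read_io_file(path: str) -> dict[str, int]:
    """Parse one /proc/<pid>/io file ('name: value' per line)."""
    counters: dict[str, int] = {}
    with open(path) as f:
        for line in f:
            name, sep, value = line.partition(":")
            if sep:
                counters[name.strip()] = int(value)
    return counters


def system_wide_io(proc_root: str = "/proc") -> Counter:
    """Sum the io counters of every process that is alive right now.

    Processes that have already exited are not included, so this is
    a snapshot, not a true system-wide total.
    """
    totals: Counter = Counter()
    for entry in os.listdir(proc_root):
        if not entry.isdigit():  # skip non-PID entries such as "self"
            continue
        try:
            totals.update(read_io_file(os.path.join(proc_root, entry, "io")))
        except OSError:
            # The process exited during the scan, or we lack permission.
            continue
    return totals


if __name__ == "__main__" and os.path.isdir("/proc"):
    snapshot = system_wide_io()
    print(f"rchar: {snapshot['rchar']}  wchar: {snapshot['wchar']}")
```

Taking two snapshots a second apart and subtracting them gives a rough per-second figure, but any process that exits in between drops out of the second snapshot and skews the difference.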
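The meaning of these fields is documented in the kernel source file Documentation/filesystems/proc.txt (Documentation/filesystems/proc.rst in newer kernels):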
rchar - I/O counter: chars read
The number of bytes which this task has caused
to be read from storage. This is simply the sum of bytes which this
process passed to read() and pread(). It includes things like tty IO
and it is unaffected by whether or not actual physical disk IO was
required (the read might have been satisfied from pagecache)
wchar - I/O counter: chars written
The number of bytes which this task has
caused, or shall cause to be written to disk. Similar caveats apply
here as with rchar.
syscr - I/O counter: read syscalls
Attempt to count the number of read I/O
operations, i.e. syscalls like read() and pread().
syscw - I/O counter: write syscalls
Attempt to count the number of write I/O
operations, i.e. syscalls like write() and pwrite().
read_bytes - I/O counter: bytes read
Attempt to count the number of bytes which
this process really did cause to be fetched from the storage layer.
Done at the submit_bio() level, so it is accurate for block-backed
filesystems.
write_bytes - I/O counter: bytes written
Attempt to count the number of bytes which
this process caused to be sent to the storage layer. This is done at
page-dirtying time.
cancelled_write_bytes
The big inaccuracy here is truncate. If a process writes 1MB to a file
and then deletes the file, it will in fact perform no writeout. But it
will have been accounted as having caused 1MB of write. In other
words: The number of bytes which this process caused to not happen, by
truncating pagecache. A task can cause "negative" IO too.
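Putting these fields together: rchar/wchar are the logical counts the question asks about, while read_bytes and write_bytes are the storage-layer counts. A small sketch that pairs them for one process, assuming that write_bytes minus cancelled_write_bytes is a reasonable net figure for storage writes (my reading of the description above, not a formula the documentation gives). Keep in mind that rchar/wchar also include tty and other non-disk IO, so the two sides are not strictly comparable. The function name is hypothetical.

```python
from __future__ import annotations

# Counter values copied from the sample /proc/1979/io output above.
SAMPLE_IO = {
    "rchar": 111195372883082,
    "wchar": 10424431162257,
    "read_bytes": 2839822376960,
    "write_bytes": 803408183296,
    "cancelled_write_bytes": 374812672,
}


def logical_vs_storage(io: dict[str, int]) -> dict[str, tuple[int, int]]:
    """Pair each logical (syscall-level) counter with its storage-layer counterpart.

    Assumption: net storage writes = write_bytes - cancelled_write_bytes,
    because truncating dirty pagecache cancels writeout that write_bytes
    had already counted.
    """
    net_writes = io["write_bytes"] - io.get("cancelled_write_bytes", 0)
    return {
        "read": (io["rchar"], io["read_bytes"]),
        "write": (io["wchar"], net_writes),
    }


if __name__ == "__main__":
    for kind, (logical, storage) in logical_vs_storage(SAMPLE_IO).items():
        print(f"{kind}: logical={logical} storage={storage}")
```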