NVMe Throughput Testing with Python

Question

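I currently need to do some throughput testing. My hardware setup is a Samsung 950 Pro connected to an NVMe controller, which is connected to the motherboard through a PCIe port. I have a Linux nvme device corresponding to that drive, and I have mounted it at a location on the filesystem.

I was hoping to use Python to do this. My plan was to open a file on the filesystem where the SSD is mounted, record the time, write a stream of n bytes to the file, record the time again, and then close the file, all using the os module's file operation utilities. Here is the function that gauges write throughput.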

import os
import time


def perform_timed_write(num_bytes, blocksize, fd):
    """
    This function writes to file and records the time

    The function has three steps. The first is to write, the second is to
    record time, and the third is to calculate the rate.

    Parameters
    ----------
    num_bytes: int
        total number of bytes to write to the file
    blocksize: int
        size in bytes of each individual write
    fd: string
        location on filesystem to write to

    Returns
    -------
    bytes_per_second: float
        rate of transfer
    """
    # generate random string
    random_byte_string = os.urandom(blocksize)

    # open the file
    write_file = os.open(fd, os.O_CREAT | os.O_WRONLY | os.O_NONBLOCK)
    # set time, write, record time
    bytes_written = 0
    # wall-clock timer; time.clock() was removed in Python 3.8
    before_write = time.perf_counter()
    while bytes_written < num_bytes:
        os.write(write_file, random_byte_string)
        bytes_written += blocksize
    after_write = time.perf_counter()

    #close the file
    os.close(write_file)

    # calculate elapsed time
    elapsed_time = after_write - before_write

    # calculate bytes per second
    bytes_per_second = num_bytes / elapsed_time


    return bytes_per_second

My other method of testing is to use the Linux fio utility. https://linux.die.net/man/1/fio

After mounting the SSD at /fsmnt/fs1, I used this job file to test the throughput:

;Write to 1 file on partition
[global]
ioengine=libaio
buffered=0
rw=write
bs=4k
size=1g
openfiles=1

[file1]
directory=/fsmnt/fs1

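I noticed that the write speed returned by the Python function is significantly higher than that reported by fio. Because Python is so high-level, you give up a lot of control. I am wondering whether Python is doing something under the hood that makes its numbers look faster than they are. Does anyone know why Python would produce write speeds so much higher than those produced by fio?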

Answer 1

Your Python program does better than your fio job because this is not a fair comparison; the two are testing different things:

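You forbade fio from using Linux's buffer cache (by using buffered=0, which is the same as saying direct=1), which tells it to do O_DIRECT operations. With the job you specified, fio has to send down a 4k write and then wait for that write to complete at the device (and that acknowledgement has to get all the way back to fio) before it can send the next one.
Your Python script is allowed to send down writes that can be buffered before they touch your SSD. Because os.write() goes straight to the write(2) system call, the buffering here happens in the kernel's buffer cache; file objects from open() would add a userspace buffer on top of that. This generally means the writes are accumulated and merged together before being sent to the lower levels, resulting in chunkier I/Os that have less overhead. Further, since you don't do any explicit flushing, in theory no I/O has to be sent to the disk before your program exits (in practice this depends on a number of factors, such as how much I/O you do, how much RAM Linux can set aside for buffers, the maximum time the filesystem will hold dirty data, how long you do the I/O for, and so on)! Your os.close(write_file) becomes a close(2) call, which does not flush data to the disk either. Even fclose(), the C library's higher-level counterpart, says this in its Linux man page: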

Note that fclose() flushes only the user-space buffers provided by the C library. To ensure that the data is physically stored on disk the kernel buffers must be flushed too, for example, with sync(2) or fsync(2).

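In fact, you take your final time before calling os.close(), so you may even be omitting the time it took for the final "batches" of data to be sent to the kernel, let alone to the SSD!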
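
One way to make the Python timing account for the flush is to stop the clock only after os.fsync() and os.close() have returned. The following is a minimal sketch of that idea rather than a definitive benchmark; the function name timed_write_with_fsync and its parameters are hypothetical, and it still goes through the kernel's buffer cache, so it measures something different from an O_DIRECT fio job.

```python
import os
import time


def timed_write_with_fsync(path, num_bytes, blocksize):
    """Write num_bytes to path in blocksize chunks; return bytes per second.

    Hypothetical helper: the clock stops only after fsync() and close(),
    so time spent flushing the kernel's cache is included.
    """
    block = os.urandom(blocksize)
    fd = os.open(path, os.O_CREAT | os.O_WRONLY | os.O_TRUNC, 0o644)
    bytes_written = 0
    start = time.perf_counter()
    try:
        while bytes_written < num_bytes:
            # os.write() may write fewer bytes than requested, so count
            # what it reports rather than assuming a full block.
            bytes_written += os.write(fd, block)
        os.fsync(fd)  # ask the kernel to push dirty data to the device
    finally:
        os.close(fd)
    elapsed = time.perf_counter() - start
    return bytes_written / elapsed
```

With a small num_bytes the result is still dominated by caching and per-call overhead, so writing well beyond the amount of RAM available for buffering gives a more meaningful number.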

Your Python script is closer to this fio job:

[global]
ioengine=psync
rw=write
bs=4k
size=1g

[file1]
filename=/fsmnt/fio.tmp

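Even with this, fio is still at a disadvantage if your Python program goes through userspace buffering, as it would with file objects from open() (so bs=8k may be closer).

The key takeaway is that your Python program is not really testing your SSD's speed at the block size you specified, and your original fio job is a bit odd, heavily restricted (the libaio ioengine is asynchronous, but with a depth of 1 you cannot benefit from that, and that is before we get to the behaviour of Linux AIO when using filesystems), and does different things from your Python program. If you are not doing significantly more buffered I/O than the size of the largest buffer (and on Linux the kernel's buffer size scales with RAM), and if the buffered I/Os are small, the exercise turns into a demonstration of how effective buffering is.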
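
For comparison, Python can be made to behave more like the original fio job by opening the file with O_DIRECT and issuing one aligned write at a time. This is only a sketch under several assumptions: the helper name timed_direct_write is hypothetical, os.O_DIRECT is Linux-specific, the block size must be a multiple of the device's logical block size, and some filesystems reject O_DIRECT altogether. An anonymous mmap is used here because it gives a page-aligned buffer.

```python
import mmap
import os
import time


def timed_direct_write(path, num_bytes, blocksize=4096):
    """Write with O_DIRECT, one block at a time; return bytes per second.

    Hypothetical helper. O_DIRECT needs an aligned buffer and block size,
    and bypasses the kernel's buffer cache.
    """
    # An anonymous mmap is page-aligned, which satisfies O_DIRECT's
    # buffer alignment requirement on typical setups.
    buf = mmap.mmap(-1, blocksize)
    buf.write(os.urandom(blocksize))
    fd = os.open(path, os.O_CREAT | os.O_WRONLY | os.O_DIRECT, 0o644)
    written = 0
    start = time.perf_counter()
    try:
        while written < num_bytes:
            # Each call blocks until the device has accepted this block.
            written += os.write(fd, buf)
    finally:
        os.close(fd)
        buf.close()
    elapsed = time.perf_counter() - start
    return written / elapsed
```

Like the fio job with a depth of 1, this keeps only a single I/O in flight, so it measures per-write latency at that block size more than the drive's peak throughput.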

Answer 2

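If you need precise performance figures for an NVMe device, fio is the best choice. fio can write test data directly to the device without any filesystem (which overwrites whatever is stored on that device). Here is an example: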

[global]
ioengine=libaio
invalidate=1
iodepth=32
time_based
direct=1
filename=/dev/nvme0n1

[write-nvme]
stonewall
bs=128K
rw=write
numjobs=1
runtime=10000
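
If you still want Python on the reporting side, fio can be asked for JSON output with --output-format=json and the result parsed from a script. The sketch below assumes the JSON report has a "jobs" list whose entries carry "jobname" and a "write" object with a "bw" field in KiB/s; check this against the output of your fio version. The function names and the job file path passed to run_fio_job are hypothetical.

```python
import json
import subprocess


def write_bandwidth_from_fio_json(fio_json_text):
    """Return {job name: write bandwidth in KiB/s} from fio JSON output."""
    report = json.loads(fio_json_text)
    return {job["jobname"]: job["write"]["bw"] for job in report["jobs"]}


def run_fio_job(job_file):
    """Run fio on job_file (hypothetical path) and return write bandwidths."""
    result = subprocess.run(
        ["fio", "--output-format=json", job_file],
        check=True,           # raise if fio exits with an error
        capture_output=True,  # collect the JSON report from stdout
        text=True,
    )
    return write_bandwidth_from_fio_json(result.stdout)
```

Keeping the parsing separate from the subprocess call makes it possible to check the parser against a saved report without touching the device.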

SPDK is another option. There is an existing performance test example at https://github.com/spdk/spdk/tree/master/examples/nvme/perf.

Pynvme is a Python extension based on SPDK. You can write performance tests with its ioworker().
