Why does Linux use getdents() on directories instead of read()?

Question

I was browsing through K&R C and noticed that, to read the entries in a directory, they use:

while (read(dp->fd, (char *) &dirbuf, sizeof(dirbuf)) == sizeof(dirbuf))
    /* code */

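Here dirbuf is a system-specific directory structure and dp->fd is a valid file descriptor. On my system, dirbuf would be a struct linux_dirent. Note that struct linux_dirent has a flexible array member for the entry name, but let us assume, for the sake of simplicity, that it doesn't. (Dealing with the flexible array member in this scenario would only require a little extra boilerplate code.)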

However, Linux does not support this construct. When read() is used to try to read directory entries, read() returns -1 and errno is set to EISDIR.
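The failure described above can be observed with a minimal sketch like the following. The helper name read_errno_for is made up for illustration; it assumes a Linux system, where read() on a directory descriptor fails with EISDIR.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Illustrative helper (not part of any library): opens `path` read-only,
 * attempts a single read(), and returns the errno that read() reported.
 * Returns 0 if the read succeeded, or the open() errno if the path could
 * not be opened at all. */
int read_errno_for(const char *path)
{
    char buf[64];
    int fd = open(path, O_RDONLY);
    if (fd == -1)
        return errno;

    ssize_t n = read(fd, buf, sizeof buf);
    int err = (n == -1) ? errno : 0;   /* capture errno before close() */
    close(fd);
    return err;
}
```

On Linux, calling read_errno_for(".") is expected to yield EISDIR, while calling it on a readable regular file yields 0.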

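Instead, Linux dedicates a system call specifically to reading directories, namely getdents(). However, I've noticed that it works in virtually the same way as the code above: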

while (syscall(SYS_getdents, fd, &dirbuf, sizeof(dirbuf)) != -1)
    /* code */

What is the rationale behind this? There seems to be little or no benefit compared to using read() as done in K&R.

Answer 1

In K&R (actually, in Unix up through at least SVr2, perhaps SVr3), directory entries were 16 bytes: 2 bytes for the inode number and 14 bytes for the filename.

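Using read made sense, because the directory entries on disk were all the same size. 16 bytes (a power of 2) also made sense, because it did not require a hardware multiply to compute offsets. (I recall someone telling me around 1978 that the Unix disk driver used floating point and was slow... but that's second hand, although amusing.)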
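The fixed 16-byte layout can be sketched as below. The struct name v7_direct and the helper entry_offset are illustrative names; the field layout (a 2-byte inode number plus a 14-byte name) follows the description above, and the offset is computed with a shift to show why a power-of-two size avoids a multiply.

```c
#include <stddef.h>
#include <stdint.h>

#define DIRSIZ 14   /* maximum name length in the old fixed-size format */

/* Sketch of the old on-disk directory entry: exactly 16 bytes. */
struct v7_direct {
    uint16_t d_ino;            /* 2-byte inode number */
    char     d_name[DIRSIZ];   /* 14-byte name, not NUL-terminated when full */
};

/* Compile-time check that the layout really is 16 bytes. */
_Static_assert(sizeof(struct v7_direct) == 16, "entry must be 16 bytes");

/* Byte offset of the index-th entry: index * 16, written as a shift. */
size_t entry_offset(size_t index)
{
    return index << 4;
}
```

Because every record has the same size, a plain read() of sizeof(struct v7_direct) bytes always yields exactly one whole entry, which is what the K&R loop relies on.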

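Later improvements to directories allowed longer names, which meant that entry sizes differed (there is no point in making every entry as large as the longest possible name). A newer interface was provided: readdir.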

Linux provides a lower-level interface. According to its manual page:

These are not the interfaces you are interested in. Look at readdir(3) for the POSIX-conforming C library interface. This page documents the bare kernel system call interfaces.

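As your example illustrates, getdents is a system call, useful for implementing readdir. How readdir is implemented is unspecified. There is no particular reason why the early readdir (from about 30 years ago) could not have been implemented as a library function, using read and malloc and similar functions to manage the long filenames read from the directory.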
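For comparison, this is roughly what the portable library-level interface looks like from the application side. The helper dir_contains is a made-up name for illustration; opendir, readdir and closedir are the standard POSIX calls.

```c
#include <dirent.h>
#include <string.h>

/* Illustrative helper: returns 1 if `name` is listed in directory `path`,
 * 0 if it is not, and -1 if the directory cannot be opened. */
int dir_contains(const char *path, const char *name)
{
    DIR *dp = opendir(path);
    if (dp == NULL)
        return -1;

    int found = 0;
    struct dirent *e;
    /* readdir() hands back one entry per call; how it obtains the entries
     * from the kernel is left to the C library. */
    while ((e = readdir(dp)) != NULL) {
        if (strcmp(e->d_name, name) == 0) {
            found = 1;
            break;
        }
    }
    closedir(dp);
    return found;
}
```

The caller never sees record lengths or buffer management here; those details stay behind the library interface.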

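In this case the functionality was (probably) moved into the kernel to improve performance. Because getdents reads several directory entries at a time (unlike readdir), it may reduce the overhead of reading all the entries of a small directory, by reducing the number of system calls.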
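The batching effect can be sketched as follows. This is a hedged example, not a reference implementation: it uses SYS_getdents64 rather than SYS_getdents because not every architecture provides the older call, the record struct is declared by hand following the layout documented in getdents(2), and count_entries is an illustrative name.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* One variable-length record, laid out as documented in getdents(2). */
struct linux_dirent64 {
    uint64_t       d_ino;     /* inode number */
    int64_t        d_off;     /* offset to the next record */
    unsigned short d_reclen;  /* length of this record in bytes */
    unsigned char  d_type;    /* file type */
    char           d_name[];  /* NUL-terminated name */
};

/* Illustrative helper: counts the entries in `path` and stores in *calls
 * how many getdents64 calls returned data. Returns -1 on error. */
long count_entries(const char *path, long *calls)
{
    _Alignas(8) char buf[32768];
    long total = 0;

    *calls = 0;
    int fd = open(path, O_RDONLY | O_DIRECTORY);
    if (fd == -1)
        return -1;

    for (;;) {
        long n = syscall(SYS_getdents64, fd, buf, sizeof buf);
        if (n == -1) {          /* error */
            close(fd);
            return -1;
        }
        if (n == 0)             /* end of directory */
            break;
        (*calls)++;
        /* Walk the records packed into the buffer by this one call. */
        for (long pos = 0; pos < n; ) {
            struct linux_dirent64 *d = (struct linux_dirent64 *)(buf + pos);
            total++;
            pos += d->d_reclen;
        }
    }
    close(fd);
    return total;
}
```

For a small directory, a single call typically returns every entry, so *calls stays far below the entry count. Note that the loop stops on a return value of 0 (end of directory) as well as on -1.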

Further reading:

Why are linux file names limited to 256 characters (ie, 8bit)?

Answer 2

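getdents will return struct linux_dirent. It will do this for any underlying type of filesystem. The "on disk" format could be completely different and known only to the given filesystem driver, so a simple userspace read call could not work. That is, getdents may convert from the native format to fill in the linux_dirent.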

couldn't the same thing be said about reading bytes from a file with read()? The on disk format of the data within a file isn't necessary uniform across filesystems or even contiguous on disk - thus, reading a series of bytes from disk would again be something I expect to be delegated to the file system driver.

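Discontiguous file data is handled by the VFS ["virtual filesystem"] layer, regardless of how a FS chooses to organize the block list for a file. For example, ext4 uses "inodes" ("index" or "information" nodes), which use an "ISAM" ("index sequential access method") organization. An MS/DOS FS, however, can have a completely different organization.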

Each FS driver registers a table of VFS function callbacks when it starts. For a given operation (e.g. open/close/read/write/seek), there is a corresponding entry in the table.
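The callback-table idea can be illustrated with a small user-space toy. Everything here (toy_file, toy_file_ops, ramfs_read, toy_dir_read, toy_vfs_read) is hypothetical and only loosely modeled on the way a kernel dispatches through a per-filesystem operations table; it is not kernel code.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

struct toy_file;

/* Per-filesystem table of callbacks; one slot per supported operation. */
struct toy_file_ops {
    long (*read)(struct toy_file *f, char *buf, size_t len);
};

struct toy_file {
    const struct toy_file_ops *ops;  /* table registered by the "driver" */
    const char *data;                /* backing bytes for this toy file */
    size_t size, pos;
};

/* A trivial "driver" read: copy from memory and advance the position. */
static long ramfs_read(struct toy_file *f, char *buf, size_t len)
{
    size_t left = f->size - f->pos;
    if (len > left)
        len = left;
    memcpy(buf, f->data + f->pos, len);
    f->pos += len;
    return (long)len;
}

/* Directory objects in this toy refuse read() outright. */
static long toy_dir_read(struct toy_file *f, char *buf, size_t len)
{
    (void)f; (void)buf; (void)len;
    return -EISDIR;
}

static const struct toy_file_ops ramfs_file_ops = { .read = ramfs_read };
static const struct toy_file_ops toy_dir_ops    = { .read = toy_dir_read };

/* The generic layer knows nothing about formats; it just "calls down". */
long toy_vfs_read(struct toy_file *f, char *buf, size_t len)
{
    if (f->ops == NULL || f->ops->read == NULL)
        return -EINVAL;
    return f->ops->read(f, buf, len);
}
```

The generic entry point stays the same for every object; what happens on a read is decided entirely by which table the object points at.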

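The VFS layer (i.e. coming from the userspace syscall) will "call down" into the FS driver, and the FS driver will perform the operation, doing whatever it deems necessary to fulfill the request.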

I assume that the FS driver would know about the location of the data inside a regular file on disk - even if the data was fragmented.

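Yes. For example, if the read request is for the first three blocks of a file (e.g. 0,1,2), the FS will look up the indexing information for the file and get a list of physical blocks to read (e.g. 1000000,200,37) from the disk surface. This is all handled transparently in the FS driver.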
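The logical-to-physical lookup described above can be sketched with a toy in-memory "disk". All names (disk, toy_inode, toy_read) and the tiny 4-byte block size are assumptions chosen for illustration; a real filesystem's index structures are far more elaborate.

```c
#include <stddef.h>
#include <string.h>

enum { BLOCK_SIZE = 4 };

/* A pretend disk of 8 blocks, 4 bytes each. The file's data sits in
 * physical blocks 6, 2 and 4; the rest is filler. */
static const char disk[] =
    "...." "...." "o, w" "...." "orld" "...." "Hell" "....";

/* Toy index information: block_map[i] is the physical block that holds
 * logical block i of the file. */
struct toy_inode {
    size_t          size;       /* file length in bytes */
    const unsigned *block_map;
};

/* Copies up to len bytes starting at file offset off into buf, gathering
 * them from scattered physical blocks. Returns the bytes copied. */
size_t toy_read(const struct toy_inode *ino, size_t off, char *buf, size_t len)
{
    size_t done = 0;

    if (off >= ino->size)
        return 0;
    if (len > ino->size - off)
        len = ino->size - off;

    while (done < len) {
        size_t pos    = off + done;
        size_t lblock = pos / BLOCK_SIZE;      /* logical block number */
        size_t within = pos % BLOCK_SIZE;      /* offset inside the block */
        size_t chunk  = BLOCK_SIZE - within;
        if (chunk > len - done)
            chunk = len - done;
        memcpy(buf + done,
               disk + (size_t)ino->block_map[lblock] * BLOCK_SIZE + within,
               chunk);
        done += chunk;
    }
    return done;
}
```

With a block map of {6, 2, 4} and a size of 12, the caller sees one contiguous run of bytes even though the blocks are scattered across the toy disk.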

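The userspace program will simply see its buffer filled with the correct data, regardless of how complex the FS indexing and block fetching had to be.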

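Perhaps it is more accurate to refer to this [loosely] as transferring inode data, since files have inodes (i.e. an inode holds the indexing information used to "scatter/gather" the FS blocks of the file). But the FS driver also uses this internally to read a directory. That is, each directory has an inode that keeps track of the indexing information for that directory.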

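So, to an FS driver, a directory is much like a flat file that holds specially formatted information. These are the directory "entries". This is what getdents returns. It "sits on top of" the inode indexing layer.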

Directory entries can be variable length [based on the length of the filename]. So, the on-disk format would be (call it "Type A"):

static part|variable length name
static part|variable length name
...

But... some FSes organize themselves differently (call it "Type B"):

<static1>,<static2>...
<variable1>,<variable2>,...

So, the Type A organization might be read atomically by a userspace read(2) call, but Type B would have difficulty. The getdents VFS call handles this.
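To make the conversion concrete, here is a hedged sketch of how a "Type B" layout (fixed parts in one array, names in another) could be turned into a stream of uniform, self-describing records, emitting only whole records. The names toy_dirent and pack_type_b are hypothetical, and the record format is a simplification, not the real linux_dirent.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical uniform record: fixed header, then a NUL-terminated name. */
struct toy_dirent {
    unsigned       ino;
    unsigned short reclen;   /* total record size, header included */
    char           name[];
};

/* Packs as many WHOLE records as fit into out[0..cap). Returns the number
 * of bytes used; *emitted receives the number of entries written. */
size_t pack_type_b(const unsigned *inos, const char *const *names,
                   size_t count, char *out, size_t cap, size_t *emitted)
{
    const size_t hdr_len = offsetof(struct toy_dirent, name);
    size_t used = 0;

    *emitted = 0;
    for (size_t i = 0; i < count; i++) {
        size_t name_len = strlen(names[i]) + 1;          /* keep the NUL */
        size_t reclen = hdr_len + name_len;
        /* Round up so the next header starts on an aligned boundary. */
        reclen = (reclen + sizeof(unsigned) - 1) & ~(sizeof(unsigned) - 1);
        if (reclen > cap - used)
            break;              /* never emit a partial record */

        struct toy_dirent hdr = { inos[i], (unsigned short)reclen };
        memcpy(out + used, &hdr, hdr_len);
        memcpy(out + used + hdr_len, names[i], name_len);
        used += reclen;
        (*emitted)++;
    }
    return used;
}
```

The caller can walk the output by adding reclen to its position, and a buffer that is too small simply yields fewer entries rather than a truncated one, which a plain byte-oriented read could not guarantee.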

couldn't the VFS also present a "linux_dirent" view of a directory like the VFS presents a "flat view" of a file?

That is what getdents is for.

Then again, I'm assuming that a FS driver knows the type of each file and thus could return a linux_dirent when read() is called on a directory rather than a series of bytes.

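getdents did not always exist. When dirents were fixed size and there was only one FS format, the readdir(3) call probably did a read(2) underneath and got a series of bytes [which is all that read(2) provides]. Actually, IIRC, in the beginning there was only readdir(2), and getdents and readdir(3) did not exist.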

But what do you do if the read(2) is "short" (e.g. two bytes too small)? How do you communicate that to the app?

My question is more like since the FS driver can determine whether a file is a directory or a regular file (and I'm assuming it can), and since it has to intercept all read() calls eventually, why isn't read() on a directory implemented as reading the linux_dirent?

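read on a directory isn't intercepted and converted into getdents because the OS is minimalist. It expects you to know the difference and make the appropriate syscall.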

You do open(2) for files or directories [opendir(3) is a wrapper and does open(2) underneath]. You can read/write/seek a file and seek/getdents a directory.

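But... doing a read returns EISDIR. [Side note: I forgot this in my original comments.] In the simple "flat data" model that read provides, there is no way to convey or control everything that getdents can do and does.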

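So, rather than allowing an inferior way to get partial or wrong information, it is simpler for both the kernel and the app developer to go through the getdents interface.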

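Further, getdents does things atomically. If you are reading directory entries in a given program, other programs may be creating and deleting files in that directory, or renaming them, right in the middle of your getdents sequence.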

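getdents will present an atomic view. Either a file exists or it doesn't. It has been renamed or it hasn't. So you won't see a "half modified" view, regardless of how much "turmoil" is going on around you. When you ask getdents for 20 entries, you get them [or 10, if there are only that many].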

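Side note: A useful trick is to "overspecify" the count. That is, tell getdents you want 50,000 entries [you must provide the space]. You usually get back 100 or so. But now what you have is an atomic snapshot in time of the full directory. I sometimes do this instead of looping with a count of 1 (YMMV). You still have to protect against an entry disappearing immediately afterwards, but at least you can detect it (i.e. a subsequent file open fails).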

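So, you always get "whole" entries, and no entry at all for a file that was just deleted. That is not to say that the file is still there, merely that it was there at the time of the getdents. Another process may erase it immediately afterwards, but not in the middle of the getdents.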

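If read(2) were allowed, you would have to guess how much data to read, and you would not know which entries were fully formed and which were in a partial state. If the FS had the Type B organization above, a single read could not atomically get the static portion and the variable portion in one step.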

It would be philosophically incorrect to slow down read(2) so that it does what getdents does.

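getdents, unlink, creat, rmdir, rename (and similar) operations are interlocked and serialized to prevent any inconsistencies [not to mention FS corruption or leaked/lost FS blocks]. In other words, these syscalls all "know about each other".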

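If pgmA renames "x" to "z" and pgmB renames "y" to "z", they don't collide. One goes first and the other second, but no FS blocks are ever lost or leaked. getdents gets the whole view (be it "x y", "y z", "x z" or "z"), but it will never see "x y z" simultaneously.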

Answer 3

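Your suspicion is correct: it would make more sense to have the read system call work on directories and return some standardized data than to have a separate getdents system call. getdents is superfluous and reduces the uniformity of the interface. The other answers assert that "read" as an interface is in some way inferior to "getdents". They are incorrect. As you can observe, the arguments and return value of "read" and "getdents" are identical; it is just that "read" only works on non-directories and "getdents" only works on directories. "getdents" could easily be folded into "read" to get a single uniform syscall.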

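The reason this is not the case is historical. Originally, "read" worked on directories, but returned the actual raw directory entries from the filesystem. This was complex to parse, so the getdirents call was added alongside read to provide a filesystem-independent view of directory entries. Eventually, "read" on directories was turned off. "read" on directories could just as well have been made to behave identically to getdirents instead of being turned off. It simply wasn't, possibly because it seemed duplicative.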

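On Linux in particular, "read" has returned an error on directories for so long that some program almost certainly relies on this behavior. So backward compatibility demands that "read" on Linux never work on directories.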
