What are the most common busmaster operations, and how are they better than regular DMA?
Can someone list the most common operations that use the bus-mastering provision of the host bus? I can list a few..
1) The GPU transfers the overall framebuffer to the video card using bus-mastering over PCI-e (in recent x86).
2) The ethernet card transfers a received packet to main-memory using bus-mastering.
3) I assume the hard-disk too uses bus-mastering to transfer blocks.
In this context, when do these devices/drives use bus-mastering, vs 3rd party DMA?
Recently, the Linux kernel seems to have started supporting something in PCIe called P2P DMA, where devices communicate with each other directly. Now, how is P2P DMA fundamentally different from regular bus-mastering DMA? I suppose that, until now, bus-mastering was only used by the device to transfer to the buffer created by the DMA subsystem, and it was always to or from the main-memory, right? I guess P2P DMA is a provision that allows one to bypass the main memory altogether. I also read somewhere that such a provision is used by some proprietary graphics drivers in high-end gaming systems, and that Linux is a bit late here.
Can someone provide a broad overview of the various kinds of DMA available in modern systems, and some ways to understand them conceptually?
Edit: changed "regular DMA" to "3rd party DMA"
TL;DR: On a modern tree-like high-speed interconnect such as PCI Express, almost every device connected to it is capable of initiating memory access transactions (first-party DMA) to read from and write to system memory. These are similar to bus-master operations on ancient shared buses such as PCI or ISA.
Can someone list the most common operations that use the bus mastering provision of the host bus?
On modern systems with a high-speed peripheral interconnect ("bus") such as PCIe (which is a tree, not a bus), the most common operations are the so-called "Memory Write" and "Memory Read" operations, encoded as TLP packets. (Some examples there: http://xillybus.com/tutorials/pci-express-tlp-pcie-primer-tutorial-guide-1)
On the ancient shared buses (ISA, classic PCI), where multiple devices were connected to electrically shared signals, this could be different; but the basic driver-writing instructions still work, for example from the LDD book, Linux Device Drivers: https://www.oreilly.com/library/view/linux-device-drivers/0596000081/ch13s04.html
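For a concrete picture, here is a rough sketch of what the 3-DW header of a Memory Read/Write TLP carries. Field names follow the PCIe spec only loosely; several header fields (TC, TD, EP, Attr, AT) are lumped into "reserved", and C bit-field ordering is purely illustrative, since real hardware assembles this in the link layer, not in driver code:

```c
/* Very rough sketch of a 3-DW PCIe Memory Read/Write TLP header (32-bit
 * address form). Simplified from memory of the spec; bit-field order is
 * illustrative only and does not match the wire format exactly. */
#include <stdint.h>

struct tlp_mem_hdr {
    /* DW0 */
    uint32_t length    : 10; /* payload length in DWs                       */
    uint32_t reserved0 : 14; /* TC/TD/EP/Attr/AT etc. omitted for brevity   */
    uint32_t type      : 5;  /* 0x00 = MRd/MWr                              */
    uint32_t fmt       : 3;  /* 3DW/4DW header, with/without data           */
    /* DW1 */
    uint32_t first_be  : 4;  /* byte enables for the first DW               */
    uint32_t last_be   : 4;  /* byte enables for the last DW                */
    uint32_t tag       : 8;  /* matches completions back to read requests   */
    uint32_t requester_id : 16; /* bus/device/function of the initiator     */
    /* DW2 */
    uint32_t addr;           /* target memory address (bits 31:2 used)      */
};
```

For a Memory Write the data payload follows the header; for a Memory Read the target later returns Completion TLPs carrying the data, matched by the tag and requester ID.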
1) The GPU transfers the overall framebuffer to the video card using bus-mastering over PCI-e (in recent x86).
On recent video cards and GPUs with a PCIe interface (native PCIe, or PCIe made with an integrated PCI-to-PCIe or AGP-to-PCIe bridge chip), the framebuffer is integrated into the GPU itself (https://en.wikipedia.org/wiki/Framebuffer: "Modern video cards contain framebuffer circuitry in their cores."). The GPU has no need to access the bus to output the frame; the GPU is the video card. (The GPU has circuits that convert the current frame image from internal GPU memory into the video signal: a RAMDAC for VGA or DVI-A/-I analog signals; TMDS or LVDS encoders for DVI-D, HDMI, and DisplayPort.)
2) The ethernet card transfers a received packet to main-memory using bus-mastering.
Yes, the Ethernet controller (NIC) on the ethernet card will write received packets into system memory using descriptors and rings that are set up by the network driver in the OS. It will also execute transmit descriptors written by the driver, and to transmit a packet it will read the packet headers and packet data from system memory. Both operations are DMA, and the DMA engine performing them is in the network controller (for modern PCIe and PCI cards).
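A minimal sketch of what such an RX ring looks like conceptually, assuming a made-up descriptor layout (every real NIC defines its own in the datasheet): the driver fills each descriptor with the bus address of a buffer it has DMA-mapped, and the NIC's DMA engine writes a received frame into that buffer and sets a status bit.

```c
/* Hypothetical RX descriptor ring, loosely modelled on what common NICs do.
 * The field layout and names are invented for illustration only. */
#include <stdint.h>

#define RX_RING_SIZE 256
#define DESC_DONE    0x0001    /* set by the NIC (via DMA) when a frame landed */

struct rx_desc {
    uint64_t buf_bus_addr;     /* bus/DMA address of the packet buffer         */
    uint16_t buf_len;          /* size of the buffer handed to the NIC         */
    uint16_t pkt_len;          /* written by the NIC: received frame length    */
    uint16_t status;           /* DESC_DONE etc., written by the NIC           */
    uint16_t reserved;
};

struct rx_ring {
    struct rx_desc desc[RX_RING_SIZE]; /* lives in DMA-coherent memory         */
    uint32_t next_to_clean;            /* index the driver polls and recycles  */
};
```

The transmit path is the mirror image: the driver writes descriptors pointing at the headers and payload in system memory, and the NIC reads them out by DMA.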
3) I assume the hard-disk too uses bus-mastering to transfer blocks.
The hard disk is not connected to any bus with bus mastering (real or emulated). Typically a hard disk has a PATA, SATA, or SCSI interface and is connected to some disk controller or HBA, such as a PCI-to-SATA or PCIe-to-SATA controller, one integrated into the southbridge, or some other kind of controller. The controller will use DMA (and bus mastering) to execute read and write descriptors out of rings. The data for a write descriptor is read from system memory by DMA; data read from the disk is written to system memory by DMA. The DMA engine performing the operation is in the disk controller.
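As one concrete example of such descriptors, a SATA HBA following the AHCI model keeps, for each command, a table of physical region descriptors in system memory that its DMA engine walks to find where to read or write the data. A rough sketch of one entry, with offsets and bit positions paraphrased from memory of the AHCI layout:

```c
/* Sketch of an AHCI physical region descriptor (PRD) entry; written from
 * memory of the AHCI spec, so verify against the spec before relying on it. */
#include <stdint.h>

struct ahci_prd {
    uint32_t dba;      /* data base address, low 32 bits of the host buffer  */
    uint32_t dbau;     /* data base address, upper 32 bits                   */
    uint32_t reserved;
    uint32_t dbc_i;    /* bits 21:0: byte count - 1; bit 31: interrupt flag  */
};
```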
NVMe (NVM Express) solid-state disks (SSDs) are connected directly to the PCIe bus, and the NVMe controller in every such disk is capable of doing DMA reads and writes, programmed with NVMe descriptors in NVMe queues by the NVMe driver. (Overview of NVMe: queues on page 7, I/O command SQs on page 10; page 21 - the controller "doorbell" registers, written by the driver to notify the NVMe controller that new commands have been posted to a queue; page 45 for the Read I/O command, page 73 for the Write command.)
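To make that concrete, here is a minimal sketch of the 64-byte Read submission-queue entry and the doorbell step; the layout is written from memory of the NVMe spec, so treat it as illustrative and check the spec for the authoritative offsets:

```c
/* Minimal sketch of how an NVMe driver queues a Read: it builds a 64-byte
 * submission-queue entry in host memory and rings the SQ tail doorbell.
 * The controller then DMAs the command in, reads the data from flash, and
 * DMAs it into the host buffer pointed to by PRP1/PRP2. */
#include <stdint.h>

struct nvme_rw_cmd {
    uint8_t  opcode;      /* 0x02 = Read, 0x01 = Write                      */
    uint8_t  flags;
    uint16_t cid;         /* command identifier                             */
    uint32_t nsid;        /* namespace ID                                   */
    uint64_t rsvd2;
    uint64_t mptr;        /* metadata pointer                               */
    uint64_t prp1;        /* bus address of the host data buffer            */
    uint64_t prp2;        /* second PRP entry or pointer to a PRP list      */
    uint64_t slba;        /* starting logical block address                 */
    uint16_t nlb;         /* number of logical blocks, 0-based              */
    uint16_t control;
    uint32_t dsmgmt;
    uint32_t reftag;
    uint16_t apptag;
    uint16_t appmask;
};                        /* 64 bytes total */

/* After copying the command into the submission queue, the driver writes
 * the new tail index to that queue's doorbell register in the controller's
 * BAR0, which is the "doorbell" mentioned above. */
```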
In this context, when do these devices/drives use bus-mastering, vs 3rd party DMA?
In the context of a modern PCIe system, every device connected to the fabric is attached to exactly one port over a high-speed full-duplex link (directly to a host PCIe port, or to some bridge or switch). There is no electrical signal sharing as in old PCI, and there is no real need to call any operation "bus mastering". When a device wants to read or write system (or any other) memory, it can send a Memory Read or Memory Write packet (TLP) onto its link. Devices that do anything useful have their own integrated DMA engines, and the driver commands the device's controller to perform operations on the correct memory addresses (allocated by the driver and handed to the device in command descriptors).
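In Linux terms, "allocated by the driver and handed to the device in command descriptors" usually looks roughly like the sketch below. The dma_alloc_coherent() call is the real kernel DMA-mapping API; struct my_desc and its fields are hypothetical stand-ins for whatever command format a particular device defines:

```c
/* Sketch of how a Linux driver obtains a DMA-able buffer and tells its
 * device about it; real dma_* API, hypothetical device descriptor. */
#include <linux/dma-mapping.h>
#include <linux/pci.h>

struct my_desc {              /* hypothetical device command descriptor */
    __le64 buf_addr;
    __le32 buf_len;
} __packed;

static int example_setup_buffer(struct pci_dev *pdev, struct my_desc *desc)
{
    void *cpu_addr;
    dma_addr_t bus_addr;

    /* Coherent buffer: a CPU virtual address for the driver, and a bus
     * address that the device's DMA engine can use. */
    cpu_addr = dma_alloc_coherent(&pdev->dev, 4096, &bus_addr, GFP_KERNEL);
    if (!cpu_addr)
        return -ENOMEM;

    /* Hand the bus address to the device in its command descriptor. */
    desc->buf_addr = cpu_to_le64(bus_addr);
    desc->buf_len  = cpu_to_le32(4096);
    return 0;
}
```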
bus-mastering was only used by the device to transfer to the buffer created by the DMA subsystem and it was always to or from the main-memory, right?
Yes, usually drivers use main (system) memory as the target of DMA operations.
P2P Dma is a provision that allows one to bypass the main memory altogether
P2P DMA is a very new and rare feature. It can be used with a small number of special, costly, high-speed PCI Express devices and controllers, but it will probably not work on a typical desktop system.
With P2P (peer-to-peer) DMA, a special driver may point one device at the memory resources of another (compatible) device in the same PCI Express hierarchy as the target of its DMA operations. This P2P DMA can be used to move the bulk data, while system memory is still used to store some metadata and descriptors; so roughly 95% or 99% of the data goes over P2P DMA, and the remaining 5% or 1% of metadata still lives in system memory.
For example, there are proposals for drivers that copy data from one NVMe SSD to another, or to a compute GPU, using P2P DMA:
* https://www.snia.org/sites/default/files/SDC15_presentations/nvme_fab/StephenBates_Donard_NVM_Express_Peer-2_Peer.pdf
* https://www.usenix.org/sites/default/files/conference/protected-files/atc17_slides_bergman.pdf
* https://lwn.net/Articles/764716/ (peer-to-peer PCI memory in the kernel; initial support is in the NVMe fabrics target subsystem)
Some information about CMB and PMR buffers:
* https://www.flashmemorysummit.com/English/Collaterals/Proceedings/2018/20180808_SOFT-201-1_Bates.pdf
* https://nvmexpress.org/wp-content/uploads/Session-2-Enabling-the-NVMe-CMB-and-PMR-Ecosystem-Eideticom-and-Mell....pdf
A PCIe switch (as I understand it) will be configured to detect the recipient of a P2P memory read or write operation (TLP) and will forward the TLP packet directly to the correct device instead of to the host port. Memory access R/W (DMA) TLPs can be routed by the access address ("Table 3-5. PCI Express TLP Variants And Routing Options").
A general P2P DMA description and some issues with enabling P2P DMA: https://spdk.io/doc/peer_2_peer.html
Peer-2-Peer (P2P) is the concept of DMAing data directly from one PCI End Point (EP) to another without using a system memory buffer. The most obvious example of this from an SPDK perspective is using a NVMe Controller Memory Buffer (CMB) to enable direct copies of data between two NVMe SSDs.
In some systems when performing peer-2-peer DMAs between PCIe EPs that are directly connected to the Root Complex (RC) the DMA may fail or the performance may not be great. Basically your milage may vary. It is recommended that you use a PCIe switch (such as those provided by Broadcom or Microsemi) as that is know to provide good performance.
Kernel documentation of P2P DMA support in some Linux drivers:
https://www.kernel.org/doc/html/latest/driver-api/pci/p2pdma.html
The PCI bus has pretty decent support for performing DMA transfers between two devices on the bus. This type of transaction is henceforth called Peer-to-Peer (or P2P). However, there are a number of issues that make P2P transactions tricky to do in a perfectly safe way.
Therefore, as of this writing, the kernel only supports doing P2P when the endpoints involved are all behind the same PCI bridge, as such devices are all in the same PCI hierarchy domain, and the spec guarantees that all transactions within the hierarchy will be routable
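To make the kernel-side picture more concrete, here is a rough sketch of how the p2pdma API described in that document fits together. The function names come from that documentation, but the BAR number, sizes, and overall flow are simplified assumptions rather than a drop-in implementation:

```c
/* Rough sketch of the kernel p2pdma API: a provider (e.g. an NVMe device
 * with a CMB) exposes part of a BAR as peer-to-peer memory, and a client
 * driver allocates from it and uses the resulting bus address as the target
 * of another device's DMA instead of a system-memory address. */
#include <linux/pci.h>
#include <linux/pci-p2pdma.h>

static int provider_register_cmb(struct pci_dev *pdev, size_t cmb_size)
{
    int rc;

    /* Expose part of BAR 2 (hypothetical BAR choice) as p2p memory. */
    rc = pci_p2pdma_add_resource(pdev, 2, cmb_size, 0);
    if (rc)
        return rc;

    pci_p2pmem_publish(pdev, true);   /* let other drivers find and use it */
    return 0;
}

static void client_use_p2pmem(struct pci_dev *provider)
{
    void *buf;
    pci_bus_addr_t bus_addr;

    buf = pci_alloc_p2pmem(provider, 4096);
    if (!buf)
        return;

    /* Bus address inside the provider's BAR; this is what a driver would
     * write into the other device's command descriptors so its DMA engine
     * reads/writes the peer's memory directly, bypassing system RAM. */
    bus_addr = pci_p2pmem_virt_to_bus(provider, buf);
    (void)bus_addr;

    pci_free_p2pmem(provider, buf, 4096);
}
```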