Mapping an MMIO region as write-back does not work
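I want the CPU cache to cache all read and write requests to a PCIe device. However, it does not work as I expected.
These are my assumptions about a write-back MMIO region.
- Writes to the PCIe device happen only on cache write-back.
- The TLP payload size is the cache block size (64B).
However, the captured TLPs do not match my assumptions.
- A write to the PCIe device happens on every write to the MMIO region.
- The TLP payload size is 1B.
I write 8 bytes of 0xff to the MMIO region with the following user-space program and device driver.
Part of the user program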
struct pcie_ioctl ioctl_control;
ioctl_control.bar_select = BAR_ID;
ioctl_control.num_bytes_to_write = atoi(argv[1]);
if (ioctl(fd, IOCTL_WRITE_0xFF, &ioctl_control) < 0) {
    printf("ioctl failed\n");
}
Part of the device driver
case IOCTL_WRITE_0xFF:
{
    char *buff;
    struct pci_cdev_struct *pci_cdev = pci_get_drvdata(fpga_pcie_dev.pci_device);

    /* Fetch the BAR index and byte count requested by user space. */
    if (copy_from_user(&ioctl_control, (void __user *)arg, sizeof(ioctl_control)))
        return -EFAULT;

    buff = kmalloc(ioctl_control.num_bytes_to_write, GFP_KERNEL);
    if (!buff)
        return -ENOMEM;
    memset(buff, 0xff, ioctl_control.num_bytes_to_write);

    /* Plain memcpy() into the mapped BAR; these are the stores captured in the waveforms. */
    memcpy(pci_cdev->bar[ioctl_control.bar_select], buff, ioctl_control.num_bytes_to_write);
    kfree(buff);
    break;
}
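I modified the MTRRs so that the corresponding MMIO region is write-back. The MMIO region starts at 0x0c7300000 and is 0x100000 (1MB) long. Below is the output of cat /proc/mtrr for each policy. Note that I made each region exclusive.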
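For reference, the kernel's MTRR documentation describes adding a range by writing a line of the form base=... size=... type=... to /proc/mtrr. The sketch below only builds that line for the region above and checks the constraint that a variable-range MTRR needs a power-of-two size and a base aligned to that size. The helper name format_mtrr_request is hypothetical; actually writing the line requires root and a free register.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: builds the text line that /proc/mtrr accepts for
 * adding a range. Returns the line length, or -1 if the range violates the
 * variable-range MTRR constraints or the buffer is too small. */
int format_mtrr_request(char *out, size_t out_len,
                        uint64_t base, uint64_t size, const char *type)
{
    int n;

    assert(out != NULL && type != NULL);

    /* size must be a non-zero power of two, base a multiple of size */
    if (size == 0 || (size & (size - 1)) != 0 || (base & (size - 1)) != 0)
        return -1;

    n = snprintf(out, out_len, "base=0x%llx size=0x%llx type=%s\n",
                 (unsigned long long)base, (unsigned long long)size, type);
    return (n < 0 || (size_t)n >= out_len) ? -1 : n;
}
```

For the region in this question the call would be format_mtrr_request(line, sizeof line, 0xc7300000, 0x100000, "write-back"). The existing uncachable entry covering the same range would presumably have to be removed first, for example by writing disable=8 to /proc/mtrr.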
Uncacheable
reg00: base=0x080000000 ( 2048MB), size= 1024MB, count=1: uncachable
reg01: base=0x380000000000 (58720256MB), size=524288MB, count=1: uncachable
reg02: base=0x0c0000000 ( 3072MB), size= 64MB, count=1: uncachable
reg03: base=0x0c4000000 ( 3136MB), size= 32MB, count=1: uncachable
reg04: base=0x0c6000000 ( 3168MB), size= 16MB, count=1: uncachable
reg05: base=0x0c7000000 ( 3184MB), size= 1MB, count=1: uncachable
reg06: base=0x0c7100000 ( 3185MB), size= 1MB, count=1: uncachable
reg07: base=0x0c7200000 ( 3186MB), size= 1MB, count=1: uncachable
reg08: base=0x0c7300000 ( 3187MB), size= 1MB, count=1: uncachable
reg09: base=0x0c7400000 ( 3188MB), size= 1MB, count=1: uncachable
Write-combining
reg00: base=0x080000000 ( 2048MB), size= 1024MB, count=1: uncachable
reg01: base=0x380000000000 (58720256MB), size=524288MB, count=1: uncachable
reg02: base=0x0c0000000 ( 3072MB), size= 64MB, count=1: uncachable
reg03: base=0x0c4000000 ( 3136MB), size= 32MB, count=1: uncachable
reg04: base=0x0c6000000 ( 3168MB), size= 16MB, count=1: uncachable
reg05: base=0x0c7000000 ( 3184MB), size= 1MB, count=1: uncachable
reg06: base=0x0c7100000 ( 3185MB), size= 1MB, count=1: uncachable
reg07: base=0x0c7200000 ( 3186MB), size= 1MB, count=1: uncachable
reg08: base=0x0c7300000 ( 3187MB), size= 1MB, count=1: write-combining
reg09: base=0x0c7400000 ( 3188MB), size= 1MB, count=1: uncachable
Write-back
reg00: base=0x080000000 ( 2048MB), size= 1024MB, count=1: uncachable
reg01: base=0x380000000000 (58720256MB), size=524288MB, count=1: uncachable
reg02: base=0x0c0000000 ( 3072MB), size= 64MB, count=1: uncachable
reg03: base=0x0c4000000 ( 3136MB), size= 32MB, count=1: uncachable
reg04: base=0x0c6000000 ( 3168MB), size= 16MB, count=1: uncachable
reg05: base=0x0c7000000 ( 3184MB), size= 1MB, count=1: uncachable
reg06: base=0x0c7100000 ( 3185MB), size= 1MB, count=1: uncachable
reg07: base=0x0c7200000 ( 3186MB), size= 1MB, count=1: uncachable
reg08: base=0x0c7300000 ( 3187MB), size= 1MB, count=1: write-back
reg09: base=0x0c7400000 ( 3188MB), size= 1MB, count=1: uncachable
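To double-check which entry covers the BAR, lines in the format above can be parsed mechanically. This is a minimal sketch, assuming the size is always printed in MB as in these listings (smaller ranges may be printed in other units). The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

struct mtrr_entry {
    int reg;
    uint64_t base;   /* bytes */
    uint64_t size;   /* bytes */
    char type[32];   /* e.g. "uncachable", "write-back" */
};

/* Parses one /proc/mtrr line of the form shown above.
 * Returns 1 on success, 0 if the line does not match. */
int parse_mtrr_line(const char *line, struct mtrr_entry *e)
{
    unsigned long long base, size_mb;

    assert(line != NULL && e != NULL);
    if (sscanf(line, "reg%d: base=%llx (%*uMB), size=%lluMB, count=%*u: %31s",
               &e->reg, &base, &size_mb, e->type) != 4)
        return 0;
    e->base = base;
    e->size = size_mb << 20;   /* MB to bytes */
    return 1;
}

/* True if addr falls inside the range described by e. */
int mtrr_covers(const struct mtrr_entry *e, uint64_t addr)
{
    return addr >= e->base && addr - e->base < e->size;
}
```

Applied to the reg08 line, this yields base 0xc7300000 and size 0x100000, so the whole 1MB BAR falls under that single entry.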
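Below are waveforms of an 8B write under the different policies. I used an Integrated Logic Analyzer (ILA) to capture them. Watch pcie_endpoint_litepcietlpdepacketizer_tlp_req_payload_dat while pcie_endpoint_litepcietlpdepacketizer_tlp_req_valid is asserted. You can count the packets by counting the assertions of pcie_endpoint_litepcietlpdepacketizer_tlp_req_valid in these waveforms.
- Uncacheable: link -> correct, 1B x 8 packets
- Write-combining: link -> correct, 8B x 1 packet
- Write-back: link -> unexpected, 1B x 8 packets
The system configuration is as follows.
- CPU: Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz
- OS: Linux kernel 4.15.0-38
- PCIe device: Xilinx FPGA KC705 programmed with litepcie
Related links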
- How to Implement a 64B PCIe* Burst Transfer on Intel® Architecture
- Write Combining Buffer Out of Order Writes and PCIe
- Do Ryzen support write-back caching for Memory Mapped IO (through PCIe interface)?
- MTRR (Memory Type Range Register) control
- PATting Linux
- Down to the TLP: How PCI express devices talk (Part I)
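In short, mapping an MMIO region as write-back does not seem to work by design.
If anyone thinks it can be done, please post an answer.
I found John McCalpin's posts and answers on this. First, mapping an MMIO region as write-back is not possible. Second, a workaround is available on some processors.
Mapping an MMIO region as write-back is not possible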
FYI: The WB type will not work with memory-mapped IO. You can
program the bits to set up the mapping as WB, but the system will
crash as soon as it gets a transaction that it does not know how to
handle. It is theoretically possible to use WP or WT to get cached
reads from MMIO, but coherence has to be handled in software.
Only when I set both PAT and MTRR to WB does the kernel crash
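The remark about PAT and MTRR can be made concrete: on x86 the effective memory type of a page results from combining the MTRR type of the physical range with the PAT type of the mapping. Below is a minimal sketch of that combination, limited to the PAT types UC, UC-, WC and WB (the WT and WP rows are deliberately left out). All names are hypothetical. It is also an assumption, not something shown in the driver excerpt, that the BAR was mapped with the default UC- attribute.

```c
#include <assert.h>

enum mem_type { MT_UC, MT_UC_MINUS, MT_WC, MT_WT, MT_WP, MT_WB };

/* Effective memory type for a given MTRR type and PAT type.
 * Only the PAT types UC, UC-, WC and WB are handled; returns -1 for
 * PAT WT and WP, which this sketch omits. */
int effective_mem_type(enum mem_type mtrr, enum mem_type pat)
{
    assert(mtrr != MT_UC_MINUS);   /* UC- exists only as a PAT type */

    switch (pat) {
    case MT_UC:
        return MT_UC;                            /* PAT UC always wins */
    case MT_WC:
        return MT_WC;                            /* PAT WC always wins */
    case MT_UC_MINUS:
        return mtrr == MT_WC ? MT_WC : MT_UC;    /* only an MTRR of WC overrides UC- */
    case MT_WB:
        return (int)mtrr;                        /* PAT WB defers to the MTRR */
    default:
        return -1;                               /* WT/WP not modeled here */
    }
}
```

Under that assumption, an MTRR of write-back combined with a UC- mapping still yields uncacheable accesses, while an MTRR of write-combining yields write-combining. That would be consistent with the three waveforms in the question, and with the kernel crashing only once both PAT and MTRR are set to WB.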
A workaround is available on some processors
Notes on Cached Access to Memory-Mapped IO Regions, John McCalpin
There is one set of mappings that can be made to work on at least some
x86-64 processors, and it is based on mapping the MMIO space twice.
Map the MMIO range with a set of attributes that allow write-combining
stores (but only uncached reads). Map the MMIO range a second time
with a set of attributes that allow cache-line reads (but only
uncached, non-write-combined stores).