Mapping MMIO region write-back does not work

Question

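I want the CPU cache to cache all read and write requests to a PCIe device. However, it does not work as I expected.

These are my assumptions about a write-back MMIO region.

Writes to the PCIe device happen only on cache write-back.
The TLP payload size is the cache block size (64B).

However, the captured TLPs do not match my assumptions.

A write to the PCIe device happens on every write to the MMIO region.
The TLP payload size is 1B.

I write 8 bytes of 0xff to the MMIO region with the following user-space program and device driver.

Part of the user program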

struct pcie_ioctl ioctl_control;
ioctl_control.bar_select = BAR_ID;
ioctl_control.num_bytes_to_write = atoi(argv[1]);
if (ioctl(fd, IOCTL_WRITE_0xFF, &ioctl_control) < 0) {
    printf("ioctl failed\n");
}

Part of the device driver

case IOCTL_WRITE_0xFF:
{
    int i;
    char *buff;
    struct pci_cdev_struct *pci_cdev = pci_get_drvdata(fpga_pcie_dev.pci_device);
    /* copy_from_user() returns the number of bytes it could not copy. */
    if (copy_from_user(&ioctl_control, (void __user *)arg, sizeof(ioctl_control)))
        return -EFAULT;
    buff = kmalloc(sizeof(char) * ioctl_control.num_bytes_to_write, GFP_KERNEL);
    if (!buff)
        return -ENOMEM;
    for (i = 0; i < ioctl_control.num_bytes_to_write; i++) {
        buff[i] = 0xff;
    }
    /* Plain memcpy() into the mapped BAR; this is the store sequence under test. */
    memcpy(pci_cdev->bar[ioctl_control.bar_select], buff, ioctl_control.num_bytes_to_write);
    kfree(buff);
    break;
}

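I modified the MTRRs to make the corresponding MMIO region write-back. The MMIO region starts at 0x0c7300000 and is 0x100000 (1 MB) long. Below are the cat /proc/mtrr results for the different policies. Note that I made each region exclusive, so the entries do not overlap.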

Uncachable

reg00: base=0x080000000 ( 2048MB), size= 1024MB, count=1: uncachable
reg01: base=0x380000000000 (58720256MB), size=524288MB, count=1: uncachable
reg02: base=0x0c0000000 ( 3072MB), size=   64MB, count=1: uncachable
reg03: base=0x0c4000000 ( 3136MB), size=   32MB, count=1: uncachable
reg04: base=0x0c6000000 ( 3168MB), size=   16MB, count=1: uncachable
reg05: base=0x0c7000000 ( 3184MB), size=    1MB, count=1: uncachable
reg06: base=0x0c7100000 ( 3185MB), size=    1MB, count=1: uncachable
reg07: base=0x0c7200000 ( 3186MB), size=    1MB, count=1: uncachable
reg08: base=0x0c7300000 ( 3187MB), size=    1MB, count=1: uncachable
reg09: base=0x0c7400000 ( 3188MB), size=    1MB, count=1: uncachable

Write-combining

reg00: base=0x080000000 ( 2048MB), size= 1024MB, count=1: uncachable
reg01: base=0x380000000000 (58720256MB), size=524288MB, count=1: uncachable
reg02: base=0x0c0000000 ( 3072MB), size=   64MB, count=1: uncachable
reg03: base=0x0c4000000 ( 3136MB), size=   32MB, count=1: uncachable
reg04: base=0x0c6000000 ( 3168MB), size=   16MB, count=1: uncachable
reg05: base=0x0c7000000 ( 3184MB), size=    1MB, count=1: uncachable
reg06: base=0x0c7100000 ( 3185MB), size=    1MB, count=1: uncachable
reg07: base=0x0c7200000 ( 3186MB), size=    1MB, count=1: uncachable
reg08: base=0x0c7300000 ( 3187MB), size=    1MB, count=1: write-combining
reg09: base=0x0c7400000 ( 3188MB), size=    1MB, count=1: uncachable

Write-back

reg00: base=0x080000000 ( 2048MB), size= 1024MB, count=1: uncachable
reg01: base=0x380000000000 (58720256MB), size=524288MB, count=1: uncachable
reg02: base=0x0c0000000 ( 3072MB), size=   64MB, count=1: uncachable
reg03: base=0x0c4000000 ( 3136MB), size=   32MB, count=1: uncachable
reg04: base=0x0c6000000 ( 3168MB), size=   16MB, count=1: uncachable
reg05: base=0x0c7000000 ( 3184MB), size=    1MB, count=1: uncachable
reg06: base=0x0c7100000 ( 3185MB), size=    1MB, count=1: uncachable
reg07: base=0x0c7200000 ( 3186MB), size=    1MB, count=1: uncachable
reg08: base=0x0c7300000 ( 3187MB), size=    1MB, count=1: write-back
reg09: base=0x0c7400000 ( 3188MB), size=    1MB, count=1: uncachable
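
For readers who want to reproduce entries like reg08 above, here is a minimal sketch of building the request line that the kernel's /proc/mtrr interface accepts (the `base=... size=... type=...` form described in the kernel's MTRR documentation). The helper names `format_mtrr_request` and `write_mtrr_request` are made up for illustration and are not from the original post. Writing to /proc/mtrr needs root, and an existing entry covering the same range may have to be removed first.

```c
#include <stdio.h>
#include <stdint.h>

/* Hypothetical helper: build a /proc/mtrr request line such as
 * "base=0xc7300000 size=0x100000 type=write-back\n".
 * Returns the line length, or -1 if the buffer is too small. */
int format_mtrr_request(char *out, size_t out_len,
                        uint64_t base, uint64_t size, const char *type)
{
    int n = snprintf(out, out_len, "base=0x%llx size=0x%llx type=%s\n",
                     (unsigned long long)base, (unsigned long long)size, type);
    if (n < 0 || (size_t)n >= out_len)
        return -1;
    return n;
}

/* Hypothetical helper: write the request to `path`
 * (normally "/proc/mtrr"). Returns 0 on success, -1 on failure. */
int write_mtrr_request(const char *path, const char *request)
{
    FILE *f = fopen(path, "w");
    int ok;

    if (!f)
        return -1;
    ok = fputs(request, f) >= 0;
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```

Calling `format_mtrr_request(buf, sizeof buf, 0xc7300000, 0x100000, "write-back")` and passing the result to `write_mtrr_request("/proc/mtrr", buf)` would request the write-back entry for the 1 MB region in the question. The other type strings appear in the listings above.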

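Below are the waveforms of an 8B write under the different policies. I used an integrated logic analyzer (ILA) to capture them. Look at pcie_endpoint_litepcietlpdepacketizer_tlp_req_payload_dat while pcie_endpoint_litepcietlpdepacketizer_tlp_req_valid is asserted. You can get the number of packets by counting the assertions of pcie_endpoint_litepcietlpdepacketizer_tlp_req_valid in these waveform examples.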

Uncachable: link -> correct, 1B x 8 packets
Write-combining: link -> correct, 8B x 1 packet
Write-back: link -> unexpected, 1B x 8 packets
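
The packet counts above come from counting cycles where tlp_req_valid is high. As a rough sketch, the same count could be taken from an exported capture like this. It assumes each write request occupies a single valid cycle, which is plausible for payloads of 8B or less but is not something the post confirms. The function name `count_valid_cycles` and the one-sample-per-clock array layout are assumptions for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Count the sampled cycles in which the valid signal is asserted.
 * Under the single-cycle-per-request assumption stated above, this
 * equals the number of write TLPs seen in the capture. */
size_t count_valid_cycles(const uint8_t *valid, size_t n_samples)
{
    size_t count = 0;
    size_t i;

    for (i = 0; i < n_samples; i++) {
        if (valid[i])
            count++;
    }
    return count;
}
```

If a request can span several beats, counting rising edges or a first/last marker would be more appropriate than counting asserted cycles.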

The system configuration is as follows.

CPU: Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz
OS: Linux kernel 4.15.0-38
PCIe device: implemented using litepcie

Answer 1

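In short, mapping an MMIO region as write-back does not work, apparently by design.

If anyone finds that it can be done, please post an answer.

I came across John McCalpin's articles and answers. First, mapping an MMIO region as write-back is not possible. Second, a workaround is possible on some processors.

Mapping an MMIO region as write-back is not possible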

Quote from this link

FYI: The WB type will not work with memory-mapped IO. You can program the bits to set up the mapping as WB, but the system will crash as soon as it gets a transaction that it does not know how to handle. It is theoretically possible to use WP or WT to get cached reads from MMIO, but coherence has to be handled in software.

Quote from this link

Only when I set both PAT and MTRR to WB does the kernel crash

A workaround is possible on some processors

Notes on Cached Access to Memory-Mapped IO Regions, John McCalpin

There is one set of mappings that can be made to work on at least some x86-64 processors, and it is based on mapping the MMIO space twice. Map the MMIO range with a set of attributes that allow write-combining stores (but only uncached reads). Map the MMIO range a second time with a set of attributes that allow cache-line reads (but only uncached, non-write-combined stores).
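
To make the double-mapping idea more concrete, here is a sketch of the access discipline it implies: all stores go through the write mapping, all loads go through the read mapping, and software invalidates possibly stale cache lines before reading. This is one reading of "coherence has to be handled in software", not McCalpin's code. The `struct dual_map` type and the `dual_write` and `dual_read` functions are hypothetical, and how the two mappings are created (MTRR and PAT setup) is outside the sketch.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__x86_64__)
#include <immintrin.h>
#define FLUSH_LINE(p) _mm_clflush(p)
#define STORE_FENCE() _mm_sfence()
#define FULL_FENCE()  _mm_mfence()
#else
/* Placeholders so the sketch compiles on non-x86 hosts. */
#define FLUSH_LINE(p) ((void)(p))
#define STORE_FENCE() ((void)0)
#define FULL_FENCE()  ((void)0)
#endif

#define CACHE_LINE 64u

/* Hypothetical handle: the same MMIO range mapped twice. */
struct dual_map {
    volatile uint8_t *wr; /* store mapping (assumed write-combining) */
    const uint8_t *rd;    /* load mapping (assumed cacheable reads)  */
    size_t len;           /* length of the range in bytes            */
};

/* Store through the write mapping, then fence so that buffered
 * write-combining stores are pushed out. Returns 0, or -1 if the
 * request falls outside the range. */
int dual_write(struct dual_map *m, size_t off, const void *src, size_t n)
{
    const uint8_t *s = src;
    size_t i;

    if (off > m->len || n > m->len - off)
        return -1;
    for (i = 0; i < n; i++)
        m->wr[off + i] = s[i];
    STORE_FENCE();
    return 0;
}

/* Flush every cache line that overlaps the request from the read
 * mapping, so the following loads cannot hit stale data, then copy. */
int dual_read(struct dual_map *m, size_t off, void *dst, size_t n)
{
    uintptr_t start, end, a;

    if (off > m->len || n > m->len - off)
        return -1;
    start = (uintptr_t)(m->rd + off) & ~(uintptr_t)(CACHE_LINE - 1);
    end = (uintptr_t)(m->rd + off) + n;
    for (a = start; a < end; a += CACHE_LINE)
        FLUSH_LINE((const void *)a);
    FULL_FENCE();
    memcpy(dst, m->rd + off, n);
    return 0;
}
```

On ordinary memory both pointers can alias the same buffer, which is enough to exercise the logic. On real hardware the device can change data behind the cache, which is why the flush before each read matters.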

Tags: linux, x86, caching, fpga, pci-e