DMA Engine Timeout and DMA Memory Mapping

I am trying to use the Linux DMA driver. Currently, when I submit a transaction and start waiting for it, my request times out. I believe this has to do with the way I set up my buffers when I do the DMA mapping.

char *src_dma_buffer = kmalloc(dma_length, GFP_KERNEL);
char *dest_dma_buffer = kzalloc(dma_length, GFP_KERNEL);

tx_dma_handle = dma_map_single(tx_chan->device->dev, src_dma_buffer, dma_length, DMA_TO_DEVICE);    
rx_dma_handle = dma_map_single(rx_chan->device->dev, dest_dma_buffer, dma_length, DMA_FROM_DEVICE);
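
As a hedged aside (not part of my original snippet): the handles returned by dma_map_single() can be checked with dma_mapping_error() before the transaction is submitted, which would at least rule out a failed mapping as the cause of the timeout.

/* Hedged sketch: error-check the two mappings above; the error path is
 * only illustrative. */
if (dma_mapping_error(tx_chan->device->dev, tx_dma_handle) ||
    dma_mapping_error(rx_chan->device->dev, rx_dma_handle)) {
    pr_err("dma_map_single failed\n");
    /* free the buffers and bail out here */
}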

In Xilinx's DMA drivers, they take special care with memory alignment. In particular, they use a property of dma_chan->dma_device called copy_align.

  • @copy_align: alignment shift for memcpy operations

For reference, my own transfer length is:

const int dma_length = 16*1024;

Here is how the test driver aligns the transfer length and the buffer offsets:

len = dmatest_random() % test_buf_size + 1;
len = (len >> align) << align;
if (!len)
    len = 1 << align;
src_off = dmatest_random() % (test_buf_size - len + 1);
dst_off = dmatest_random() % (test_buf_size - len + 1);

src_off = (src_off >> align) << align;
dst_off = (dst_off >> align) << align;
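
For comparison, here is a hedged sketch (my own, not from the test driver) of applying the same rounding to the dma_length used in my snippet above, reading the alignment shift from the transmit channel's dma_device:

/* Hedged sketch: round the length down to the alignment shift the
 * engine advertises, the way the test driver treats `align`. */
int align = tx_chan->device->copy_align;
size_t aligned_len = ((size_t)dma_length >> align) << align;
if (!aligned_len)
    aligned_len = 1 << align;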

It looks like the raw offsets are simply random values from dmatest_random(). I'm not sure what can be said or guaranteed about that memory.

static unsigned long dmatest_random(void)
{
    unsigned long buf;

    get_random_bytes(&buf, sizeof(buf));
    return buf;
}

They then use these offsets to set up the source and destination buffers for the DMA.

u8 *buf = thread->srcs[i] + src_off;

dma_srcs[i] = dma_map_single(tx_dev->dev, buf, len, DMA_MEM_TO_DEV);

I'm confused about what this accomplishes. My only guess is that it page-aligns the start of the source and destination buffers in virtual memory.

Looking at the way I set up my buffers with kmalloc and kzalloc, is there any guarantee that they start on a page boundary? Am I right that I need my buffers to start on a page boundary?
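
(A hedged aside, not from my original question: one quick way to see whether the kmalloc'd buffers happen to be page-aligned is to test the addresses directly.)

/* Hedged sketch: report whether the two buffers start on a page
 * boundary; IS_ALIGNED and PAGE_SIZE come from the kernel headers. */
pr_info("src page-aligned: %d, dest page-aligned: %d\n",
        IS_ALIGNED((unsigned long)src_dma_buffer, PAGE_SIZE),
        IS_ALIGNED((unsigned long)dest_dma_buffer, PAGE_SIZE));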

The source code for the Xilinx DMA test driver is here: https://github.com/Xilinx/linux-xlnx/blob/master/drivers/dma/xilinx/axidmatest.c

You can find a higher-level description of the problem I'm trying to solve here: https://forums.xilinx.com/t5/Embedded-Linux/AXI-DMA-Drivers-for-Kernel-v-4-9-PetaLinux-2017-3/td-p/828917

It seems that you don't have any guarantee that your memory allocation will start at the beginning of a page frame. However, this other link may help: it explains alloc_pages, which may suit your needs better.
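
As a hedged sketch (the error handling and the use of get_order are my own assumptions, not part of this answer), allocating with alloc_pages hands back whole page frames, so the resulting buffer is page-aligned by construction:

/* Hedged sketch: allocate page-aligned memory for the 16 KiB buffer
 * and map it for DMA. get_order() converts the byte length to an
 * allocation order. */
struct page *pg = alloc_pages(GFP_KERNEL, get_order(dma_length));
if (!pg)
    return -ENOMEM;

char *src_dma_buffer = page_address(pg);   /* page-aligned kernel virtual address */
tx_dma_handle = dma_map_single(tx_chan->device->dev, src_dma_buffer,
                               dma_length, DMA_TO_DEVICE);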

Regarding the memory alignment to be used in DMA transactions, following this link (the kernel's DMA-API-HOWTO document) we can read the following:

What memory is DMA'able?

The first piece of information you must know is what kernel memory can be used with the DMA mapping facilities. There has been an unwritten set of rules regarding this, and this text is an attempt to finally write them down.

If you acquired your memory via the page allocator (i.e. __get_free_page*()) or the generic memory allocators (i.e. kmalloc() or kmem_cache_alloc()) then you may DMA to/from that memory using the addresses returned from those routines.

This means specifically that you may not use the memory/addresses returned from vmalloc() for DMA. It is possible to DMA to the underlying memory mapped into a vmalloc() area, but this requires walking page tables to get the physical addresses, and then translating each of those pages back to a kernel address using something like __va(). [ EDIT: Update this when we integrate Gerd Knorr's generic code which does this. ]

This rule also means that you may use neither kernel image addresses (items in data/text/bss segments), nor module image addresses, nor stack addresses for DMA. These could all be mapped somewhere entirely different than the rest of physical memory. Even if those classes of memory could physically work with DMA, you'd need to ensure the I/O buffers were cacheline-aligned. Without that, you'd see cacheline sharing problems (data corruption) on CPUs with DMA-incoherent caches. (The CPU could write to one word, DMA would write to a different one in the same cache line, and one of them could be overwritten.)

Also, this means that you cannot take the return of a kmap() call and DMA to/from that. This is similar to vmalloc().

What about block I/O and networking buffers? The block I/O and networking subsystems make sure that the buffers they use are valid for you to DMA from/to.

Therefore, we only need to align the addresses to the cache-line size; there is no need to align the memory to a page frame (that also works, but it isn't required). Regarding the kmalloc documentation, if we specify the GFP_DMA flag we get memory suitable for DMA transactions (aligned to the cache-line size).
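
To close, here is a hedged sketch (my own, not from the thread) of how the allocation, mapping and dmaengine submission could fit together for the transmit side; tx_chan, dma_length and the 3-second timeout are assumptions carried over from the question, and the flag choices are illustrative rather than definitive.

#include <linux/completion.h>
#include <linux/dma-mapping.h>
#include <linux/dmaengine.h>
#include <linux/jiffies.h>
#include <linux/slab.h>

/* Completion callback: wake up the thread waiting for the transfer. */
static void tx_done(void *arg)
{
    complete(arg);
}

/* Hedged sketch: allocate, map, prepare, submit, then wait with a timeout. */
static int do_tx(struct dma_chan *tx_chan, size_t dma_length)
{
    struct dma_async_tx_descriptor *txd;
    struct completion tx_cmp;
    dma_addr_t tx_dma_handle;
    char *src_dma_buffer;
    int ret = 0;

    src_dma_buffer = kmalloc(dma_length, GFP_KERNEL | GFP_DMA);
    if (!src_dma_buffer)
        return -ENOMEM;

    tx_dma_handle = dma_map_single(tx_chan->device->dev, src_dma_buffer,
                                   dma_length, DMA_TO_DEVICE);
    if (dma_mapping_error(tx_chan->device->dev, tx_dma_handle)) {
        ret = -ENOMEM;
        goto free_buf;
    }

    txd = dmaengine_prep_slave_single(tx_chan, tx_dma_handle, dma_length,
                                      DMA_MEM_TO_DEV,
                                      DMA_PREP_INTERRUPT | DMA_CTRL_ACK);
    if (!txd) {
        ret = -EIO;
        goto unmap;
    }

    init_completion(&tx_cmp);
    txd->callback = tx_done;
    txd->callback_param = &tx_cmp;

    dmaengine_submit(txd);
    dma_async_issue_pending(tx_chan);

    if (!wait_for_completion_timeout(&tx_cmp, msecs_to_jiffies(3000))) {
        pr_err("DMA transmit timed out\n");
        ret = -ETIMEDOUT;
    }

unmap:
    dma_unmap_single(tx_chan->device->dev, tx_dma_handle, dma_length,
                     DMA_TO_DEVICE);
free_buf:
    kfree(src_dma_buffer);
    return ret;
}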