Fast communication between C++ and Python using shared memory

Question

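In a cross-platform (Linux and Windows) real-time application, I need the fastest way to share data between a C++ process and a Python application, both of which I manage. I currently use sockets, but they are too slow for high-bandwidth data (4K images at 30 fps).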
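To put a number on "high-bandwidth", here is a rough back-of-the-envelope calculation. The resolution (3840 x 2160) and the 3 bytes per pixel are assumptions for uncompressed 8-bit colour frames; the question does not state the actual pixel format.

```python
# Assumed frame geometry: 4K UHD, uncompressed, 3 bytes per pixel.
# These values are illustrative and are not given in the question.
WIDTH, HEIGHT = 3840, 2160
BYTES_PER_PIXEL = 3
FPS = 30

frame_bytes = WIDTH * HEIGHT * BYTES_PER_PIXEL   # size of one frame
bytes_per_second = frame_bytes * FPS             # sustained data rate

print(f"{frame_bytes / 1e6:.1f} MB per frame, {bytes_per_second / 1e6:.1f} MB/s")
# prints "24.9 MB per frame, 746.5 MB/s"
```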

Ultimately I would like to use multiprocessing shared memory, but my first attempts suggest it does not work. I create the shared memory in C++ using Boost.Interprocess and try to read it in Python like this:

#include <boost/interprocess/shared_memory_object.hpp>
#include <boost/interprocess/mapped_region.hpp>
#include <cstdlib>  // std::system
#include <cstring>  // std::memset

int main(int argc, char* argv[])
{
    using namespace boost::interprocess;

    //Remove shared memory on construction and destruction
    struct shm_remove
    {
        shm_remove() { shared_memory_object::remove("myshm"); }
        ~shm_remove() { shared_memory_object::remove("myshm"); }
    } remover;

    //Create a shared memory object.
    shared_memory_object shm(create_only, "myshm", read_write);

    //Set size
    shm.truncate(1000);

    //Map the whole shared memory in this process
    mapped_region region(shm, read_write);

    //Write all the memory to 1
    std::memset(region.get_address(), 1, region.get_size());

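    //Keep the process alive so the shared memory still exists while Python reads it ("pause" is a Windows shell command)
    std::system("pause");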
}

My Python code:

from multiprocessing import shared_memory

if __name__ == "__main__":
    shm_a = shared_memory.SharedMemory(name="myshm", create=False)
    buffer = shm_a.buf
    print(buffer[0])

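I get the error FileNotFoundError: [WinError 2] : File not found. So I guess it only works internally within Python multiprocessing, right? Python does not seem to find the shared memory created on the C++ side.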
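For comparison, a minimal sketch of the case that does work: a block created by Python itself can be attached to by name. The helper name roundtrip_by_name is made up for this illustration. As far as I understand the Boost.Interprocess documentation, the portable shared_memory_object is emulated with memory-mapped files on Windows (native named mappings go through windows_shared_memory instead), so the two libraries may simply not be looking in the same place for "myshm". I have not verified that this is the cause of the error above.

```python
from multiprocessing import shared_memory


def roundtrip_by_name(size: int = 1000) -> int:
    """Create a block, attach to it by name, and return the first byte read."""
    # Writer side: Python generates a unique name when none is given.
    writer = shared_memory.SharedMemory(create=True, size=size)
    try:
        # Same fill pattern as the memset in the C++ code.
        writer.buf[:size] = bytes([1]) * size
        # Reader side: attach to the existing block using only its name.
        reader = shared_memory.SharedMemory(name=writer.name, create=False)
        try:
            return reader.buf[0]
        finally:
            reader.close()
    finally:
        writer.close()
        writer.unlink()  # release the block once nobody needs it


first_byte = roundtrip_by_name()
print(first_byte)
```

In a real setup the writer and the reader would be separate processes, and only the name would be passed between them.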

Another possibility would be to use mmap, but I'm afraid that's not as fast as "pure" shared memory (without using the filesystem). As stated by the Boost.Interprocess documentation:

However, as the operating system has to synchronize the file contents with the memory contents, memory-mapped files are not as fast as shared memory

I don't know how much slower it is, though. I simply prefer the fastest solution, since this is currently the bottleneck of my application.
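A minimal sketch of the mmap alternative on the Python side, assuming both processes agree on a file path. The file name myshm.bin is hypothetical, and the file is created here in Python only as a stand-in for what the C++ process would write.

```python
import mmap
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical path; a C++ writer would have to map this same file.
    path = os.path.join(tmp, "myshm.bin")

    # Stand-in for the C++ side: a 1000-byte file filled with 1s.
    with open(path, "wb") as f:
        f.write(b"\x01" * 1000)

    # Python side: map the whole file and index it like a bytes buffer.
    with open(path, "r+b") as f:
        with mmap.mmap(f.fileno(), 0) as mm:  # length 0 maps the entire file
            first_byte = mm[0]
            mapped_size = len(mm)

print(first_byte, mapped_size)
```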

Answer 1

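So I spent the last few days implementing shared memory using mmap, and the results are quite good in my opinion. Here are the benchmark results comparing my two implementations: pure TCP, and a mix of TCP and shared memory.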

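Protocol:

The benchmark consists of moving data from C++ to the Python world (using Python's numpy.ndarray), then sending the data back to the C++ process. No further processing is involved, only serialization, deserialization and inter-process communication (IPC).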

Case A:

One C++ process using Boost.Asio
One Python3 process using standard Python TCP sockets

Communication is done over TCP: {header + data}.

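Case B:

One C++ process using Boost.Asio for TCP communication and Boost.Interprocess for the shared memory
One Python3 process using standard TCP sockets and mmap

Communication is hybrid: synchronization is done through sockets (only the header is passed) and the data is moved through shared memory. I think this design is great because I have had problems in the past with synchronization using condition variables in shared memory, and TCP is easy to use in both C++ and Python environments.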
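The hybrid scheme of case B could be sketched as follows. This is not the answer's actual code: the header layout (offset and length as two 64-bit integers), the file name frames.bin and the helper names are all assumptions, and both ends run in one process over socket.socketpair() only to keep the example self-contained.

```python
import mmap
import os
import socket
import struct
import tempfile

# Hypothetical header layout: payload offset and length, little-endian uint64.
HEADER = struct.Struct("<QQ")


def send_frame(sock, mm, offset, payload):
    """Writer: copy the payload into the mapping, then announce it on the socket."""
    mm[offset:offset + len(payload)] = payload
    sock.sendall(HEADER.pack(offset, len(payload)))


def recv_exact(sock, n):
    """Read exactly n bytes from the socket."""
    data = bytearray()
    while len(data) < n:
        chunk = sock.recv(n - len(data))
        if not chunk:
            raise ConnectionError("socket closed before the header was complete")
        data += chunk
    return bytes(data)


def recv_frame(sock, mm):
    """Reader: wait for a header, then copy out the payload it describes."""
    offset, length = HEADER.unpack(recv_exact(sock, HEADER.size))
    return bytes(mm[offset:offset + length])


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "frames.bin")
    with open(path, "wb") as f:
        f.truncate(1 << 20)  # reserve 1 MiB for payloads
    # Two independent mappings of the same file stand in for the two processes.
    with open(path, "r+b") as fw, open(path, "r+b") as fr:
        with mmap.mmap(fw.fileno(), 0) as writer_map, \
                mmap.mmap(fr.fileno(), 0) as reader_map:
            a, b = socket.socketpair()
            with a, b:
                payload = bytes(range(256)) * 16
                send_frame(a, writer_map, 4096, payload)
                received = recv_frame(b, reader_map)

print(received == payload)
```

Only the 16-byte header crosses the socket, so the socket cost stays constant regardless of the payload size. The header also acts as the synchronization point, since the reader does not touch the mapping until it arrives.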

Results:

Big data at low frequency

200 MBytes/s total: 10 MByte samples at 20 samples per second

Case	Global CPU consumption	C++ part	Python part
A	17.5%	10%	7.5%
B	6%	1%	5%

Small data at high frequency

200 MBytes/s total: 0.2 MByte samples at 1000 samples per second

Case	Global CPU consumption	C++ part	Python part
A	13.5%	6.7%	6.8%
B	11%	5.5%	5.5%

Max bandwidth

A: 250 MBytes/s
B: 600 MBytes/s

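Conclusion:

In my application, using mmap has a huge impact for big data at moderate frequency (almost a 300% performance gain). With very high frequencies and small data, the benefit of shared memory is still there but is not as impressive (only a 20% improvement). Maximum throughput is more than 2 times higher.

Using mmap was a good upgrade for me. I just wanted to share my results here.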
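The percentages quoted here appear to follow from the CPU figures of the 10 MByte and 0.2 MByte benchmarks. A quick check of the arithmetic, using only numbers from the results (the variable and function names are mine):

```python
# Figures copied from the benchmark results (CPU in percent, bandwidth in MBytes/s).
big = {"sample_mb": 10, "rate_hz": 20, "cpu_a": 17.5, "cpu_b": 6.0}
small = {"sample_mb": 0.2, "rate_hz": 1000, "cpu_a": 13.5, "cpu_b": 11.0}
max_bandwidth = {"A": 250, "B": 600}


def throughput(case):
    """Data rate in MBytes/s: sample size times samples per second."""
    return case["sample_mb"] * case["rate_hz"]


def cpu_ratio(case):
    """How many times more CPU the pure-TCP case A uses than the hybrid case B."""
    return case["cpu_a"] / case["cpu_b"]


bandwidth_ratio = max_bandwidth["B"] / max_bandwidth["A"]

print(f"10 MByte samples:  {throughput(big):.0f} MBytes/s, CPU ratio A/B = {cpu_ratio(big):.2f}")
print(f"0.2 MByte samples: {throughput(small):.0f} MBytes/s, CPU ratio A/B = {cpu_ratio(small):.2f}")
print(f"max bandwidth ratio B/A = {bandwidth_ratio:.1f}")
```

Both benchmarks move the same 200 MBytes/s. The pure-TCP case uses roughly 2.9 times the CPU with 10 MByte samples and roughly 1.2 times with 0.2 MByte samples, and the maximum bandwidth is 2.4 times higher with mmap.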

Answer 2

An example of communication between C++ and Python using shared memory and memory mapping can be found in [link].
