What is the difference between pybind11's type conversion options?

I have a project that mixes C++ and Python code.

For several reasons, the front end needs to be in Python and the back end needs to be in C++.

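Now I'm looking for a way to pass my Python objects to C++. One thing to note is that at some point the back end needs to call back into Python to compute some numbers, and that Python function returns a list of floats.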

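I've been looking at the pybind11 type conversion options described here: https://pybind11.readthedocs.io/en/stable/advanced/cast/index.html (in short: 1. use a native C++ type everywhere and expose it to Python through bindings; 2. use a native Python type everywhere and wrap it for C++; 3. use native types on each side and let pybind11 convert between them).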

However, option 1 seems easy to use to me, as I saw here: https://pybind11.readthedocs.io/en/stable/advanced/classes.html#overriding-virtual-functions-in-python

So I'm wondering: why would anyone choose option 3? How does it compare with option 1?

Many thanks.

Yes, if the main code is in C++ and the bindings are well fleshed out, then option 1 is the easiest to work with, because in that case the bound C++ objects are as natural to use in Python as native Python classes. It also makes life easier because you get full control over object identity and over whether or not to copy.

With option 3, I find pybind11 too aggressive about copying when callbacks are involved (which seems to be your use case). For example, with numpy arrays it is perfectly possible to work with the buffer directly on the C++ side once it has been verified to be contiguous. Sure, copying safeguards against memory problems, but too little control is given over copying vs. not copying (numpy has the same problem, to be sure).

Option 3 exists mostly because it improves usability and provides nice syntax. For example, given a function with this signature:

```cpp
void func(const std::vector<int>&);
```

then it is nice to be able to call it from the Python side as func((1, 2, 3)) or even func(range(3)). It's convenient, easy to use, looks clean, etc. But at that point, there is no way out but to copy, since the memory layout of a tuple is so different from a std::vector (and the range does not even represent an in-memory container).

Note, however, that with the func example above, the caller can still choose to pass a bound std::vector<int> object, and thus pre-empt any copying. It may not look as nice, but it gives full control. This is useful, for example, if the vector is the return value of some other function, or is modified between calls:

```python
v = some_calc()   # with v a bound C++ vector
func(v)
v.append(4)       # add an element
func(v)
```

Contrast this with the case where a list of floats is returned after calculating some numbers, analogous to (but not quite the same as) your description:

```cpp
std::list<float> calc();
```

If you choose "option 1", then the bound function calc will return a bound C++ object of std::list<float>. If you choose "option 3", then the bound function calc will return a Python list with the contents of the C++ std::list<float> copied into it.

The problem with "option 3" is that if the caller actually wanted a bound C++ object, the values have to be copied again into a new C++ list, for a total of two copies. On the other hand, if you choose "option 1" and the caller wanted a Python list instead, they are free to make that copy from the return value of calc themselves:

```python
res = calc()
list_res = list(res)
```

or even, if they want this all the time:

```python
def pycalc():
    return list(calc())
```

Now, finally, to your specific case: a Python callback, called from C++, that returns a list of floats. If you use "option 1", the Python function is forced to create a C++ list to return, for example (where cpplist is the name given to the bound type std::list<float>):

```python
def pycalc():
    return cpplist(range(3))
```

which a Python programmer would not find pretty. If you instead choose "option 3", checking the return type and converting when needed, this would be valid as well:

```python
def pycalc():
    return [x for x in range(3)]
```

Depending on the overall requirements and typical use cases then, "option 3" may be more appreciated by your users.