Passing a UTF-16 string to a Windows function

I have a Windows DLL named some.dll that exports the following function:

void some_func(TCHAR* input_string)
{
...
}

some_func expects a pointer to a UTF-16 encoded string.

Running this Python code:

from ctypes import *

some_string = "disco duck"
param_to_some_func = c_wchar_p(some_string.encode('utf-16'))  # the exception is raised here

some_dll = WinDLL("some.dll")
some_dll.some_func(param_to_some_func)

fails with the exception "unicode string or integer address expected instead of bytes instance".

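The documentation for ctypes and ctypes.wintypes is very thin, and I have not found a way to convert a Python string to a Windows wide-character string and pass it to a function.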

According to [Python 3.Docs]: Built-in Types - Text Sequence Type - str (emphasis is mine):

Textual data in Python is handled with str objects, or strings. Strings are immutable sequences of Unicode code points.

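On Win, wide-character (wchar_t) strings are UTF16 encoded, and CTypes performs that conversion itself when it is given a str, so no manual encoding is needed.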
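
To make the encoding point concrete, here is a minimal sketch that inspects the memory CTypes produces for a str. It is not taken from the original code: the variable names are only illustrative, and it assumes nothing beyond the standard ctypes module. The width of wchar_t is platform dependent (2 bytes on Win, usually 4 elsewhere), so the codec is chosen at run time.

```python
import ctypes as ct
import sys

# Let ctypes build a NUL-terminated wchar_t buffer from a Python str.
buf = ct.create_unicode_buffer("disco duck")

# wchar_t is 2 bytes on Windows (UTF-16) and usually 4 bytes elsewhere (UTF-32).
wchar_size = ct.sizeof(ct.c_wchar)
endian = "le" if sys.byteorder == "little" else "be"
codec = ("utf-16-" if wchar_size == 2 else "utf-32-") + endian

# Copy the raw bytes of the buffer, including the terminating NUL character.
raw = ct.string_at(ct.addressof(buf), ct.sizeof(buf))
decoded = raw.decode(codec)

print(wchar_size, codec)
print(repr(decoded))
```

The raw bytes decode back to the original text (plus the terminator) without the BOM that some_string.encode('utf-16') would prepend.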

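The correspondence between CTypes and Python is as follows (it can also be seen by comparing the ctypes documentation of the two Python versions):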

╔═══════════════╦══════════════╦══════════════╗
║    CTypes     ║   Python 3   ║   Python 2   ║
╠═══════════════╬══════════════╬══════════════╣
║   c_char_p    ║    bytes     ║     str      ║
║   c_wchar_p   ║     str      ║   unicode    ║
╚═══════════════╩══════════════╩══════════════╝


Examples:

  • Python 3:

    >>> import sys
    >>> import ctypes as ct
    >>>
    >>> sys.version
    '3.7.6 (tags/v3.7.6:43364a7ae0, Dec 19 2019, 00:42:30) [MSC v.1916 64 bit (AMD64)]'
    >>>
    >>> text_ascii = b"Dummy"
    >>> text_unicode = "Dummy"
    >>>
    >>> ct.c_char_p(text_ascii)
    c_char_p(2563882450144)
    >>>
    >>> ct.c_wchar_p(text_ascii)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: unicode string or integer address expected instead of bytes instance
    >>>
    >>> ct.c_char_p(text_unicode)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: bytes or integer address expected instead of str instance
    >>>
    >>> ct.c_wchar_p(text_unicode)
    c_wchar_p(2563878400656)
    
  • Python 2 (note that str <=> unicode conversions are performed automatically):

    >>> import sys
    >>> import ctypes as ct
    >>>
    >>> sys.version
    '2.7.17 (v2.7.17:c2f86d86e6, Oct 19 2019, 21:01:17) [MSC v.1500 64 bit (AMD64)]'
    >>>
    >>> text_ascii = "Dummy"
    >>> text_unicode = u"Dummy"
    >>>
    >>> ct.c_char_p(text_ascii)
    c_char_p('Dummy')
    >>>
    >>> ct.c_wchar_p(text_ascii)
    c_wchar_p(u'Dummy')
    >>>
    >>> ct.c_char_p(text_unicode)
    c_char_p('Dummy')
    >>>
    >>> ct.c_wchar_p(text_unicode)
    c_wchar_p(u'Dummy')
    

Back to your situation:

>>> import ctypes as ct
>>>
>>> some_string = "disco duck"
>>>
>>> enc_utf16 = some_string.encode("utf16")
>>> enc_utf16
b'\xff\xfed\x00i\x00s\x00c\x00o\x00 \x00d\x00u\x00c\x00k\x00'
>>>
>>> type(some_string), type(enc_utf16)
(<class 'str'>, <class 'bytes'>)
>>>
>>> ct.c_wchar_p(some_string)  # This is the right way
c_wchar_p(2508534214928)
>>>
>>> ct.c_wchar_p(enc_utf16)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unicode string or integer address expected instead of bytes instance
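
For the actual call, it usually helps to declare argtypes and restype so that CTypes checks and converts the argument. Since some.dll is not available here, the sketch below uses the C runtime's wcslen as a stand-in, because it also takes a wchar_t pointer. The library lookup and the commented some_func lines are assumptions, not something taken from the question.

```python
import ctypes as ct
import sys

# Stand-in for some.dll: wcslen from the C runtime also takes a wchar_t*.
if sys.platform == "win32":
    crt = ct.cdll.msvcrt
else:
    crt = ct.CDLL(None)  # symbols already loaded into the current process

wcslen = crt.wcslen
wcslen.argtypes = (ct.c_wchar_p,)  # a plain str is converted automatically
wcslen.restype = ct.c_size_t

length = wcslen("disco duck")
print(length)

# The same pattern for the function from the question would look like this
# (assumption: some_func is built with _UNICODE and returns void):
#   some_dll = ct.WinDLL("some.dll")
#   some_dll.some_func.argtypes = (ct.c_wchar_p,)
#   some_dll.some_func.restype = None
#   some_dll.some_func("disco duck")
```

With argtypes set, passing bytes instead of str raises ctypes.ArgumentError at the call site rather than silently handing the wrong data to the function.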

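As a side note, TCHAR is a typedef that varies depending on whether _UNICODE is defined: it is wchar_t when the macro is defined and char otherwise. Check [MS.Docs]: Generic-Text Mappings in tchar.h for more details. So, depending on the C code's compilation flags, the Python code might need to be adjusted as well.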
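
A rough sketch of how the Python side could follow the build flag is shown below. The helper make_tchar_arg and its parameters are hypothetical, and cp1252 is only an assumed ANSI code page; the real one depends on the system the DLL runs on.

```python
import ctypes as ct


def make_tchar_arg(text, unicode_build=True, ansi_encoding="cp1252"):
    """Hypothetical helper: wrap text for a TCHAR* parameter.

    unicode_build=True  -> TCHAR is wchar_t, so pass the str as c_wchar_p.
    unicode_build=False -> TCHAR is char, so encode to bytes and use c_char_p.
    """
    if unicode_build:
        return ct.c_wchar_p(text)
    return ct.c_char_p(text.encode(ansi_encoding))


wide = make_tchar_arg("disco duck")
narrow = make_tchar_arg("disco duck", unicode_build=False)

print(type(wide).__name__, wide.value)
print(type(narrow).__name__, narrow.value)
```

Either object can then be passed to the function, provided the choice matches how the DLL was actually compiled.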