Passing a UTF-16 string to a Windows function
I have a Windows DLL called some.dll with the following function:
void some_func(TCHAR* input_string)
{
...
}
some_func expects a pointer to a UTF-16 encoded string.
Running this Python code:
from ctypes import *
some_string = "disco duck"
param_to_some_func = c_wchar_p(some_string.encode('utf-16')) # here exception!
some_dll = WinDLL("some.dll")
some_dll.some_func(param_to_some_func)
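fails with the exception "unicode string or integer address expected instead of bytes instance".
The documentation for ctypes and ctypes.wintypes is very thin, and I have not found a way to convert a Python string to a Windows wide-character string and pass it to a function.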
According to [Python 3.Docs]: Built-in Types - Text Sequence Type - str (emphasis is mine):
Textual data in Python is handled with str objects, or strings. Strings are immutable sequences of Unicode code points.
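On Windows, ctypes hands them to native code as wchar_t strings, which are UTF-16 encoded (wchar_t is 16 bits wide there).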
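To make that concrete, here is a small sketch that inspects the bytes ctypes produces for a str. It uses ctypes.create_unicode_buffer only as a convenient way to look at the wchar_t data; the variable names are illustrative. Note that the buffer holds the code units plus a terminating NUL, with no byte order mark.

```python
import ctypes as ct
import sys

text = "disco duck"
buf = ct.create_unicode_buffer(text)  # wchar_t array with a trailing NUL
raw = bytes(buf)                      # the bytes native code would see

width = ct.sizeof(ct.c_wchar)         # 2 on Windows, usually 4 elsewhere
endian = "le" if sys.byteorder == "little" else "be"
codec = {2: "utf-16-", 4: "utf-32-"}[width] + endian

# Compare the buffer with an explicit encoding of the same text (no BOM).
print(width, codec, raw == (text + "\0").encode(codec))
```

On Windows the width should come out as 2 and the codec as a UTF-16 variant; on most other platforms wchar_t is 4 bytes wide, so the same str would be passed as UTF-32.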
So, the correspondence between CTypes and Python types (also visible by comparing the ctypes documentation for Python 3 and Python 2) is:
╔═══════════════╦══════════════╦══════════════╗
║ CTypes ║ Python 3 ║ Python 2 ║
╠═══════════════╬══════════════╬══════════════╣
║ c_char_p ║ bytes ║ str ║
║ c_wchar_p ║ str ║ unicode ║
╚═══════════════╩══════════════╩══════════════╝
Examples:
Python 3:
>>> import sys
>>> import ctypes as ct
>>>
>>> sys.version
'3.7.6 (tags/v3.7.6:43364a7ae0, Dec 19 2019, 00:42:30) [MSC v.1916 64 bit (AMD64)]'
>>>
>>> text_ascii = b"Dummy"
>>> text_unicode = "Dummy"
>>>
>>> ct.c_char_p(text_ascii)
c_char_p(2563882450144)
>>>
>>> ct.c_wchar_p(text_ascii)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: unicode string or integer address expected instead of bytes instance
>>>
>>> ct.c_char_p(text_unicode)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: bytes or integer address expected instead of str instance
>>>
>>> ct.c_wchar_p(text_unicode)
c_wchar_p(2563878400656)
Python 2 (note that the str <=> unicode conversion is performed automatically):
>>> import sys
>>> import ctypes as ct
>>>
>>> sys.version
'2.7.17 (v2.7.17:c2f86d86e6, Oct 19 2019, 21:01:17) [MSC v.1500 64 bit (AMD64)]'
>>>
>>> text_ascii = "Dummy"
>>> text_unicode = u"Dummy"
>>>
>>> ct.c_char_p(text_ascii)
c_char_p('Dummy')
>>>
>>> ct.c_wchar_p(text_ascii)
c_wchar_p(u'Dummy')
>>>
>>> ct.c_char_p(text_unicode)
c_char_p('Dummy')
>>>
>>> ct.c_wchar_p(text_unicode)
c_wchar_p(u'Dummy')
Back to your case:
>>> import ctypes as ct
>>>
>>> some_string = "disco duck"
>>>
>>> enc_utf16 = some_string.encode("utf16")
>>> enc_utf16
b'\xff\xfed\x00i\x00s\x00c\x00o\x00 \x00d\x00u\x00c\x00k\x00'
>>>
>>> type(some_string), type(enc_utf16)
(<class 'str'>, <class 'bytes'>)
>>>
>>> ct.c_wchar_p(some_string) # This is the right way
c_wchar_p(2508534214928)
>>>
>>> ct.c_wchar_p(enc_utf16)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: unicode string or integer address expected instead of bytes instance
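Putting this together, a minimal sketch of the call itself might look like the following. It assumes the DLL was built with _UNICODE defined (so TCHAR is wchar_t), that some.dll can be found on the DLL search path, and that WinDLL (as used in the question) matches the function's calling convention. The helper name call_some_func is hypothetical.

```python
import ctypes as ct
import sys


def call_some_func(text, dll_name="some.dll"):
    """Pass a Python str to some_func as a wchar_t* (hypothetical helper)."""
    if sys.platform != "win32":
        raise OSError("WinDLL is only available on Windows")
    some_dll = ct.WinDLL(dll_name)  # assumes the DLL is on the search path
    some_func = some_dll.some_func
    # Declare the prototype so ctypes checks and converts the argument.
    some_func.argtypes = (ct.c_wchar_p,)
    some_func.restype = None  # the C function returns void
    some_func(text)  # a plain str; no .encode() needed

# Usage on Windows, with some.dll available:
#     call_some_func("disco duck")
```

Setting argtypes is optional for this to work, but it lets ctypes reject a bytes argument with a clear error instead of passing something the function does not expect.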
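As a side note, TCHAR (it is a typedef) varies depending on whether _UNICODE is defined. See [MS.Docs]: Generic-Text Mappings in tchar.h for more details. So, depending on the C code compilation flags, the Python code might need to be adjusted as well.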
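As a rough illustration of that adjustment: if the DLL were built without _UNICODE, TCHAR would be char and the function would expect a byte string. In the sketch below, the helper to_tchar_arg and the flag dll_built_with_unicode are hypothetical names, and encoding with the ANSI code page ("mbcs", a Windows-only codec) is an assumption about what a narrow build expects.

```python
import ctypes as ct
import sys


def to_tchar_arg(text, dll_built_with_unicode=True):
    """Return (ctypes type, value) for a TCHAR* parameter (hypothetical helper)."""
    if dll_built_with_unicode:
        # TCHAR is wchar_t: pass the str itself.
        return ct.c_wchar_p, text
    # TCHAR is char: pass bytes. "mbcs" (the ANSI code page) exists only on
    # Windows; the fallback is there just so the sketch runs elsewhere.
    encoding = "mbcs" if sys.platform == "win32" else "latin-1"
    return ct.c_char_p, text.encode(encoding)


arg_type, arg_value = to_tchar_arg("disco duck", dll_built_with_unicode=False)
print(arg_type.__name__, arg_value)
```

The returned type would go into the function's argtypes and the value into the call, so only this one helper changes when the build flags do.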