C char array from Python string

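I have a list of strings in Python that I'm trying to pass to a C extension for character analysis. So far I have broken the list up into its individual string PyObjects. Next, I'd like to split these strings into their individual characters, so that each string PyObject becomes a corresponding C-type character array. I can't seem to figure out how to do this, though.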

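Here is what I have so far. Currently, after building the .pyd file, it returns a list of 1s to Python as filler (so everything else works); I just don't know how to split a string PyObject into a C-type character array.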

--- cExt.c ---

#include <Python.h>
#include <stdio.h>

static int *CitemCheck(PyObject *commandString, int commandStringLength) {

    // HAALP

    //char* commandChars = (char*) malloc(commandStringLength*sizeof(char*));

    // char c[] = PyString_AsString("c", commandString);
    // printf("%c" , c);
    // printf("%s", PyString_AsString(commandString));
    // for (int i=0; i<sizeof(commandChars)/sizeof(*commandChars); i++) {
    //     printf("%s", PyString_AsString(commandString));
    //     printf("%c", commandChars[i]);
    // }
    return 1; // TODO: RETURN PROPER RESULTANT
}

static PyObject *ClistCheck(PyObject *commandList, int commandListLength) {

    PyObject *results = PyList_New(commandListLength);

    for (int index = 0; index < commandListLength; index++) {
        PyObject *commandString;
        commandString = PyList_GetItem(commandList, index);
        int commandStringLength = PyObject_Length(commandString);

        // CitemCheck should take string PyObject and its length as int
        int x = CitemCheck(commandString, commandStringLength);

        PyObject* pyItem = Py_BuildValue("i", x);
        PyList_SetItem(results, index, pyItem);
    }
    return results;
}

static PyObject *parseListCheck(PyObject *self, PyObject *args) {
    PyObject *commandList;
    int commandListLength;

    if (!PyArg_ParseTuple(args, "O", &commandList)){
        return NULL;
    }

    commandListLength = PyObject_Length(commandList);

    return Py_BuildValue("O", ClistCheck(commandList, commandListLength));
}

static char listCheckDocs[] = 
    ""; // TODO: ADD DOCSTRING

static PyMethodDef listCheck[] = {
 {"listCheck", (PyCFunction) parseListCheck, METH_VARARGS, listCheckDocs},
 {NULL,NULL,0,NULL}
};

static struct PyModuleDef DCE = {
    PyModuleDef_HEAD_INIT,
    "listCheck",
    NULL,
    -1,
    listCheck
};

PyMODINIT_FUNC PyInit_cExt(void){
    return PyModule_Create(&DCE);
}

For reference, my temporary extension build file:

--- _c_setup.py --- 
(located in same folder as cExt.c)
"""
to build C files, pass:

python _c_setup.py build_ext --inplace clean --all

in command prompt which is cd'd to the file's directory
"""
import glob
from setuptools import setup, Extension, find_packages
from os import path

here = path.abspath(path.dirname(__file__))
files = [path.split(x)[1] for x in glob.glob(path.join(here, '**.c'))]

extensions = [Extension(
    path.splitext(x)[0], [x]
) for x in files]

setup(
    ext_modules = extensions,
)

You can use PyUnicode_AsEncodedString, which does the following:

Encode a Unicode object and return the result as Python bytes object. encoding and errors have the same meaning as the parameters of the same name in the Unicode encode() method. The codec to be used is looked up using the Python codec registry. Return NULL if an exception was raised by the codec.

https://docs.python.org/3/c-api/unicode.html#c.PyUnicode_AsEncodedString

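Then, with PyBytes_AsString, you get a pointer to the internal buffer, which has a terminating NUL byte. This buffer must neither be deallocated nor modified. If you need a copy, you can use, for example, strdup.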

https://docs.python.org/3/c-api/bytes.html#c.PyBytes_AsString
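
As a minimal sketch of the "make a copy" step: strdup is POSIX rather than standard C before C23, and MSVC prefers the spelling _strdup, so a portable alternative is malloc plus memcpy. The helper name copy_c_string is hypothetical and not part of the Python C API. In the extension, the argument would be the pointer returned by PyBytes_AsString, and the copy would need to be made before the bytes object is released with Py_DECREF.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not part of any library): returns a heap-allocated
   copy of a NUL-terminated string, or NULL if allocation fails.
   The caller owns the result and must free() it. */
static char *copy_c_string(const char *src) {
    size_t len = strlen(src) + 1;   /* +1 to include the terminating NUL */
    char *dst = malloc(len);
    if (dst != NULL) {
        memcpy(dst, src, len);
    }
    return dst;
}
```

Unlike the internal buffer, the copy may be modified freely and stays valid after the bytes object is gone.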

With slight modifications to your code, it could look like this:

PyObject *encodedString = PyUnicode_AsEncodedString(commandString, "UTF-8", "strict");
if (encodedString) { //returns NULL if an exception was raised
    char *commandChars = PyBytes_AsString(encodedString); //pointer refers to the internal buffer of encodedString
    if(commandChars) {
        printf("the string '%s' consists of the following chars:\n", commandChars);
        for (int i = 0; commandChars[i] != '\0'; i++) {
            printf("%c ", commandChars[i]);
        }
        printf("\n");
    }
    Py_DECREF(encodedString);
}

To test it:

import cExt

fruits = ["apple", "pears", "cherry", "pear", "blueberry", "strawberry"]         
res = cExt.listCheck(fruits)
print(res)

The output will be:

the string 'apple' consists of the following chars:
a p p l e 
the string 'pears' consists of the following chars:
p e a r s 
the string 'cherry' consists of the following chars:
c h e r r y 
the string 'pear' consists of the following chars:
p e a r 
the string 'blueberry' consists of the following chars:
b l u e b e r r y 
the string 'strawberry' consists of the following chars:
s t r a w b e r r y 
[1, 1, 1, 1, 1, 1]
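
One caveat for character analysis: the loop above walks the UTF-8 encoded bytes, which matches the characters one-to-one only for ASCII input such as these fruit names. A non-ASCII character occupies two to four bytes. The sketch below, which assumes well-formed UTF-8 and uses the hypothetical helper name utf8_code_point_count, counts code points by skipping continuation bytes (those of the form 10xxxxxx).

```c
#include <stddef.h>

/* Hypothetical helper: counts Unicode code points in a NUL-terminated,
   well-formed UTF-8 string. Every byte that is NOT a continuation byte
   (bit pattern 10xxxxxx) starts a new code point. */
static size_t utf8_code_point_count(const char *s) {
    size_t count = 0;
    for (size_t i = 0; s[i] != '\0'; i++) {
        if (((unsigned char)s[i] & 0xC0) != 0x80) {
            count++;
        }
    }
    return count;
}
```

For "apple" this gives 5, the same as the byte length; for a string such as "café" encoded as UTF-8 it gives 4 while the byte length is 5.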

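A side note not directly related to the question: your CitemCheck function returns a pointer to int, but looking at how it is called, it seems you want to return an int value. The function signature should look more like this: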

static int CitemCheck(PyObject *commandString, int commandStringLength)

(note the removed * after int).