C extensions - how to redirect printf to a python logger?

Question

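I have a simple C extension (see the example below) that sometimes prints using the printf function. I'm looking for a way to wrap the calls to functions from that C extension so that all those printfs are redirected to my python logger.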

hello.c:

#include <Python.h>

static PyObject* hello(PyObject* self)
{
   printf("example print from a C code\n");
   return Py_BuildValue("");
}

static char helloworld_docs[] =
   "helloworld(): Any message you want to put here!!\n";

static PyMethodDef helloworld_funcs[] = {
   {"hello", (PyCFunction)hello,
   METH_NOARGS, helloworld_docs},
   {NULL}
};

static struct PyModuleDef cModPyDem =
{
    PyModuleDef_HEAD_INIT,
    "helloworld",
    "Extension module example!",
    -1,
    helloworld_funcs
};

PyMODINIT_FUNC PyInit_helloworld(void)
{
    return PyModule_Create(&cModPyDem);
};

setup.py:

from distutils.core import setup, Extension
setup(name = 'helloworld', version = '1.0',  \
   ext_modules = [Extension('helloworld', ['hello.c'])])

First build and install it by running:

python3 setup.py install

Then:

import helloworld
helloworld.hello()

I would like to be able to do something like this:

with redirect_to_logger(my_logger):
   helloworld.hello()

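EDIT: I have seen a number of posts showing how to silence the prints from C, but I wasn't able to figure out from them how to capture the prints in python instead. An example of such a post: Redirect stdout from python for C calls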

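I assume this question didn't get much traction because I may have asked for too much, so I no longer care about logging specifically. How can I capture the C prints in python? Into a list or whatever.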

EDIT: I was able to get somewhat working code that does what I want, i.e. redirects C printf to a python logger:

import select
import threading
import time
import logging
import re

from contextlib import contextmanager

from wurlitzer import pipes
from helloworld import hello


logger = logging.getLogger()
logger.setLevel(logging.DEBUG)
ch = logging.StreamHandler()
ch.setLevel(logging.DEBUG)
formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')
ch.setFormatter(formatter)
logger.addHandler(ch)

    

class CPrintsHandler(threading.Thread):
    def __init__(self, std, poll_std, err, poll_err, logger):
        super(CPrintsHandler, self).__init__()
        self.std = std
        self.poll_std = poll_std
        self.err = err
        self.poll_err = poll_err
        self.logger = logger
        self.stop_event = threading.Event()

    def stop(self):
        self.stop_event.set()

    def run(self):
        while not self.stop_event.is_set():
            # How can I poll both std and err at the same time?
            if self.poll_std.poll(1):
                line = self.std.readline()
                if line:
                    self.logger.debug(line.strip())

            if self.poll_err.poll(1):
                line = self.err.readline()
                if line:
                    self.logger.debug(line.strip())


@contextmanager
def redirect_to_logger(some_logger):
    handler = None
    try:
        with pipes() as (std, err):
            poll_std = select.poll()
            poll_std.register(std, select.POLLIN)
            poll_err = select.poll()
            poll_err.register(err, select.POLLIN)
            handler = CPrintsHandler(std, poll_std, err, poll_err, some_logger)
            handler.start()
            yield
    finally:
        if handler:
            time.sleep(0.1) # why do I have to sleep here for the foo prints to finish?
            handler.stop()
            handler.join()


def foo():
    logger.debug('logger print from foo()')
    hello()


def main():
    with redirect_to_logger(logger):
        # I don't want the logs from here to be redirected as well, only printf.
        logger.debug('logger print from main()')
        foo()


main()

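But I have a couple of issues:

The python logs are also being redirected and captured by the CPrintsHandler. Is there a way to avoid that?
The prints are not in the correct order: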

python3 redirect_c_example_for_stackoverflow.py

2020-08-18 19:50:47,732 - root - DEBUG - example print from a C code
2020-08-18 19:50:47,733 - root - DEBUG - 2020-08-18 19:50:47,731 - root - DEBUG - logger print from main()
2020-08-18 19:50:47,733 - root - DEBUG - 2020-08-18 19:50:47,731 - root - DEBUG - logger print from foo()

Also, the logger prints all go to err; maybe the way I poll them causes this order.

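I am not that familiar with select in python, and I am not sure whether there is a way to poll both std and err at the same time and print whichever has something first.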
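One possible way to wait on both pipes at once is to register both file descriptors with a single selector from the standard selectors module and read whichever one is reported ready. The sketch below is not part of the code above: read_ready and named_fds are hypothetical names, and it works on raw descriptors from os.pipe() rather than on the objects returned by wurlitzer. Ordering between the two pipes is only approximate, since both can become ready within the same select() call.

```python
import os
import selectors


def read_ready(named_fds, timeout=1.0):
    """Yield (name, chunk) pairs from several readable fds as data shows up.

    named_fds is a hypothetical mapping such as {"out": fd1, "err": fd2}.
    """
    sel = selectors.DefaultSelector()
    try:
        for name, fd in named_fds.items():
            sel.register(fd, selectors.EVENT_READ, data=name)
        while sel.get_map():                  # until every fd has hit EOF
            events = sel.select(timeout)
            if not events:
                break                         # nothing arrived within timeout
            for key, _mask in events:
                chunk = os.read(key.fd, 4096)
                if chunk:
                    yield key.data, chunk
                else:                         # empty read means EOF
                    sel.unregister(key.fd)
    finally:
        sel.close()
```

A single wait replaces the two consecutive poll(1) calls, so output on one stream is no longer delayed by up to a second while the other stream is being polled.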

Answer 1

On Linux you could use wurlitzer, which captures the output of printf, e.g.:

from wurlitzer import pipes
with pipes() as (out, err):
    helloworld.hello()
out.read()
#'example print from a C code\n'

wurlitzer is based on this article of Eli Bendersky, and you can use the code from the article directly if you would rather not depend on a third-party library.

Sadly, wurlitzer and the code from the article work only on Linux (and possibly MacOS).
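The underlying idea can be sketched in a few lines of Python for POSIX systems: point file descriptor 1 at a temporary file with os.dup2, run the C code, then restore the descriptor and read the file back. This is a simplified illustration and not the code from the article or from wurlitzer; capture_c_stdout is a hypothetical name, and ctypes.CDLL(None) is assumed to give access to the C library already loaded in the process, which holds on Linux and MacOS.

```python
import ctypes
import os
import sys
import tempfile
from contextlib import contextmanager


@contextmanager
def capture_c_stdout():
    """Collect everything written to file descriptor 1 while the block runs."""
    libc = ctypes.CDLL(None)      # C library of the current process (POSIX)
    captured = []                 # receives one string when the block exits
    sys.stdout.flush()            # flush Python's buffer ...
    libc.fflush(None)             # ... and all C stdio buffers before switching
    saved_fd = os.dup(1)          # keep a handle on the real stdout
    with tempfile.TemporaryFile(mode="w+b") as tmp:
        os.dup2(tmp.fileno(), 1)  # fd 1 now points at the temporary file
        try:
            yield captured
        finally:
            libc.fflush(None)     # push pending printf output into the file
            sys.stdout.flush()
            os.dup2(saved_fd, 1)  # restore the real stdout
            os.close(saved_fd)
            tmp.seek(0)
            captured.append(tmp.read().decode(errors="replace"))


if __name__ == "__main__":
    libc = ctypes.CDLL(None)
    with capture_c_stdout() as out:
        libc.puts(b"example print from a C code")  # stands in for helloworld.hello()
    print(repr(out[0]))
```

Anything Python itself writes to sys.stdout inside the block ends up in the same file, because the redirection happens at the file descriptor level.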

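Here is a prototype for Windows (an improved version of the prototype can be installed from my github), which uses Eli's approach as a Cython extension (it could probably be translated to ctypes if needed):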

%%cython

import io
import os

cdef extern from *:
    """
    #include <windows.h>
    #include <io.h>
    #include <stdlib.h>
    #include <stdio.h>
    #include <fcntl.h>

    int open_temp_file() {
        TCHAR lpTempPathBuffer[MAX_PATH+1];//path+NULL

        //  Gets the temp path env string (no guarantee it's a valid path).
        DWORD dwRetVal = GetTempPath(MAX_PATH,          // length of the buffer
                                     lpTempPathBuffer); // buffer for path 
        if(dwRetVal > MAX_PATH || (dwRetVal == 0))
        {
            return -1;
        }

        //  Generates a temporary file name. 
        TCHAR szTempFileName[MAX_PATH + 1];//path+NULL
        DWORD uRetVal = GetTempFileName(lpTempPathBuffer, // directory for tmp files
            TEXT("tmp"),     // temp file name prefix 
            0,                // create unique name 
            szTempFileName);  // buffer for name 
        if (uRetVal == 0)
        {
            return -1;
        }

        HANDLE tFile = CreateFile((LPTSTR)szTempFileName, // file name 
                GENERIC_READ | GENERIC_WRITE,      // first we write than we read 
                0,                    // do not share 
                NULL,                 // default security 
                CREATE_ALWAYS,        // overwrite existing
                FILE_ATTRIBUTE_TEMPORARY | FILE_FLAG_DELETE_ON_CLOSE, // "temporary" temporary file, see https://docs.microsoft.com/en-us/archive/blogs/larryosterman/its-only-temporary
                NULL);                // no template 

        if (tFile == INVALID_HANDLE_VALUE) {
            return -1;
        }

        return _open_osfhandle((intptr_t)tFile, _O_APPEND | _O_TEXT);
    }

    int replace_stdout(int temp_fileno)
    {
        fflush(stdout);
        int old;
        int cstdout = _fileno(stdout);

        old = _dup(cstdout);   // "old" now refers to "stdout"
        if (old == -1)
        {
            return -1;
        }
        if (-1 == _dup2(temp_fileno, cstdout))
        {
            return -1;
        }
        return old;
    }

    int restore_stdout(int old_stdout){
        fflush(stdout);

        // Restore original stdout
        int cstdout = _fileno(stdout);
        return _dup2(old_stdout, cstdout);
    }
    
    
    void rewind_fd(int fd) {
        _lseek(fd, 0L, SEEK_SET);
    }
    """
    int open_temp_file()
    int replace_stdout(int temp_fileno)
    int restore_stdout(int old_stdout)
    void rewind_fd(int fd)
    void close_fd "_close" (int fd)
    
cdef class CStdOutCapture():
    cdef int tmpfile_fd
    cdef int old_stdout_fd
    def start(self): #start capturing
        self.tmpfile_fd = open_temp_file()
        self.old_stdout_fd = replace_stdout(self.tmpfile_fd)
    
    def stop(self): # stops capturing, frees resources and returns the content
        restore_stdout(self.old_stdout_fd)
        rewind_fd(self.tmpfile_fd) # need to read from the beginning
        buffer = io.TextIOWrapper(os.fdopen(self.tmpfile_fd, 'rb'))
        result = buffer.read()
        close_fd(self.tmpfile_fd)
        return result

And now:

b = CStdOutCapture()
b.start()
helloworld.hello()
out = b.stop()
print("HERE WE GO:", out)
# HERE WE GO: example print from a C code

Answer 2

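This is what I would do if I were free to edit the C code. Open a memory map in C and write to its file descriptor using fprintf() (fprintf() takes a FILE*, so the descriptor would first have to be wrapped, e.g. with fdopen()). Expose the file descriptor to Python as an int, and then either open it with the mmap module, or wrap it in a simpler file-like object with os.fdopen(), or wrap it in a file-like object in C and let Python use that.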

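Then I would create a class that lets me write to sys.stdout through the usual interface, i.e. its write() method (for Python-side usage), and that uses the select module in a thread to poll the file that acts as stdout for the C code. Then I would replace sys.stdout with an object of this class. So when Python calls sys.stdout.write(...), the string is passed on to the real sys.stdout.write(), and when the loop in the thread detects output on the file from C, it writes that output using sys.stdout.write() as well. Everything is therefore written to the screen and is also available to loggers. In this model, the strictly C part never actually writes to the file descriptor connected to the terminal.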
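A reduced version of this design can be sketched without touching the C code, under different assumptions from the ones described above: it uses an os.pipe() in place of a memory map, a blocking reader thread in place of select, and forwards lines straight to a logger instead of going through a replacement sys.stdout. redirect_fd_to_logger is a hypothetical name, and the sketch is POSIX only.

```python
import ctypes
import logging
import os
import sys
import threading
from contextlib import contextmanager


@contextmanager
def redirect_fd_to_logger(logger, fd=1, level=logging.DEBUG):
    """Forward lines written to fd (1 = stdout) to logger while the block runs."""
    libc = ctypes.CDLL(None)          # C library of the current process (POSIX)
    read_end, write_end = os.pipe()

    def pump():
        # Background thread: the loop ends at EOF, i.e. once every copy of
        # the pipe's write end has been closed.
        with os.fdopen(read_end, "r", errors="replace") as reader:
            for line in reader:
                logger.log(level, line.rstrip("\n"))

    sys.stdout.flush()
    libc.fflush(None)                 # flush pending C stdio output first
    saved_fd = os.dup(fd)
    os.dup2(write_end, fd)            # fd now feeds the pipe
    os.close(write_end)               # fd is the only remaining write end
    worker = threading.Thread(target=pump, daemon=True)
    worker.start()
    try:
        yield
    finally:
        libc.fflush(None)
        sys.stdout.flush()
        os.dup2(saved_fd, fd)         # closes the last write end, so pump() sees EOF
        os.close(saved_fd)
        worker.join()                 # all captured lines are logged before returning
```

Two caveats apply. The logger's handlers must not write to the redirected descriptor, otherwise every record is captured and logged again; logging.StreamHandler writes to sys.stderr by default, so redirecting only fd 1 keeps Python log records out of the capture. Also, an extension function that holds the GIL while printing more than the pipe buffer can hold would block, because the reader thread cannot run until the GIL is released.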

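You could even do much of this in C itself and leave little for the Python side, but it is easier to influence the interpreter from the Python side, because the extension is a shared library and the whole story involves some kind of, let's call it, IPC and the OS. That is why stdout is not shared between the extension and Python in the first place (C's stdout stream and Python's sys.stdout are separate objects with separate buffers).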

If you want to keep using printf() on the C side, look into how to redirect it in C before programming this whole mess.

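This answer is purely theoretical, because I have had no time to test it; but as far as I know it should be doable. If you try it, please let me know in a comment how it went. Perhaps I missed something, but I am confident the theory is sound. The beauty of this idea is that it would be OS independent, although the part about shared memory, or about connecting a file descriptor to space allocated in RAM, can sometimes be a PITA on Windows.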

Answer 3

If you are not restricted to using printf in C, it is easier to use the print equivalent from the python C API and pass the destination of the message as an argument.

For example, your hello.c would be:

#include <Python.h>

static PyObject* hello(PyObject* self, PyObject *args)
{
    PyObject *file = NULL;
    if (!PyArg_ParseTuple(args, "O", &file))
        return NULL;
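    PyObject *pystr = PyUnicode_FromString("example print from a C code\n");
    if (pystr == NULL)
        return NULL;
    /* Calls file.write(...), so any Python file-like object works here. */
    int rc = PyFile_WriteObject(pystr, file, Py_PRINT_RAW);
    Py_DECREF(pystr);  /* PyFile_WriteObject does not steal the reference */
    if (rc < 0)
        return NULL;
    Py_RETURN_NONE;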
}

static char helloworld_docs[] =
   "helloworld(): Any message you want to put here!!\n";

static PyMethodDef helloworld_funcs[] = { 
   {"hello", (PyCFunction)hello,
   METH_VARARGS, helloworld_docs},
   {NULL}
};

static struct PyModuleDef cModPyDem =
{
    PyModuleDef_HEAD_INIT,
    "helloworld",
    "Extension module example!",
    -1, 
    helloworld_funcs
};

PyMODINIT_FUNC PyInit_helloworld(void)
{
    return PyModule_Create(&cModPyDem);
};

We can check that it works with the following program:

import sys
import helloworld

helloworld.hello(sys.stdout)
helloworld.hello(sys.stdout)

helloworld.hello(sys.stderr)

On the command line we redirect each output separately:

python3 example.py 1> out.txt 2> err.txt

out.txt will contain the output of two print calls and err.txt of only one, as expected from our python script.

You can look at python's print implementation to get more ideas of what you can do.

cpython print source code
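Since PyFile_WriteObject only needs an object with a write() method, the same hello(file) function can feed a logger directly if it is given a small adapter instead of sys.stdout. The class below is a sketch: LoggerWriter is a hypothetical name, and the line-buffering behaviour is an assumption about what is convenient, not something the C API requires.

```python
import logging


class LoggerWriter:
    """Minimal file-like object whose write() forwards complete lines to a logger."""

    def __init__(self, logger, level=logging.DEBUG):
        self.logger = logger
        self.level = level
        self._pending = ""        # text received after the last newline

    def write(self, text):
        self._pending += text
        # Everything before the last newline is complete; keep the rest.
        *lines, self._pending = self._pending.split("\n")
        for line in lines:
            self.logger.log(self.level, line)
        return len(text)

    def flush(self):
        # Emit an unterminated trailing line, if there is one.
        if self._pending:
            self.logger.log(self.level, self._pending)
            self._pending = ""
```

With the modified extension above, the call would then look like helloworld.hello(LoggerWriter(my_logger)), where my_logger is whatever logger the application already uses.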
