Segmentation fault while calling cpp function from Python
I am trying to call this cpp function from Python:
TESS_API BOOL TESS_CALL TessBaseAPIProcessPages(TessBaseAPI* handle, const char* filename,
    const char* retry_config, int timeout_millisec, TessResultRenderer* renderer)
{
    if (handle->ProcessPages(filename, retry_config, timeout_millisec, renderer))
        return TRUE;
    else
        return FALSE;
}
The last parameter of this function is a TessResultRenderer. There is another cpp function that creates a TessResultRenderer:
TESS_API TessResultRenderer* TESS_CALL TessTextRendererCreate(const char* outputbase)
{
    return new TessTextRenderer(outputbase);
}
To call it from my Python code, I did the following:
outputbase = "stdout"
renderer = tesseract.TessTextRendererCreate(outputbase)
text_out = tesseract.TessBaseAPIProcessPages(api,
                                             ctypes.create_string_buffer(path),
                                             None, 0, renderer)  # Segmentation fault (core dumped) error on this line
But I keep getting a Segmentation fault error.
My question is: how do I call TessBaseAPIProcessPages from Python?
More reference links from the codebase:
Implementation of processPages(...)
Edit
After trying the suggestion from the comments, I did the following, but got this error: item 1 in _argtypes_ has no from_param method
PTessResultRenderer = ctypes.POINTER(TessResultRenderer)
self.tesseract.TessTextRendererCreate.restype = PTessResultRenderer
outputbase = "stdout"
self.tesseract.TessTextRendererCreate.argtypes = [outputbase] #error here
self.tesseract.TessTextRendererCreate
ReturnVal = ctypes.c_bool
self.tesseract.TessBaseAPIProcessPages.argtypes = [self.api, path, None, 0, PTessResultRenderer]
self.tesseract.TessBaseAPIProcessPages.restype = ReturnVal
self.tesseracto.TessBaseAPIProcessPages
class TessResultRenderer(ctypes.Structure):
    pass
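A segfault occurs when you run off the end of an array or dereference a null (or otherwise invalid) pointer. If you run the program under a debugger, you can step through the code and see exactly where it happens.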
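One plausible source of the invalid pointer in the question is that no restype was set on TessTextRendererCreate. ctypes then converts the return value as a C int, which on a typical 64-bit platform keeps only part of the pointer. The sketch below illustrates that truncation with a made-up address (the value is hypothetical, and a 64-bit build of Python is assumed); it does not call Tesseract at all.

```python
import ctypes

# A made-up 64-bit address standing in for the pointer that a function
# such as TessTextRendererCreate might return (hypothetical value).
real_pointer = 0x00007F1234567890

# With no restype set, ctypes treats the return value as a C int.
# c_int does no overflow checking, so it keeps only the low bytes.
int_bits = 8 * ctypes.sizeof(ctypes.c_int)
as_default_restype = ctypes.c_int(real_pointer).value

# A pointer-sized type keeps the whole address (assuming 64-bit pointers).
as_pointer = ctypes.c_void_p(real_pointer).value

print("original address:   ", hex(real_pointer))
print("seen through c_int: ", hex(as_default_restype & (2**int_bits - 1)))
print("seen through void*: ", hex(as_pointer))
```

Passing the truncated value back into a function that dereferences it is exactly the kind of invalid pointer access that ends in a segmentation fault.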
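The contrib folder contains an example that uses the tesseract C API from ctypes. However, it appears to be somewhat outdated: contrib/tesseract-c_api-demo.py
You need to set restype and argtypes for some of the functions. Also, don't forget to call the init function on the handle. The following example works for me. It reads the text from an English-language image file named "test.bmp" into the text variable.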
from ctypes import *
from ctypes.util import find_library

lang = b"eng"
filename = b"test.bmp"
TESSDATA_PREFIX = b"/usr/local/Cellar/tesseract/3.04.01_1/share/tessdata"

path = find_library("libtesseract.dylib")
tesseract = CDLL(path)

# Opaque types: Python only ever passes pointers to these around.
class TessBaseAPI(Structure):
    pass

class TessResultRenderer(Structure):
    pass

tesseract.TessBaseAPICreate.restype = POINTER(TessBaseAPI)
tesseract.TessBaseAPIInit3.argtypes = [POINTER(TessBaseAPI), c_char_p, c_char_p]
tesseract.TessBaseAPIInit3.restype = c_int  # 0 on success, nonzero on failure
tesseract.TessBaseAPIProcessPages.argtypes = [POINTER(TessBaseAPI), c_char_p, c_char_p, c_int, POINTER(TessResultRenderer)]
tesseract.TessBaseAPIProcessPages.restype = c_bool
tesseract.TessBaseAPIGetUTF8Text.argtypes = [POINTER(TessBaseAPI)]
tesseract.TessBaseAPIGetUTF8Text.restype = c_char_p

api = tesseract.TessBaseAPICreate()
rc = tesseract.TessBaseAPIInit3(api, TESSDATA_PREFIX, lang)
if rc != 0:
    tesseract.TessBaseAPIDelete(api)
    print("Could not initialize tesseract.\n")
    exit(3)

success = tesseract.TessBaseAPIProcessPages(api, filename, None, 0, None)
if success:
    text = tesseract.TessBaseAPIGetUTF8Text(api)
    print("=" * 78)
    print(text.decode("utf-8").strip())
    print("=" * 78)
The output looks like this:
==============================================================================
This is a lot of 12 point text to test the
ocr code and see if it works on all types
of file format.
The quick brown dog jumped over the
lazy fox. The quick brown dog jumped
over the lazy fox. The quick brown dog
jumped over the lazy fox. The quick
brown dog jumped over the lazy fox.
==============================================================================
Edit: Following eryksun's suggestion, replaced the use of c_void_p with opaque types. Thanks!
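For the renderer case from the question, the same idea can be sketched as follows. Note that argtypes takes a list of ctypes types, one per C parameter, not the values you intend to pass; assigning values such as a string or None is what produces the "has no from_param method" error. The helper names declare_renderer_prototypes and process_with_text_renderer are hypothetical, as are the example file names in the usage note. I have not run this against a particular Tesseract build, so treat it as a starting point rather than a verified recipe.

```python
from ctypes import POINTER, Structure, c_bool, c_char_p, c_int


class TessBaseAPI(Structure):
    """Opaque handle; its fields are never accessed from Python."""


class TessResultRenderer(Structure):
    """Opaque handle for a result renderer."""


def declare_renderer_prototypes(lib):
    """Attach argtypes/restype to the two functions from the question.

    `lib` is assumed to be the CDLL object for libtesseract.
    """
    # argtypes lists ctypes *types*, never the argument values themselves.
    lib.TessTextRendererCreate.argtypes = [c_char_p]
    # Without this, the returned pointer would be converted as a C int.
    lib.TessTextRendererCreate.restype = POINTER(TessResultRenderer)

    lib.TessBaseAPIProcessPages.argtypes = [
        POINTER(TessBaseAPI),         # handle
        c_char_p,                     # filename
        c_char_p,                     # retry_config (None becomes NULL)
        c_int,                        # timeout_millisec
        POINTER(TessResultRenderer),  # renderer
    ]
    lib.TessBaseAPIProcessPages.restype = c_bool
    return lib


def process_with_text_renderer(lib, api, image_path, outputbase):
    """Run ProcessPages with a text renderer; both paths must be bytes."""
    renderer = lib.TessTextRendererCreate(outputbase)
    if not renderer:  # a NULL pointer instance is falsy in ctypes
        raise RuntimeError("TessTextRendererCreate returned NULL")
    return bool(lib.TessBaseAPIProcessPages(api, image_path, None, 0, renderer))
```

With the tesseract and api objects from the example above, usage would look like declare_renderer_prototypes(tesseract) followed by process_with_text_renderer(tesseract, api, b"test.bmp", b"out"). On Python 3 the strings have to be bytes, because c_char_p does not accept str.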