Lambda function returning Failed loading language 'eng' Tesseract couldn't load any languages! Could not initialize tesseract

Question

I'm using AWS Lambda (Python).

I'm working on existing code that uses the tesseract package. My main contains a function that calls it:

import pytesseract


def lambda_ocr(ze_path, step):
    # Each step selects an OCR engine mode (--oem), a language (-l)
    # and a page segmentation mode (--psm).
    if step == 1:
        ocr_options = "--oem 1 -l eng --psm 6"
    elif step == 2:
        ocr_options = "--oem 0 -l eng --psm 6"
    elif step == 3:
        ocr_options = "--oem 1 -l fra --psm 3"
    elif step == 4:
        ocr_options = "--oem 0 -l fra --psm 11"
    else:
        print("WARNING invalid step given for ocr. default option --oem 1 -l fra --psm 3.")
        ocr_options = "--oem 1 -l fra --psm 3"
    res = ocr(ze_path, config=ocr_options)
    return res  # the original snippet computed res but never returned it


def ocr(img_path, config="--oem 1 -l fra --psm 3"):
    """ This function is called by get_text_OCR_Parallel
        we can modify the tesseract config here
    """
    raw_text = pytesseract.image_to_string(img_path, config=config)

    return raw_text

# For reference, image_to_string as defined inside pytesseract:
def image_to_string(image,
                    lang=None,
                    config='',
                    nice=0,
                    output_type=Output.STRING):
    '''
    Returns the result of a Tesseract OCR run on the provided image to string
    '''
    args = [image, 'txt', lang, config, nice]

    return {
        Output.BYTES: lambda: run_and_get_output(*(args + [True])),
        Output.DICT: lambda: {'text': run_and_get_output(*args)},
        Output.STRING: lambda: run_and_get_output(*args),
    }[output_type]()

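When I call lambda_ocr with step=1, everything works. But with step=2, 3 or 4, it throws the error in the title.

I don't know much about the tesseract package, but according to this, I should install the missing packages.

What I don't understand is: if the packages are not installed correctly, how does it work when step=1? Shouldn't that throw an error as well?

Any help is appreciated. Thanks.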

Answer 1

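The solution was to use a Lambda layer to install the missing packages. I downloaded the required files from git, then uploaded the .zip to AWS and created a layer from it.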
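As a way to confirm that a layer actually provides what each step needs, here is a minimal sketch that compares the languages named in a config string against the .traineddata files in a directory. The helper names (requested_languages, missing_languages) and the default path /opt/tessdata are assumptions made for illustration; Lambda extracts layers under /opt, but the folder inside depends on how the .zip was built.

```python
import os
import shlex
from pathlib import Path

# Assumption: the layer zip holds a tessdata/ folder, which Lambda extracts
# under /opt. Adjust the default if your layer is laid out differently.
TESSDATA_DIR = os.environ.get("TESSDATA_PREFIX", "/opt/tessdata")


def requested_languages(config):
    """Return the language codes that follow -l in a Tesseract config string."""
    tokens = shlex.split(config)
    langs = []
    for i, token in enumerate(tokens[:-1]):
        if token == "-l":
            # Tesseract accepts several languages joined by '+', e.g. "eng+fra".
            langs.extend(tokens[i + 1].split("+"))
    return langs


def missing_languages(config, tessdata_dir=TESSDATA_DIR):
    """Return requested languages with no <lang>.traineddata in tessdata_dir."""
    available = {p.stem for p in Path(tessdata_dir).glob("*.traineddata")}
    return [lang for lang in requested_languages(config) if lang not in available]
```

Calling missing_languages(ocr_options) before ocr() and logging the result would show whether fra.traineddata is absent for steps 3 and 4. This check only looks at file names, so it cannot explain step 2 on its own. A likely cause there is the engine mode: --oem 0 selects the legacy engine, and the language files published in the tessdata_fast and tessdata_best repositories contain only LSTM models, whereas the files in the tessdata repository contain both. With an LSTM-only eng.traineddata, step 1 (--oem 1) would load while step 2 would fail with the "Failed loading language 'eng'" message.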

python

amazon-web-services

aws-lambda

tesseract