How to find specific text and print the next 2 words after it

Question

My code is below.

I currently have an if statement that looks for a specific word, in this case 'INGREDIENTS'.

Next, instead of print("True"), I need to print the words/strings that come after 'INGREDIENTS'. This word/string ('INGREDIENTS') appears only once in the image.

For example, when I run the .py file with print(text) included in my script, this is my output:

Ground Almonds

INGREDIENTS: Ground Almonds(100%).

1kg

I only need to rewrite this part:

if 'INGREDIENTS' in text:
    print("True")
else:
    print("False")

so that the output looks like this:

INGREDIENTS: Ground Almonds

because the next two words/strings are Ground and Almonds.

Python code

from PIL import Image
import pytesseract

pytesseract.pytesseract.tesseract_cmd = r'C:\Users\gzi\AppData\Roaming\Python\Python37\site-packages\tesseract.exe'

img = Image.open('C:/Users/gzi/Desktop/work/lux.jpg')

text = pytesseract.image_to_string(img, lang='eng')

if 'INGREDIENTS' in text:
    print("True")
else:
    print("False")
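A minimal sketch of a replacement for the `if` block, assuming `text` holds OCR output like the sample printed above (the hard-coded string below stands in for the real `image_to_string` result). It uses `str.partition`, which splits at the first occurrence of the keyword; the helper name `words_after` and its parameters are hypothetical, and treating `:` and `(` as word separators is an assumption about how the label is formatted.

```python
def words_after(text: str, keyword: str = 'INGREDIENTS', count: int = 2):
    """Return the first `count` words after `keyword`, or None if it is absent."""
    _, found, tail = text.partition(keyword)  # splits at the first occurrence
    if not found:
        return None
    # Treat ':' and '(' as separators so "Almonds(100%)" yields "Almonds".
    return tail.replace(':', ' ').replace('(', ' ').split()[:count]


# Stand-in for pytesseract.image_to_string(img, lang='eng')
text = """Ground Almonds

INGREDIENTS: Ground Almonds(100%).

1kg"""

words = words_after(text)
if words is not None:
    print('INGREDIENTS: ' + ' '.join(words))
else:
    print("False")
```

This keeps the original true/false structure: the `else` branch still prints "False" when the keyword is missing from the image.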

Answer 1

So, assuming we extracted the following text with pytesseract:

text = '''Ground Almonds
INGREDIENTS: Ground Almonds(100%).
1kg'''

we can get the desired result like this:

first_index = text.find('INGREDIENTS')
# Start searching for '(' at first_index so that any earlier '(' is ignored
second_index = text.find('(', first_index)
my_string = f'{text[first_index:second_index]}'
print(my_string)

Output:

INGREDIENTS: Ground Almonds

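In the snippet, we use the find method to locate the word INGREDIENTS and the ( symbol (assuming it always comes right after the main ingredient, giving its percentage).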

We then slice the string between those indices and print the result, using an f-string to format it as the desired output.
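One caveat with this approach: `str.find` returns `-1` when the substring is missing, and slicing up to `-1` silently drops the last character instead of failing. A minimal sketch that guards both lookups and falls back to the end of the line when no `(` follows; the function name `ingredients_line` is hypothetical.

```python
def ingredients_line(text: str) -> str:
    """Return the text from 'INGREDIENTS' up to the next '(' or end of line.

    Returns an empty string if 'INGREDIENTS' does not occur.
    """
    start = text.find('INGREDIENTS')
    if start == -1:
        return ''
    # Only look for '(' after the keyword, so earlier parentheses are ignored.
    end = text.find('(', start)
    line_end = text.find('\n', start)
    if line_end == -1:
        line_end = len(text)
    # Stop at whichever comes first: a '(' on the same line, or the line end.
    if end == -1 or end > line_end:
        end = line_end
    return text[start:end].rstrip()


text = '''Ground Almonds
INGREDIENTS: Ground Almonds(100%).
1kg'''

print(ingredients_line(text))
```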

Answer 2

Use a regular expression to find all matches:

import re

txt = "INGREDIENTS: Ground Almonds(\"100\");"
x = re.findall(r"INGREDIENTS:\s(\w+)\s(\w+)", txt)
print(x)

# [('Ground', 'Almonds')]
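Since the question says 'INGREDIENTS' appears only once, `re.search` (which returns the first match) may be enough instead of `re.findall`. A sketch applying the same pattern to the multi-line OCR text from the question, written as a raw string and with `\s+` to tolerate extra spaces from OCR; the sample `text` is assumed to match the printed output above.

```python
import re

text = """Ground Almonds

INGREDIENTS: Ground Almonds(100%).

1kg"""

# r"..." keeps the backslashes literal; \s+ allows one or more whitespace chars.
match = re.search(r"INGREDIENTS:\s+(\w+)\s+(\w+)", text)
if match:
    # group(1) and group(2) are the two captured words; \w+ stops at '('.
    print(f"INGREDIENTS: {match.group(1)} {match.group(2)}")
else:
    print("False")
```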

Answer 3

If you don't care about the percentage and want to avoid regex:

line = 'INGREDIENTS: Ground Almonds(100%).'

tokens = line.split()
for n, token in enumerate(tokens):
    if 'INGREDIENTS' in token:
        print(' '.join(tokens[n:n+3]))

Output:

INGREDIENTS: Ground Almonds(100%).
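As the output shows, the last token still carries `(100%).`. A variant of the same token loop, still without regex, run on the full multi-line OCR text; it stops at the first hit and trims each following token. The cleanup rule (`split('(')[0]` plus `rstrip('.,;')`) is an assumption about how the label is formatted.

```python
text = """Ground Almonds

INGREDIENTS: Ground Almonds(100%).

1kg"""

tokens = text.split()  # splits on spaces and newlines alike
result = None
for n, token in enumerate(tokens):
    if 'INGREDIENTS' in token:
        # Keep the keyword plus the next two tokens, cutting each at '('
        # and dropping trailing punctuation such as '.' (assumed format).
        following = [t.split('(')[0].rstrip('.,;') for t in tokens[n + 1:n + 3]]
        result = ' '.join([token] + following)
        break  # 'INGREDIENTS' appears only once, so stop at the first hit
print(result)
```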


python

tesseract