Removing extra \n from a string in Python
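I've recently been learning text recognition in Python. When I convert an image to a string, the output contains an extra, seemingly random newline. I've tried to remove it, but I can't find a way. My goal is to split the choices into their respective strings.
Here is my code and image: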
choices = cv2.imread("ROI_0.png", 0)
custom_config = r'--oem 3 --psm 6'
c = pytesseract.image_to_string(choices, config=custom_config, lang='eng')
print(c.rstrip("\n")) # my attempt
text = repr(c)
print(text)
newtext = text.split("\n")
print(newtext)
Here is the output:
a. E. 0. 125
b. R. A. 3846
c. R. A. 3396
d. R. A. 7925
'a. E. 0. 125\n\nb. R. A. 3846\nc. R. A. 3396\nd. R. A. 7925'
["'a. E. 0. 125", '', 'b. R. A. 3846', 'c. R. A. 3396', "d. R. A. 7925'"]
What you can do is collapse multiple consecutive newlines into a single newline:
import re
x = re.sub(r'\n{2,10}', '\n', c)  # \n is a newline; {2,10} (no space after the comma) matches runs of 2 to 10 consecutive newlines
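A quick check of how the quantifier behaves, as a minimal sketch: the sample string is copied from the repr output in the question, and the variable names `unchanged` and `collapsed` are only illustrative. In Python's re module, a space inside the braces means the braces are no longer read as a quantifier, so only the form without a space collapses the newlines.

```python
import re

# Sample text taken from the repr output shown in the question.
sample = "a. E. 0. 125\n\nb. R. A. 3846\nc. R. A. 3396\nd. R. A. 7925"

# With a space after the comma, "{2, 10}" is matched as literal characters,
# not as a repetition count, so this substitution finds nothing to replace.
unchanged = re.sub(r'\n{2, 10}', '\n', sample)

# Without the space, "{2,10}" means "between 2 and 10 consecutive newlines".
collapsed = re.sub(r'\n{2,10}', '\n', sample)

print(unchanged == sample)
print(repr(collapsed))
```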
So the full code would look like:
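import re
import cv2
import pytesseract

choices = cv2.imread("ROI_0.png", 0)  # 0 = load the image as grayscale
custom_config = r'--oem 3 --psm 6'
c = pytesseract.image_to_string(choices, config=custom_config, lang='eng')
x = re.sub(r'\n{2,10}', '\n', c)  # collapse runs of 2 to 10 newlines into one
print(x.rstrip("\n"))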
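Since the stated goal is to get each choice as its own string, one alternative is to split the text into lines and drop the empty ones, which sidesteps the regex entirely. This is only a sketch: `split_choices` is a hypothetical helper name, and the sample string stands in for the OCR result `c`, copied from the repr output in the question.

```python
def split_choices(ocr_text):
    """Return the non-empty lines of the text, stripped of surrounding whitespace."""
    return [line.strip() for line in ocr_text.splitlines() if line.strip()]


# Stand-in for the string returned by pytesseract in the question.
sample = "a. E. 0. 125\n\nb. R. A. 3846\nc. R. A. 3396\nd. R. A. 7925"

choice_lines = split_choices(sample)
print(choice_lines)
```

Note that this should be applied to the string itself, not to `repr(c)`: repr adds surrounding quotes and turns each newline into a backslash followed by "n", which is why stray quote characters appear in the list printed in the question.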