Convert text file to tiff file in python

Question

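I am using the code below to convert a text file to a TIFF image, but it fails when the content of the text file starts with special characters. I don't know why it doesn't work. Can anyone help me with this?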

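import PIL.Image
import PIL.ImageDraw
import PIL.ImageFont
import PIL.ImageOps

PIXEL_ON = 0     # text colour (black); assumed, not defined in the original post
PIXEL_OFF = 255  # background colour (white); assumed, not defined in the original post

def main():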
    image = text_image('/Users/administrator/Desktop/367062657_1.text')
    image.show()
    image.save('contentok.tiff')

def text_image(text_path, font_path=None):

    grayscale = 'L'
    # parse the file into lines
    with open(text_path) as text_file:
        lines = tuple(l.rstrip() for l in text_file.readlines())

    large_font = 20
    font_path = font_path or 'cour.ttf'  
    try:
        font = PIL.ImageFont.truetype(font_path, size=large_font)
    except IOError:
        font = PIL.ImageFont.load_default()
        print('Could not use chosen font. Using default.')
    pt2px = lambda pt: int(round(pt * 96.0 / 72))
    max_width_line = max(lines, key=lambda s: font.getsize(s)[0])
    test_string = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
    max_height = pt2px(font.getsize(test_string)[1])
    max_width = pt2px(font.getsize(max_width_line)[0])
    height = max_height * len(lines) # perfect or a little oversized
    width = int(round(max_width + 5))  # a little oversized

    image = PIL.Image.new(grayscale, (width, height), color=PIXEL_OFF)
    draw = PIL.ImageDraw.Draw(image)
    vertical_position = 5
    horizontal_position = 5
    line_spacing = int(round(max_height * 1.0))
    for line in lines:
        draw.text((horizontal_position, vertical_position),
                  line, fill=PIXEL_ON, font=font)

        vertical_position += line_spacing
    c_box = PIL.ImageOps.invert(image).getbbox()
    image = image.crop(c_box)
    return image

Error:

Error: UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf2 in position 18: invalid continuation byte

Answer 1

Your problem is here:

with open(text_path) as text_file:
    lines = tuple(l.rstrip() for l in text_file.readlines())

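Going by the error you mentioned in the comments, you are opening the file in text mode, which decodes it with the default encoding (UTF-8 here), but the data in the file is not valid UTF-8.

You should open the file with an explicitly specified encoding that matches the data. See the documentation for open().

Basically, something like this should work: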

with open(text_path, encoding='windows-1255') as text_file:
    lines = tuple(l.rstrip() for l in text_file.readlines())

Of course, windows-1255 is only my guess. You need to know how your file is actually encoded; the Python documentation has a list of the available encoding names.
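If the encoding is not known up front, one option is to try a few candidates in order. The following is only a minimal sketch: the helper name read_lines and the candidate list are assumptions made for illustration, not part of the original code. Note that latin-1 accepts every byte, so it never raises and should come last; a successful decode does not prove the text is displayed correctly.

```python
def read_lines(text_path, encodings=("utf-8", "windows-1255", "latin-1")):
    """Return (lines, encoding) using the first candidate encoding that decodes the file.

    Hypothetical helper: the candidate list is a guess and should be
    replaced with the encodings you actually expect.
    """
    for encoding in encodings:
        try:
            with open(text_path, encoding=encoding) as text_file:
                # Decoding happens while iterating, so errors surface here.
                lines = tuple(line.rstrip() for line in text_file)
            return lines, encoding
        except UnicodeDecodeError:
            continue  # this candidate does not fit the bytes, try the next one
    raise ValueError("none of the candidate encodings could decode " + str(text_path))
```

Inside text_image, the two lines that read the file could then become lines, _ = read_lines(text_path).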

Answer 2

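I know this is an old thread, but for future readers like me :)

I had these decoding errors plaguing me in a project. If you are in a unix-like environment, there is a useful tool for checking the encoding of a file: it is called "file".

Usage: file filename
It shows the encoding and the line terminators.

Example: file Z0117669.ldt

Z0117669.ldt: ISO-8859 text, with CRLF line terminators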
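Where the "file" tool is not available, a very rough check can be done in Python itself. This is only a sketch under stated assumptions: describe_bytes is a hypothetical helper, it distinguishes only ASCII, valid UTF-8, and "something else", and it is far less thorough than "file".

```python
def describe_bytes(raw):
    """Roughly classify raw file bytes: (character set guess, line terminator)."""
    try:
        raw.decode("ascii")
        charset = "ASCII"
    except UnicodeDecodeError:
        try:
            raw.decode("utf-8")
            charset = "UTF-8"
        except UnicodeDecodeError:
            # Not valid UTF-8, so it is most likely some 8-bit encoding.
            charset = "non-UTF-8"
    if b"\r\n" in raw:
        newline = "CRLF"
    elif b"\r" in raw:
        newline = "CR"
    else:
        newline = "LF"
    return charset, newline
```

It can be called on the raw contents of a file, for example describe_bytes(open(path, 'rb').read()). A "non-UTF-8" result means an explicit encoding has to be passed to open().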

python

django

tiff

python-3.x