Not accepting certain characters when writing to text file python

At the end of my function I write the results to a text file, which is created because it does not already exist, like so:

new_file = charity + ".txt"
with open(new_file, "w") as handle:
    handle.write("Matches found for " + charity.upper() + " in order of compatibility:\n")
    for item in match_lst:
            handle.write("Grant: " + item[2] + ". Funding offered: " + int_to_str(item[1]))
            handle.write("Number of matches: " + str(item[0] - 1) + "\n")
    handle.close()

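My problem is that when it writes to the new file, it does not seem to recognise newline characters, the '£' character, or the apostrophe character. To show what I mean, here is an excerpt of the output file: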

Matches found for BLA in order of compatibility:
Grant: The Taylor Family Foundation. Funding offered: �500,000.00Number of matches: 1
Grant: The Peter Cruddas Foundation. Funding offered: �200,000.00Number of matches: 1
Grant: The London Marathon Charitable Trust Limited - Major Capital Project 
Grants. Funding offered: �150,000.00Number of matches: 1
Grant: The Hadley Trust. Funding offered: �100,000.00Number of matches: 1
Grant: The Company Of Actuaries� Charitable Trust Fund. Funding offered: �65,000.00Number of matches: 1
Grant: The William Wates Memorial Trust. Funding offered: �50,000.00Number of matches: 1
Grant: The Nomura Charitable Trust. Funding offered: �50,000.00Number of matches: 1
Grant: The Grocers� Charity. Funding offered: �40,000.00Number of matches: 1

For reference, here is the information I am trying to write, in its original data structure (i.e. match_lst):

[(2, 500000.0, 'The Taylor Family Foundation', ['Young People', 'Arts Or Heritage', 'Social Reserarch'], ['Registered Charity']), 
(2, 200000.0, 'The Peter Cruddas Foundation', ['Young People'], ['Registered Charity', 'Other']),
(2, 150000.0, 'The London Marathon Charitable Trust Limited - Major Capital Project Grants', ['Infrastructure Support', 'Sport And Recreational Activities'], ['Registered Charity', 'Limited Company', 'Other']), 
(2, 100000.0, 'The Hadley Trust', ['Social Relief And Care', 'Crime And Victimisation', 'Young People', 'Social Reserarch'], ['Registered Charity', 'Limited Company']), 
(2, 65000.0, 'The Company Of Actuaries’ Charitable Trust Fund', ['Young People', 'Disabilities', 'Social Relief And Care', 'Medical Research'], ['Registered Charity']), 
(2, 50000.0, 'The William Wates Memorial Trust', ['Young People', 'Arts Or Heritage', 'Sport And Recreational Activities'], ['Registered Charity', 'Other']), 
(2, 50000.0, 'The Nomura Charitable Trust', ['Young People', 'Education And Learning', 'Unemployment'], ['Registered Charity']), 
(2, 40000.0, 'The Grocers’ Charity', ['Poverty', 'Young People', 'Disabilities', 'Healthcare Sector', 'Arts Or Heritage'], ['Registered Charity']) ]

As you can see, all the characters print fine here.

For more context, this is my simple int_to_str function:

def int_to_str(num_int):
    if num_int == 0:
        return "Discretionary"
    else:
        return '£' + '{:,.2f}'.format(num_int)

So my question is: how do I fix this so that all the missing or wrongly encoded characters are written correctly?

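It is hard to guess without details. In any case, this is indeed a character set problem. Let's look at the characters that are not displayed correctly: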

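  • The newline. It depends on the OS: it is \n on Unix-like systems and \r\n (2 characters) on Windows.
  • '£', the pound sign. It is the Unicode character U+00A3. In Windows code page 1252 or Latin1 (ISO-8859-1) it is the single byte b'\xa3', whereas in UTF-8 it is encoded as b'\xc2\xa3'. More interestingly, if you try to display b'\xa3' as UTF-8, you get the replacement character U+FFFD, which is displayed as '�'.
  • The apostrophe character. The true APOSTROPHE ("'") is the ASCII character U+0027. No problem here. But some Unicode-enabled editors silently replace it with the RIGHT SINGLE QUOTATION MARK (U+2019, "’"). That character does not exist in Latin1; in Windows code page 1252 it is the single byte b'\x92', and in UTF-8 it takes three bytes, b'\xe2\x80\x99'.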
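The byte values discussed in the list above can be checked directly in an interpreter. This is only an illustrative sketch; the variable names are made up for the example and do not come from the question's code.

```python
# Characters discussed above, written as escapes so the example does not
# depend on the encoding of this source file.
pound = "\u00a3"   # POUND SIGN
quote = "\u2019"   # RIGHT SINGLE QUOTATION MARK

pound_cp1252 = pound.encode("cp1252")   # single byte
pound_latin1 = pound.encode("latin-1")  # same single byte
pound_utf8 = pound.encode("utf-8")      # two bytes

quote_cp1252 = quote.encode("cp1252")   # single byte in code page 1252
quote_utf8 = quote.encode("utf-8")      # three bytes

# Latin1 has no code point for U+2019, so encoding fails.
try:
    quote.encode("latin-1")
    quote_in_latin1 = True
except UnicodeEncodeError:
    quote_in_latin1 = False

# Reading a cp1252 byte as UTF-8 yields the replacement character U+FFFD.
shown_pound = pound_cp1252.decode("utf-8", errors="replace")
shown_quote = quote_cp1252.decode("utf-8", errors="replace")

print(pound_cp1252, pound_utf8, quote_cp1252, quote_utf8)
print(repr(shown_pound), repr(shown_quote))
```

A single '�' in place of both the pound sign and the quote is consistent with a file written in a single-byte encoding such as code page 1252 and then viewed as UTF-8, although the question does not confirm which encodings were actually involved.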

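All this just means that details matter. Without knowing how you read the data from the binary file, or how that file was built, it is impossible to explain what actually happens. A text file is an abstraction. An actual text file is a sequence of bytes with a given encoding and end-of-line convention.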
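To make the "sequence of bytes" point concrete, here is a small sketch that writes the same string with two different encoding and line-ending choices and then reads the raw bytes back. The helper name written_bytes and the sample text are assumptions made for the illustration only.

```python
import os
import tempfile

text = "Funding offered: \u00a340,000.00\n"

def written_bytes(encoding, newline):
    """Write `text` to a temporary file and return the raw bytes on disk."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    os.close(fd)
    try:
        # `newline` controls how "\n" is translated when writing.
        with open(path, "w", encoding=encoding, newline=newline) as handle:
            handle.write(text)
        # Binary mode shows exactly what was stored, with no decoding.
        with open(path, "rb") as handle:
            return handle.read()
    finally:
        os.remove(path)

utf8_unix = written_bytes("utf-8", "\n")
cp1252_windows = written_bytes("cp1252", "\r\n")

print(utf8_unix)
print(cp1252_windows)
```

The same text ends up as two different byte sequences, which is why a viewer has to assume the same encoding and line-ending convention that the writer used.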

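Each record does appear to start on a new line, since the output is not one continuous run of text, so the \n characters are being written; they are simply not visible. The one missing line break, between the funding amount and "Number of matches", is there because the first handle.write call in the loop does not end its string with \n. To fix the encoding problem, specify the encoding when opening the file: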

with open(new_file, 'w', encoding="utf-8") as handle:
    ...
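Putting this together with the question's code, a minimal sketch might look like the following. The function name write_matches and its directory parameter are hypothetical, added only so the example can run on its own; the int_to_str logic and the written strings follow the question, with an extra "\n" after the funding amount.

```python
import os
import tempfile

def int_to_str(num_int):
    # Same logic as the question's helper.
    if num_int == 0:
        return "Discretionary"
    return "\u00a3" + "{:,.2f}".format(num_int)

def write_matches(charity, match_lst, directory="."):
    """Hypothetical wrapper around the question's writing loop."""
    new_file = os.path.join(directory, charity + ".txt")
    # An explicit encoding avoids depending on the platform default.
    with open(new_file, "w", encoding="utf-8") as handle:
        handle.write("Matches found for " + charity.upper()
                     + " in order of compatibility:\n")
        for item in match_lst:
            handle.write("Grant: " + item[2] + ". Funding offered: "
                         + int_to_str(item[1]) + "\n")
            handle.write("Number of matches: " + str(item[0] - 1) + "\n")
    # No handle.close() needed: the with block closes the file.
    return new_file

sample = [(2, 40000.0, "The Grocers\u2019 Charity",
           ["Poverty", "Young People"], ["Registered Charity"])]

with tempfile.TemporaryDirectory() as tmp:
    path = write_matches("bla", sample, tmp)
    with open(path, encoding="utf-8") as handle:
        content = handle.read()

print(content)
```

The file must also be opened as UTF-8 by whatever program displays it; otherwise the same characters can still appear garbled even though they were written correctly.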

I am posting this as an answer for future visitors.

Thanks