Not accepting certain characters when writing to text file python

Question

At the end of my function, I write the results to a text file. The file does not exist yet, so it gets created. The code looks like this:

new_file = charity + ".txt"
with open(new_file, "w") as handle:
    handle.write("Matches found for " + charity.upper() + " in order of compatibility:\n")
    for item in match_lst:
        handle.write("Grant: " + item[2] + ". Funding offered: " + int_to_str(item[1]))
        handle.write("Number of matches: " + str(item[0] - 1) + "\n")
    handle.close()

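My problem is that when it writes the new file, it doesn't seem to recognise the newline characters, the '£' character, or the apostrophe character. To show what I mean, here is an excerpt of the output file: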

Matches found for BLA in order of compatibility:
Grant: The Taylor Family Foundation. Funding offered: �500,000.00Number of matches: 1
Grant: The Peter Cruddas Foundation. Funding offered: �200,000.00Number of matches: 1
Grant: The London Marathon Charitable Trust Limited - Major Capital Project Grants. Funding offered: �150,000.00Number of matches: 1
Grant: The Hadley Trust. Funding offered: �100,000.00Number of matches: 1
Grant: The Company Of Actuaries� Charitable Trust Fund. Funding offered: �65,000.00Number of matches: 1
Grant: The William Wates Memorial Trust. Funding offered: �50,000.00Number of matches: 1
Grant: The Nomura Charitable Trust. Funding offered: �50,000.00Number of matches: 1
Grant: The Grocers� Charity. Funding offered: �40,000.00Number of matches: 1

For reference, here is the information I'm trying to write, in its original data structure (i.e. match_lst):

[(2, 500000.0, 'The Taylor Family Foundation', ['Young People', 'Arts Or Heritage', 'Social Reserarch'], ['Registered Charity']), 
(2, 200000.0, 'The Peter Cruddas Foundation', ['Young People'], ['Registered Charity', 'Other']),
(2, 150000.0, 'The London Marathon Charitable Trust Limited - Major Capital Project Grants', ['Infrastructure Support', 'Sport And Recreational Activities'], ['Registered Charity', 'Limited Company', 'Other']), 
(2, 100000.0, 'The Hadley Trust', ['Social Relief And Care', 'Crime And Victimisation', 'Young People', 'Social Reserarch'], ['Registered Charity', 'Limited Company']), 
(2, 65000.0, 'The Company Of Actuaries’ Charitable Trust Fund', ['Young People', 'Disabilities', 'Social Relief And Care', 'Medical Research'], ['Registered Charity']), 
(2, 50000.0, 'The William Wates Memorial Trust', ['Young People', 'Arts Or Heritage', 'Sport And Recreational Activities'], ['Registered Charity', 'Other']), 
(2, 50000.0, 'The Nomura Charitable Trust', ['Young People', 'Education And Learning', 'Unemployment'], ['Registered Charity']), 
(2, 40000.0, 'The Grocers’ Charity', ['Poverty', 'Young People', 'Disabilities', 'Healthcare Sector', 'Arts Or Heritage'], ['Registered Charity']) ]

As you can see, all of the characters print fine here.

For more context, here is my simple int_to_str function:

def int_to_str(num_int):
    if num_int == 0:
        return "Discretionary"

    else:
        return '£' + '{:,.2f}'.format(num_int)

So my question is: how can I fix this so that all of the missing/mis-encoded characters are written correctly?

Answer 1

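It is hard to say for sure without more details. In any case, this is definitely a character-set problem. Let's look at the characters that are not displayed correctly: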

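The newline. It is OS-dependent: it is \n on Unix-like systems and \r\n (2 characters) on Windows. In text mode, Python translates each '\n' you write into the platform's line ending.
'£', the pound sign. It is the Unicode character U+00A3. In Windows code page 1252 or Latin-1 (ISO-8859-1) it is the single byte b'\xa3', while in UTF-8 it is encoded as b'\xc2\xa3'. More interestingly, if you try to display b'\xa3' as UTF-8, you get the replacement character U+FFFD, which is displayed as '�'.
The apostrophe. The true APOSTROPHE ("'") is the ASCII character U+0027, so there is no problem there. But some Unicode-aware editors silently replace it with a proper typographic quote (U+2019, "’"). That character exists in Windows code page 1252 (as the byte 0x92) but not in Latin-1...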
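To make the list above concrete, here is a small sketch that prints the bytes the pound sign and the typographic apostrophe become under UTF-8, cp1252 and Latin-1, and shows what a UTF-8 reader makes of a lone b'\xa3'. The helper name encoded_bytes is hypothetical, used only for illustration.

```python
def encoded_bytes(text, encoding):
    """Hypothetical helper: return text encoded with `encoding`, or None if impossible."""
    try:
        return text.encode(encoding)
    except UnicodeEncodeError:
        return None


POUND = "\u00a3"        # '£'
RIGHT_QUOTE = "\u2019"  # '’', the typographic apostrophe

for enc in ("utf-8", "cp1252", "latin-1"):
    print(enc, encoded_bytes(POUND, enc), encoded_bytes(RIGHT_QUOTE, enc))
# utf-8 b'\xc2\xa3' b'\xe2\x80\x99'
# cp1252 b'\xa3' b'\x92'
# latin-1 b'\xa3' None

# A lone cp1252/Latin-1 byte is not valid UTF-8, so decoding it as UTF-8
# (with errors="replace") yields U+FFFD, which viewers display as '�'.
print(hex(ord(b"\xa3".decode("utf-8", errors="replace"))))  # 0xfffd
```

This matches the output in the question: a file written as cp1252 and then viewed as UTF-8 turns both b'\xa3' and b'\x92' into '�'.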

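All of this just means that details matter. Without knowing how you read the data from the (binary) file and how that data was built, it is impossible to explain what actually happened. A text file is an abstraction: an actual text file is a sequence of bytes with a given encoding and a given line-ending convention.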
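The point that a text file is just bytes plus an encoding and a line-ending convention can be checked directly. The sketch below writes the same string through open() in text mode with two different encoding/newline choices and then reads back the raw bytes. The helper name write_and_dump is hypothetical.

```python
import os
import tempfile


def write_and_dump(text, encoding, newline):
    """Hypothetical helper: write text in text mode, return the raw bytes on disk."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    os.close(fd)
    try:
        with open(path, "w", encoding=encoding, newline=newline) as handle:
            handle.write(text)
        with open(path, "rb") as raw:  # binary mode: no decoding, no newline translation
            return raw.read()
    finally:
        os.remove(path)


sample = "Funding offered: \u00a3500\n"
print(write_and_dump(sample, "utf-8", "\n"))     # b'Funding offered: \xc2\xa3500\n'
print(write_and_dump(sample, "cp1252", "\r\n"))  # b'Funding offered: \xa3500\r\n'
```

Passing newline explicitly pins the line ending; with the default newline=None, each '\n' written is translated to os.linesep, so the same code produces different bytes on Windows and on Unix-like systems.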

Answer 2

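It seems each line was in fact being written on a new line, since the text is not one continuous string; the \n characters were simply hidden in the output. To fix the encoding problem, you have to specify the encoding in the open() call: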

with open(new_file, 'w', encoding="utf-8") as handle:
    ...
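For completeness, here is a sketch of the question's code with encoding="utf-8" applied. The wrapper name write_matches is hypothetical, and the ". " separator after the amount is an assumed formatting choice: the question's first write() call ends without a newline or separator, which is why "Number of matches" runs straight into the amount in the output.

```python
def int_to_str(num_int):
    """The question's helper: format an amount in pounds, or 'Discretionary' for 0."""
    if num_int == 0:
        return "Discretionary"
    return "\u00a3" + "{:,.2f}".format(num_int)  # "\u00a3" is '£'


def write_matches(charity, match_lst):
    """Hypothetical wrapper: write match_lst to '<charity>.txt' as UTF-8."""
    new_file = charity + ".txt"
    # Without encoding=..., open() uses the locale's preferred encoding,
    # which on Western-locale Windows systems is typically cp1252.
    with open(new_file, "w", encoding="utf-8") as handle:
        handle.write("Matches found for " + charity.upper() + " in order of compatibility:\n")
        for item in match_lst:
            # Assumed separator so the match count no longer runs into the amount.
            handle.write("Grant: " + item[2] + ". Funding offered: " + int_to_str(item[1]) + ". ")
            handle.write("Number of matches: " + str(item[0] - 1) + "\n")
    # The with block closes the file, so no explicit handle.close() is needed.
    return new_file
```

The program used to view the file must also read it as UTF-8; an editor that assumes cp1252 would show '£' as 'Â£' instead.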

I'm posting this as an answer for future visitors.

Thanks.


python

io

file-writing