Certain characters not accepted when writing to a text file in Python
At the end of my function I write the results to a text file, which is created because it does not already exist, like so:
new_file = charity + ".txt"
with open(new_file, "w") as handle:
    handle.write("Matches found for " + charity.upper() + " in order of compatibility:\n")
    for item in match_lst:
        handle.write("Grant: " + item[2] + ". Funding offered: " + int_to_str(item[1]))
        handle.write("Number of matches: " + str(item[0] - 1) + "\n")
handle.close()
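My problem is that when it writes to the new file, it does not seem to recognise newlines, the '£' character or the apostrophe character. To show what I mean, here is an excerpt from the output file: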
Matches found for BLA in order of compatibility:
Grant: The Taylor Family Foundation. Funding offered: �500,000.00Number of matches: 1
Grant: The Peter Cruddas Foundation. Funding offered: �200,000.00Number of matches: 1
Grant: The London Marathon Charitable Trust Limited - Major Capital Project
Grants. Funding offered: �150,000.00Number of matches: 1
Grant: The Hadley Trust. Funding offered: �100,000.00Number of matches: 1
Grant: The Company Of Actuaries� Charitable Trust Fund. Funding offered: �65,000.00Number of matches: 1
Grant: The William Wates Memorial Trust. Funding offered: �50,000.00Number of matches: 1
Grant: The Nomura Charitable Trust. Funding offered: �50,000.00Number of matches: 1
Grant: The Grocers� Charity. Funding offered: �40,000.00Number of matches: 1
For reference, here is the information I am trying to write, in its original data structure (i.e. match_lst):
[(2, 500000.0, 'The Taylor Family Foundation', ['Young People', 'Arts Or Heritage', 'Social Reserarch'], ['Registered Charity']),
(2, 200000.0, 'The Peter Cruddas Foundation', ['Young People'], ['Registered Charity', 'Other']),
(2, 150000.0, 'The London Marathon Charitable Trust Limited - Major Capital Project Grants', ['Infrastructure Support', 'Sport And Recreational Activities'], ['Registered Charity', 'Limited Company', 'Other']),
(2, 100000.0, 'The Hadley Trust', ['Social Relief And Care', 'Crime And Victimisation', 'Young People', 'Social Reserarch'], ['Registered Charity', 'Limited Company']),
(2, 65000.0, 'The Company Of Actuaries’ Charitable Trust Fund', ['Young People', 'Disabilities', 'Social Relief And Care', 'Medical Research'], ['Registered Charity']),
(2, 50000.0, 'The William Wates Memorial Trust', ['Young People', 'Arts Or Heritage', 'Sport And Recreational Activities'], ['Registered Charity', 'Other']),
(2, 50000.0, 'The Nomura Charitable Trust', ['Young People', 'Education And Learning', 'Unemployment'], ['Registered Charity']),
(2, 40000.0, 'The Grocers’ Charity', ['Poverty', 'Young People', 'Disabilities', 'Healthcare Sector', 'Arts Or Heritage'], ['Registered Charity']) ]
As you can see, all the characters print fine here.
For more context, this is my simple int_to_str function:
def int_to_str(num_int):
    if num_int == 0:
        return "Discretionary"
    else:
        return '£' + '{:,.2f}'.format(num_int)
So my question is: how do I fix this so that all the missing or wrongly encoded characters are written correctly?
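It is hard to guess without more details. In any case, this is indeed a character set problem. Let us look at the characters that are not displayed correctly:
- The newline. It depends on the OS: it is \n on Unix-like systems and \r\n (2 characters) on Windows.
- '£', the pound sign. It is the Unicode character U+00A3. In Windows code page 1252 and in Latin1 (ISO-8859-1) it is the single byte b'\xa3', while in UTF-8 it is encoded as b'\xc2\xa3'. More interestingly, if you try to display b'\xa3' as UTF-8, you get the replacement character U+FFFD, which is shown as '�'.
- The apostrophe. The true APOSTROPHE ("'") is the ASCII character U+0027, and it causes no problem. But some Unicode-aware editors silently replace it with the typographic RIGHT SINGLE QUOTATION MARK (U+2019, "’"), which is the character that appears in your data. That character does not exist in Latin1; in Windows code page 1252 it is the single byte b'\x92', which is not valid UTF-8 on its own, and in UTF-8 it is the three bytes b'\xe2\x80\x99'.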
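The byte values quoted for these characters can be checked directly in Python. This is only a minimal sketch using the built-in str.encode and bytes.decode; the variable names are illustrative and not taken from the question.

```python
pound = "\u00a3"        # POUND SIGN
right_quote = "\u2019"  # RIGHT SINGLE QUOTATION MARK

print(pound.encode("cp1252"))       # b'\xa3'
print(pound.encode("latin-1"))      # b'\xa3'
print(pound.encode("utf-8"))        # b'\xc2\xa3'
print(right_quote.encode("utf-8"))  # b'\xe2\x80\x99'

# The typographic quote has no Latin1 encoding at all.
try:
    right_quote.encode("latin-1")
    latin1_ok = True
except UnicodeEncodeError:
    latin1_ok = False
print(latin1_ok)  # False

# A lone b'\xa3' is not valid UTF-8, so a UTF-8 viewer falls back to U+FFFD.
garbled = pound.encode("cp1252").decode("utf-8", errors="replace")
print(garbled == "\ufffd")  # True
```

The last line reproduces the symptom in the output excerpt: bytes written with a single-byte code page and then interpreted as UTF-8 show up as '�'.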
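All of this simply means that details matter. Without knowing how the file is read at the binary level, and how it was built, it is impossible to explain what actually happens. A text file is an abstraction: a real text file is a sequence of bytes with a given encoding and a given end-of-line convention.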
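One way to see the bytes behind that abstraction is to write a line in text mode and read it back in binary mode. The sketch below is only an illustration: the sample string, the temporary file name and the explicit newline="\r\n" argument are assumptions chosen to make the result the same on every OS.

```python
import os
import tempfile

text = "Funding offered: \u00a3500.00\n"

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "sample.txt")

    # Text mode: the str is encoded, and each "\n" is translated to "\r\n".
    with open(path, "w", encoding="utf-8", newline="\r\n") as handle:
        handle.write(text)

    # Binary mode: no decoding and no newline translation.
    with open(path, "rb") as handle:
        raw = handle.read()

print(raw)  # b'Funding offered: \xc2\xa3500.00\r\n'
```

Whatever displays the file has to make the same two choices, encoding and line ending, in reverse; a mismatch on either side produces the kind of output shown in the question.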
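It looks as though each record is in fact written on its own line, since the text is not one continuous string; the \n characters are simply not visible in the output. To fix the encoding problem, you have to specify the encoding in the call that opens the file: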
with open(new_file, 'w', encoding="utf-8") as handle:
    ...
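Applied to the code from the question, the fix might look like the sketch below. The write_matches wrapper, the temporary output path and the ". " separator between the two parts of each record are assumptions added for illustration; the sample tuples are two entries from match_lst, trimmed to the three fields the loop uses.

```python
import os
import tempfile

def int_to_str(num_int):
    # Same helper as in the question.
    if num_int == 0:
        return "Discretionary"
    return "\u00a3" + "{:,.2f}".format(num_int)

def write_matches(path, charity, match_lst):
    # Hypothetical wrapper around the question's loop, with an explicit encoding.
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("Matches found for " + charity.upper() + " in order of compatibility:\n")
        for item in match_lst:
            handle.write("Grant: " + item[2] + ". Funding offered: " + int_to_str(item[1]) + ". ")
            handle.write("Number of matches: " + str(item[0] - 1) + "\n")

match_lst = [
    (2, 65000.0, "The Company Of Actuaries\u2019 Charitable Trust Fund"),
    (2, 40000.0, "The Grocers\u2019 Charity"),
]

with tempfile.TemporaryDirectory() as tmp_dir:
    out_path = os.path.join(tmp_dir, "bla.txt")
    write_matches(out_path, "bla", match_lst)
    # Read back with the same encoding that was used for writing.
    with open(out_path, encoding="utf-8") as handle:
        content = handle.read()

print(content)
```

The file only displays correctly if the program used to view it also decodes it as UTF-8.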
I am posting this as an answer for future visitors.
Thanks