Printing (encoding/decoding) French characters works in a txt file but is incorrect in Excel/CSV [python]
I have a string with special characters (supposedly French characters), and I want it to display correctly in csv/Excel:
s1 = 'Benoît'
# take a look at encoding
print(s1.encode(encoding='utf-8'))
# print to txt
with open("firstname.txt", "w") as text_file:
    print(s1, file=text_file)
# print to csv
import pandas as pd
df = pd.DataFrame({'FirstName': [s1]})
df.to_csv('firstname.csv', index = False)
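The resulting txt file displays the French characters correctly, but the csv file does not.
My question is: how can I make the csv display correctly? (I can copy the French characters from the txt file into the csv, but how do I write the csv programmatically so that it displays correctly?)
Update:
Thanks to @snakecharmerb, I tried encoding = 'utf-8-sig':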
# try csv with encoding = 'utf-8-sig': doesn't work
df = pd.DataFrame({'a': [s1]})
df.to_csv('firstname.csv', index = False, encoding = 'utf-8-sig')
# read from txt file which seems to display correctly
df = pd.read_table("firstname.txt", header = None)
df
#         0
# 0  Benoît
# then write to csv with encoding = 'utf-8-sig' - works
df.to_csv('firstname1.csv', index = False, encoding = 'utf-8-sig')
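Excel does not necessarily recognize that the file is encoded as UTF-8. You can either specify UTF-8 as the encoding when opening the file in Excel, or write the csv file with the 'utf-8-sig' encoding.
'utf-8-sig' is a variant of UTF-8, used mostly on Windows, that inserts a three-byte "byte order mark" (BOM) at the beginning of the file. Windows applications that try to guess a file's encoding will read the BOM and decode the file as UTF-8. The BOM may not be recognized on other platforms, in which case three unexpected characters appear at the start of the file.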
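To see what the BOM actually looks like on disk, here is a minimal sketch. It uses the standard-library csv module rather than pandas so that it runs without extra dependencies. The helper names (write_csv_with_bom, starts_with_bom, read_first_name) and the temporary file path are made up for illustration and are not part of the question's code.

```python
import codecs
import csv
import os
import tempfile


def write_csv_with_bom(path, rows):
    """Write rows to a CSV file encoded as UTF-8 with a leading BOM."""
    # newline="" is what the csv module documentation recommends for files it writes.
    with open(path, "w", encoding="utf-8-sig", newline="") as f:
        csv.writer(f).writerows(rows)


def starts_with_bom(path):
    """Return True if the file begins with the three UTF-8 BOM bytes EF BB BF."""
    with open(path, "rb") as f:
        return f.read(3) == codecs.BOM_UTF8


def read_first_name(path):
    """Read the file back; decoding with 'utf-8-sig' strips the BOM again."""
    with open(path, encoding="utf-8-sig", newline="") as f:
        rows = list(csv.reader(f))
    return rows[1][0]  # first data row, first column


if __name__ == "__main__":
    # Hypothetical output location in a throwaway temp directory.
    path = os.path.join(tempfile.mkdtemp(), "firstname.csv")
    write_csv_with_bom(path, [["FirstName"], ["Benoît"]])
    print(starts_with_bom(path))
    print(read_first_name(path))
```

With pandas, passing encoding='utf-8-sig' to df.to_csv, as in the question's update, should produce the same leading bytes. Reading the file back with encoding='utf-8-sig' avoids the stray characters at the start of the first header.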