Writing to txt file in UTF-8 - Python
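My Django application takes a document from the user, creates some reports about it, and writes them to a txt file. The interesting problem is that everything works fine on my Mac OS. On Windows, however, it cannot read certain letters and converts them to symbols like é™ and ä±. Here is my code: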
views.py:
def result(request):
    last_uploaded = OriginalDocument.objects.latest('id')
    original = open(str(last_uploaded.document), 'r')
    original_words = original.read().lower().split()
    words_count = len(original_words)
    open_original = open(str(last_uploaded.document), "r")
    read_original = open_original.read()
    characters_count = len(read_original)
    report_fives = open("static/report_documents/" + str(last_uploaded.student_name) +
                        "-" + str(last_uploaded.document_title) + "-5.txt", 'w', encoding="utf-8")
    # Path to the documents with which original doc is comparing
    path = 'static/other_documents/doc*.txt'
    files = glob.glob(path)
    #endregion
    rows, found_count, fives_count, rounded_percentage_five, percentage_for_chart_five, fives_for_report, founded_docs_for_report = search_by_five(last_uploaded, 5, original_words, report_fives, files)
    context = {
        ...
    }
    return render(request, 'result.html', context)
report txt file:
['universitetindé™', 'té™hsili', 'alä±ram.', 'mé™n'] was found in static/other_documents\doc1.txt.
...
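The problem here is that you are calling open() on the input file without specifying an encoding. As noted in the Python documentation, the default encoding is platform dependent. That is probably why you see different results on Windows and macOS.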
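To see how a platform default can produce exactly these symbols, here is a minimal sketch. It assumes the Windows default happens to be cp1252, which is a guess based on the garbled output in the report and not something the question confirms. The sample sentence is made up from the words in the report.

```python
import locale

# The encoding open() falls back to when none is given (platform dependent).
print(locale.getpreferredencoding(False))

# Text containing the letters that come out garbled in the report.
text = "universitetində təhsili alıram"

# The bytes actually stored on disk when the file is saved as UTF-8.
raw = text.encode("utf-8")

# Assumption: the Windows machine decodes with cp1252 by default.
# Each two-byte UTF-8 letter turns into two unrelated cp1252 characters.
garbled = raw.decode("cp1252").lower()
print(garbled)

# Decoding the same bytes as UTF-8 recovers the original text.
restored = raw.decode("utf-8").lower()
print(restored)
```

The garbled string contains the same é™ and ä± sequences as the report, which is consistent with UTF-8 bytes being read under a single-byte Windows code page.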
Assuming the file itself is actually encoded in UTF-8, just specify that when reading it:
original = open(str(last_uploaded.document), 'r', encoding="utf-8")
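Since the view opens the same document twice and never closes it, one possible tidy-up is to read it once inside a with block and derive both counts from that single read. The sketch below is only an illustration: count_words_and_characters and the temporary sample file are hypothetical names that do not appear in the original code, and the file is assumed to be UTF-8.

```python
import os
import tempfile


def count_words_and_characters(path):
    """Read a UTF-8 text file once and return (word_count, character_count)."""
    # The explicit encoding gives the same result on every platform;
    # the with block closes the file even if reading fails.
    with open(path, "r", encoding="utf-8") as handle:
        content = handle.read()
    words = content.lower().split()
    return len(words), len(content)


# Usage on a throwaway file standing in for the uploaded document.
with tempfile.TemporaryDirectory() as folder:
    sample_path = os.path.join(folder, "sample.txt")
    with open(sample_path, "w", encoding="utf-8") as handle:
        handle.write("Mən universitetində təhsili alıram.")
    word_count, character_count = count_words_and_characters(sample_path)
    print(word_count, character_count)
```

In the view, the same helper could be called with str(last_uploaded.document), provided uploaded documents really are saved as UTF-8.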