将 Unicode 写入 .docx 文件
Writing Unicode to .docx file
有什么方法可以将unicode字符写入docx文件吗?我试过 python-docx 但是,它给了我 TypeError
.
回溯
Traceback (most recent call last):
File "< ipython-input-1-ba89c735995d >", line 1, in
runfile('H:/Python/Practice/new/download.py', wdir='H:/Python/Practice/new')
File "C:\Program Files\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 866, in runfile
execfile(filename, namespace)
File "C:\Program Files\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 102, in execfile
exec(compile(f.read(), filename, 'exec'), namespace)
File "H:/Python/Practice/new/download.py", line 37, in
document.add_paragraph(story.encode("utf-8"))
File "C:\Program Files\Anaconda3\lib\site-packages\docx\document.py", line 63, in add_paragraph
return self._body.add_paragraph(text, style)
File "C:\Program Files\Anaconda3\lib\site-packages\docx\blkcntnr.py", line 36, in add_paragraph
paragraph.add_run(text)
File "C:\Program Files\Anaconda3\lib\site-packages\docx\text\paragraph.py", line 37, in add_run
run.text = text
File "C:\Program Files\Anaconda3\lib\site-packages\docx\text\run.py", line 163, in text
self._r.text = text
File "C:\Program Files\Anaconda3\lib\site-packages\docx\oxml\text\run.py", line 104, in text
_RunContentAppender.append_to_run_from_text(self, text)
File "C:\Program Files\Anaconda3\lib\site-packages\docx\oxml\text\run.py", line 134, in append_to_run_from_text
appender.add_text(text)
File "C:\Program Files\Anaconda3\lib\site-packages\docx\oxml\text\run.py", line 142, in add_text
self.add_char(char)
File "C:\Program Files\Anaconda3\lib\site-packages\docx\oxml\text\run.py", line 156, in add_char
elif char in '\r\n':
TypeError: 'in < string >' requires string as left operand, not int
我试图抓取一个网站并将文本写入 MS Word 文件。文本使用当地语言(孟加拉语)。当我在控制台上打印 story
时,它会完美地打印出整个文本。
代码
import requests
from docx import Document
from bs4 import BeautifulSoup
url = "some url"
response = requests.get(url)
soup = BeautifulSoup(response.text, "lxml")
story = soup.find("div", {"dir": "ltr"}).get_text().replace("<br />", "\n", len(response.text))
title = "title"
document = Document()
document.add_heading(title, 0)
document.add_paragraph(story.encode("utf-8"))
document.save(title + ".docx")
在 Python 3 中,str
ings(应该)自动支持 Unicode,所以您不需要做任何特殊的事情,比如编码为 UTF-8。
document.add_paragraph(story)
# ^ just do this directly, no need to call `.encode('utf-8')`
虽然您可以将 Unicode 字符写入文档,但文字处理器使用的默认文本呈现引擎可能不支持它。您可能需要specify a font确保字符不会变成□□□□。
run = document.add_paragraph(story).add_run()
font = run.font
font.name = 'Vrinda'
有什么方法可以将unicode字符写入docx文件吗?我试过 python-docx 但是,它给了我 TypeError
.
回溯
Traceback (most recent call last):
File "< ipython-input-1-ba89c735995d >", line 1, in runfile('H:/Python/Practice/new/download.py', wdir='H:/Python/Practice/new')
File "C:\Program Files\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 866, in runfile execfile(filename, namespace)
File "C:\Program Files\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 102, in execfile exec(compile(f.read(), filename, 'exec'), namespace)
File "H:/Python/Practice/new/download.py", line 37, in document.add_paragraph(story.encode("utf-8"))
File "C:\Program Files\Anaconda3\lib\site-packages\docx\document.py", line 63, in add_paragraph return self._body.add_paragraph(text, style)
File "C:\Program Files\Anaconda3\lib\site-packages\docx\blkcntnr.py", line 36, in add_paragraph paragraph.add_run(text)
File "C:\Program Files\Anaconda3\lib\site-packages\docx\text\paragraph.py", line 37, in add_run run.text = text
File "C:\Program Files\Anaconda3\lib\site-packages\docx\text\run.py", line 163, in text self._r.text = text
File "C:\Program Files\Anaconda3\lib\site-packages\docx\oxml\text\run.py", line 104, in text _RunContentAppender.append_to_run_from_text(self, text)
File "C:\Program Files\Anaconda3\lib\site-packages\docx\oxml\text\run.py", line 134, in append_to_run_from_text appender.add_text(text)
File "C:\Program Files\Anaconda3\lib\site-packages\docx\oxml\text\run.py", line 142, in add_text self.add_char(char)
File "C:\Program Files\Anaconda3\lib\site-packages\docx\oxml\text\run.py", line 156, in add_char elif char in '\r\n':
TypeError: 'in < string >' requires string as left operand, not int
我试图抓取一个网站并将文本写入 MS Word 文件。文本使用当地语言(孟加拉语)。当我在控制台上打印 story
时,它会完美地打印出整个文本。
代码
import requests
from docx import Document
from bs4 import BeautifulSoup
url = "some url"
response = requests.get(url)
soup = BeautifulSoup(response.text, "lxml")
story = soup.find("div", {"dir": "ltr"}).get_text().replace("<br />", "\n", len(response.text))
title = "title"
document = Document()
document.add_heading(title, 0)
document.add_paragraph(story.encode("utf-8"))
document.save(title + ".docx")
在 Python 3 中,str
ings(应该)自动支持 Unicode,所以您不需要做任何特殊的事情,比如编码为 UTF-8。
document.add_paragraph(story)
# ^ just do this directly, no need to call `.encode('utf-8')`
虽然您可以将 Unicode 字符写入文档,但文字处理器使用的默认文本呈现引擎可能不支持它。您可能需要specify a font确保字符不会变成□□□□。
run = document.add_paragraph(story).add_run()
font = run.font
font.name = 'Vrinda'