How to read an .rtf file and convert it into Python 3 strings that can be stored in a Python 3 list?
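I have an .rtf file. I want to read it with Python 3, using any package, and store its strings in a list, but the solution should work on Windows as well as other platforms.
I tried striprtf, but read_rtf does not work.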
from striprtf.striprtf import rtf_to_text
from striprtf.striprtf import read_rtf
rtf = read_rtf("file.rtf")
text = rtf_to_text(rtf)
print(text)
But this code fails with the error: cannot import name 'read_rtf'
Can anyone suggest a way to get the strings out of an .rtf file in Python 3?
Have you tried this?
with open('yourfile.rtf', 'r') as file:
    text = file.read()
print(text)
For very large files, try this:
with open("yourfile.rtf") as infile:
    for line in infile:
        do_something_with(line)
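Note that reading the file this way returns the raw RTF markup (control words such as \rtf1 and \par), not the plain text. A minimal sketch of what comes back, assuming a tiny hand-written sample document; `SAMPLE_RTF`, `read_raw`, and `sample.rtf` are hypothetical names used only for illustration:

```python
import os
import tempfile

# A minimal hand-written RTF document (hypothetical sample content).
SAMPLE_RTF = r"{\rtf1\ansi\deff0 {\fonttbl {\f0 Arial;}}\f0 Hello, world!\par}"


def read_raw(path):
    """Return the file contents exactly as stored, RTF markup included."""
    with open(path, "r", encoding="utf-8", errors="ignore") as f:
        return f.read()


# Write the sample to a temporary file, then read it back unchanged.
with tempfile.TemporaryDirectory() as tmp:
    sample_path = os.path.join(tmp, "sample.rtf")
    with open(sample_path, "w", encoding="utf-8") as f:
        f.write(SAMPLE_RTF)
    raw = read_raw(sample_path)

# The string still contains the markup; it needs a converter to become text.
print(raw)
```

So plain open() is only the first step; the markup still has to be stripped by something like rtf_to_text.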
Try this:
from striprtf.striprtf import rtf_to_text
sample_text = "any text as a string you want"
text = rtf_to_text(sample_text)
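Reading an RTF file and processing its data is tricky, and how well it goes depends on the file you have. I tried all of the approaches above and none of them worked; in the end, the following code worked for me. I hope it helps anyone looking for a solution.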
from win32com.client import Dispatch
word = Dispatch('Word.Application') # Open word application
# word = DispatchEx('Word.Application') # start a separate process
word.Visible = 0 # Run in the background, no display
word.DisplayAlerts = 0 # No warning
path = r'C:\Projects.1\power.rtf'
doc = word.Documents.Open(FileName=path, Encoding='gbk')
for para in doc.paragraphs:
    print(para.Range.Text)
doc.Close()
word.Quit()
If you want to store everything in a single variable, the code below will do it.
from win32com.client import Dispatch
word = Dispatch('Word.Application') # Open word application
# word = DispatchEx('Word.Application') # start a separate process
word.Visible = 0 # Run in the background, no display
word.DisplayAlerts = 0 # No warning
path = r'C:\Projects.1\output_5.rtf' # Use an absolute path; a relative path will fail
doc = word.Documents.Open(FileName=path, Encoding='gbk')
#for para in doc.paragraphs:
# print(para.Range.Text)
content = '\n'.join([para.Range.Text for para in doc.paragraphs])
print(content)
doc.Close()
word.Quit()
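Since the question asks for a list rather than one joined string, the same Word approach can be sketched as below. This is only a sketch under assumptions: it is Windows-only, needs pywin32 and an installed copy of Word, and leaves out the Encoding argument used above. The helper names `clean_paragraphs` and `rtf_to_list_via_word` are made up for illustration. Word's Range.Text ends each paragraph with a carriage return, so the helper strips it.

```python
def clean_paragraphs(raw_paragraphs):
    """Strip the trailing paragraph mark from each entry and drop empty ones."""
    cleaned = (p.rstrip("\r\n") for p in raw_paragraphs)
    return [p for p in cleaned if p]


def rtf_to_list_via_word(path):
    """Windows-only: open the file in Word through COM, return its paragraphs."""
    # Imported lazily so clean_paragraphs can be used without pywin32.
    from win32com.client import Dispatch

    word = Dispatch("Word.Application")
    word.Visible = 0        # run in the background
    word.DisplayAlerts = 0  # suppress dialogs
    try:
        doc = word.Documents.Open(FileName=path)
        try:
            # The list is fully built here, while the document is still open.
            return clean_paragraphs(p.Range.Text for p in doc.Paragraphs)
        finally:
            doc.Close()
    finally:
        word.Quit()  # always release the Word process, even on errors
```

The try/finally blocks are there so that a failed Open or read does not leave a hidden Word process running.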
Using rtf_to_text is enough to convert RTF into a string in Python. Read the contents of the RTF file and then pass them to rtf_to_text:
from striprtf.striprtf import rtf_to_text
with open("yourfile.rtf") as infile:
    content = infile.read()
text = rtf_to_text(content)
print(text)
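To end up with a list, as the question asks, the converted text can be split into lines. A minimal cross-platform sketch, assuming striprtf is installed and that one list entry per non-empty line is what is wanted; `text_to_list` and `rtf_file_to_list` are hypothetical helper names, and the utf-8 default with errors="ignore" is an assumption that may need adjusting for a particular file:

```python
from pathlib import Path


def text_to_list(text):
    """Split plain text into a list of non-empty, whitespace-stripped lines."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def rtf_file_to_list(path, encoding="utf-8"):
    """Read an RTF file, strip the markup, and return its lines as a list."""
    # Imported lazily so text_to_list works even without striprtf installed.
    from striprtf.striprtf import rtf_to_text

    raw = Path(path).read_text(encoding=encoding, errors="ignore")
    return text_to_list(rtf_to_text(raw))
```

Usage would be something like lines = rtf_file_to_list("yourfile.rtf"). This also covers the missing read_rtf from the question: reading the file yourself and handing the string to rtf_to_text does the same job.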