How to read an .rtf file and convert it into Python 3 strings that can be stored in a list?

I have an .rtf file. I want to read it with Python 3, using any package, and store its strings in a list. The solution needs to work on Windows as well as on other platforms.

I tried striprtf, but read_rtf does not work.

from striprtf.striprtf import rtf_to_text
from striprtf.striprtf import read_rtf
rtf = read_rtf("file.rtf")
text = rtf_to_text(rtf)
print(text)

This code fails with the error: cannot import name 'read_rtf'

Can anyone suggest a way to get the strings from an .rtf file in Python 3?

Have you tried this?

with open('yourfile.rtf', 'r') as file:
    text = file.read()
print(text)
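
Note that opening an .rtf file this way returns the raw RTF markup, control words included, rather than just the visible text. A minimal sketch of what that looks like, assuming a tiny hand-written RTF document (the sample content and the temporary file are made up for illustration):

```python
import os
import tempfile

# A tiny hand-written RTF document, used only for illustration.
sample_rtf = r"{\rtf1\ansi{\fonttbl{\f0 Arial;}}\f0\fs24 Hello, world!\par}"

# Write the sample to a temporary file so the example is self-contained.
with tempfile.NamedTemporaryFile("w", suffix=".rtf", delete=False) as tmp:
    tmp.write(sample_rtf)
    path = tmp.name

try:
    with open(path, "r") as file:
        raw = file.read()
finally:
    os.remove(path)

# The result still contains control words such as \rtf1 and \par.
print(raw)
```

So a plain read is only the first step; the markup still has to be stripped by something like rtf_to_text.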

For very large files, try this:

with open("yourfile.rtf") as infile:
    for line in infile:
        do_something_with(line)

Try this:

from striprtf.striprtf import rtf_to_text

sample_text = "any text as a string you want"
text = rtf_to_text(sample_text)

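Reading an RTF file and processing its data can be tricky, and how well it goes depends on the file you have. I tried all of the approaches above and none of them worked for me; in the end, the following code did. Note that it requires Windows with Microsoft Word installed, plus the pywin32 package. I hope it helps anyone else looking for a solution.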

from win32com.client import Dispatch
 
word = Dispatch('Word.Application') # Open word application
 # word = DispatchEx('Word.Application') # start a separate process
word.Visible = 0 # Run in the background, no display
word.DisplayAlerts = 0 # No warning
 
path = r'C:\Projects.1\power.rtf' 
doc = word.Documents.Open(FileName=path, Encoding='gbk')
 
for para in doc.paragraphs:
    print(para.Range.Text)
 
doc.Close()
word.Quit()

If you want to store the content in a single variable, the code below does the job.

from win32com.client import Dispatch
 
word = Dispatch('Word.Application') # Open word application
 # word = DispatchEx('Word.Application') # start a separate process
word.Visible = 0 # Run in the background, no display
word.DisplayAlerts = 0 # No warning
 
path = r'C:\Projects.1\output_5.rtf' # Use an absolute path; a relative path will fail
doc = word.Documents.Open(FileName=path, Encoding='gbk')

#for para in doc.paragraphs:
#    print(para.Range.Text)


content = '\n'.join([para.Range.Text for para in doc.paragraphs])

print(content)

doc.Close()
word.Quit()
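
Each paragraph's Range.Text that Word returns ends with a carriage return (the paragraph mark), and text from table cells also ends with a "\x07" cell marker. If the goal is a clean Python list, as the question asks, these can be stripped first. A minimal sketch, where clean_paragraphs is a hypothetical helper name and the sample strings are made up to imitate what Word returns:

```python
def clean_paragraphs(raw_paragraphs):
    """Strip Word's trailing paragraph marks and drop empty paragraphs."""
    cleaned = []
    for text in raw_paragraphs:
        # "\r" is the paragraph mark; "\x07" marks the end of a table cell.
        stripped = text.rstrip("\r\x07").strip()
        if stripped:
            cleaned.append(stripped)
    return cleaned


# Made-up strings imitating para.Range.Text values.
sample = ["First line\r", "\r", "Second line\r", "Cell text\r\x07"]
lines = clean_paragraphs(sample)
print(lines)
```

With the Word code above, this could be called as clean_paragraphs(para.Range.Text for para in doc.paragraphs) before doc.Close().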

Using rtf_to_text is enough to convert RTF into a Python string. Read the contents of the RTF file, then pass them to rtf_to_text:

from striprtf.striprtf import rtf_to_text

with open("yourfile.rtf") as infile:
    content = infile.read()
    text = rtf_to_text(content)
print(text)
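
Since the question asks for the strings in a list, the converted text can then be split into lines. A minimal sketch, assuming one list item per non-empty line is what is wanted; text_to_lines and rtf_file_to_lines are hypothetical helper names, and striprtf is imported inside the second function so the splitting step can be tried without it installed:

```python
def text_to_lines(text):
    """Split plain text into a list of non-empty, stripped lines."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def rtf_file_to_lines(path):
    """Read an RTF file and return its visible text as a list of lines."""
    # Imported here so text_to_lines can be used without striprtf installed.
    from striprtf.striprtf import rtf_to_text

    with open(path) as infile:
        return text_to_lines(rtf_to_text(infile.read()))


# Plain text standing in for the output of rtf_to_text.
print(text_to_lines("first line\n\n  second line  \nthird line\n"))
```

Calling rtf_file_to_lines("yourfile.rtf") would then give the list directly, provided striprtf is installed.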