Python: How to convert a MOBI file to text (or to an EPUB file)

Question

I'm having trouble converting MOBI files to text in Python.

I found this library, https://github.com/iscc/mobi, which can convert MOBI to EPUB. I also found that the ebooklib library does a good job of converting EPUB files to text.

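The problem is that only ebooklib seems to work correctly. If I give it a native EPUB file, everything works fine. But if I pass it the file path returned by the mobi library, I get a bunch of errors that don't make much sense.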

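I don't know exactly what is causing this. Maybe my MOBI files are encrypted somehow? (They are original books I bought from Humble Bundle a few months ago.) But the mobi library doesn't throw any errors.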

Or can't I pass the file path generated by the mobi library directly? Maybe I need to save this file somehow, or move it to another folder, before ebooklib can "read" it?
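
One way to test the "can't I pass the path directly?" theory is to check what `mobi.extract` actually returned. As far as I can tell from the mobi library's source, `filepath` is whatever it managed to unpack: an EPUB for newer (KF8) books, but possibly an HTML or PDF file for other inputs. The helper below is a hypothetical, stdlib-only check (the name `looks_like_epub` is illustrative and not part of mobi or ebooklib). It relies only on the EPUB container rule that the file is a ZIP archive whose `mimetype` entry contains `application/epub+zip`.

```python
import zipfile
from pathlib import Path


def looks_like_epub(path):
    """Return True if `path` is a ZIP archive with an EPUB `mimetype` entry.

    Hypothetical helper: it only checks the container signature, it does not
    validate the rest of the book.
    """
    path = Path(path)
    if not path.is_file() or not zipfile.is_zipfile(path):
        return False
    with zipfile.ZipFile(path) as zf:
        try:
            mimetype = zf.read("mimetype")
        except KeyError:
            # A ZIP without a `mimetype` entry is not an EPUB container
            return False
    return mimetype.strip() == b"application/epub+zip"


# Usage, assuming `filepath` comes from mobi.extract as in the question:
# if looks_like_epub(filepath):
#     book = epub.read_epub(filepath)
# else:
#     print("mobi.extract returned a non-EPUB file:", filepath)
```

If this returns False for the extracted path, ebooklib is being handed something other than an EPUB, and the file needs a different reader.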

My code looks like this:

import mobi

import ebooklib
from ebooklib import epub

tempdir, filepath = mobi.extract("book.mobi")

# This throws error:
book = epub.read_epub(filepath)

# Native, normal epub file is working ok:
book = epub.read_epub("book.epub")

I don't think the error says much:

Traceback (most recent call last):
  File "/ebooklib/utils.py", line 35, in parse_string
    tree = etree.parse(io.BytesIO(s.encode('utf-8')))
AttributeError: 'bytes' object has no attribute 'encode'
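
For comparison, the EPUB-to-text step can also be sketched with only the standard library, which avoids ebooklib's parser entirely. This swaps in a different technique from the question's ebooklib approach: it reads `META-INF/container.xml` to find the OPF package file, walks the OPF `<spine>` in reading order, and strips tags from each XHTML document with a small `html.parser.HTMLParser` subclass. The names `TextExtractor` and `epub_to_text` are hypothetical, and the tag stripping is deliberately crude (it skips `<head>`, `<script>` and `<style>`, ignores URL-encoded hrefs, and does no layout).

```python
import posixpath
import zipfile
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

CONTAINER_NS = {"c": "urn:oasis:names:tc:opendocument:xmlns:container"}
OPF_NS = {"opf": "http://www.idpf.org/2007/opf"}
SKIP_TAGS = ("head", "script", "style")


class TextExtractor(HTMLParser):
    """Collect non-empty text nodes, ignoring <head>, <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())


def epub_to_text(path):
    """Return the text of an EPUB's spine documents, in reading order."""
    with zipfile.ZipFile(path) as zf:
        # META-INF/container.xml points to the OPF package document
        container = ET.fromstring(zf.read("META-INF/container.xml"))
        opf_path = container.find(".//c:rootfile", CONTAINER_NS).get("full-path")
        opf_dir = posixpath.dirname(opf_path)
        opf = ET.fromstring(zf.read(opf_path))

        # The manifest maps item ids to hrefs relative to the OPF file
        hrefs = {item.get("id"): item.get("href")
                 for item in opf.find("opf:manifest", OPF_NS)}

        chunks = []
        # The spine lists the content documents in reading order
        for itemref in opf.find("opf:spine", OPF_NS):
            href = hrefs[itemref.get("idref")]
            name = posixpath.normpath(posixpath.join(opf_dir, href))
            parser = TextExtractor()
            parser.feed(zf.read(name).decode("utf-8", errors="replace"))
            parser.close()
            chunks.append("\n".join(parser.parts))
    return "\n\n".join(chunks)


# Usage, assuming `filepath` from mobi.extract really is an EPUB:
# print(epub_to_text(filepath))
```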

Answer 1

You can save it as an HTML file:

pip install mobi

Then:

import mobi
filepath = "./example.mobi"
folder = "./"

# In Jupyter/IPython, `!` runs a shell command and {...} substitutes Python variables
!mobiunpack -r {filepath} {folder}

A list of all available options is here.
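
The `!` prefix only works inside IPython/Jupyter. In a plain Python script, the same command can be run with `subprocess`. This sketch assumes the `mobiunpack` command installed by `pip install mobi` is on your `PATH`, and it uses only the `-r` flag from the answer; `build_mobiunpack_cmd` and `unpack_mobi` are hypothetical helper names.

```python
import shutil
import subprocess


def build_mobiunpack_cmd(filepath, folder, raw=True):
    """Build the argument list for the `mobiunpack` command-line tool."""
    cmd = ["mobiunpack"]
    if raw:
        cmd.append("-r")  # the flag used in the answer's notebook command
    cmd += [str(filepath), str(folder)]
    return cmd


def unpack_mobi(filepath, folder):
    """Run mobiunpack, raising if it is missing or exits with an error."""
    if shutil.which("mobiunpack") is None:
        raise FileNotFoundError("mobiunpack not found; try `pip install mobi`")
    subprocess.run(build_mobiunpack_cmd(filepath, folder), check=True)


# Usage, with the paths from the answer:
# unpack_mobi("./example.mobi", "./")
```

Passing a list (rather than one shell string) avoids quoting problems when the book's path contains spaces.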

Or here is another approach:

pip install mobi
pip install html2text

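import mobi
import html2text

filename = "test.mobi"
tempdir, filepath = mobi.extract(filename)  # filepath points to the extracted book file

# Read the extracted HTML (assumed to be UTF-8) and convert it to plain text
with open(filepath, "r", encoding="utf-8") as file:
    content = file.read()
print(html2text.html2text(content))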
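
The snippet above reads `filepath` as a single text file, which suits the case where `mobi.extract` produced HTML. If it produced an EPUB instead (a ZIP archive), the file has to be unpacked first. A minimal sketch of a hypothetical `read_extracted_html` helper that handles both cases with the standard library; for EPUBs it concatenates the XHTML documents in sorted name order, which is only an approximation of reading order (the OPF spine defines the real order).

```python
import zipfile
from pathlib import Path

HTML_SUFFIXES = (".html", ".htm", ".xhtml")


def read_extracted_html(filepath):
    """Return the HTML markup of a file produced by mobi.extract.

    Hypothetical helper: handles a single HTML file, or an EPUB whose
    XHTML documents are joined in sorted name order.
    """
    path = Path(filepath)
    suffix = path.suffix.lower()
    if suffix in HTML_SUFFIXES:
        return path.read_text(encoding="utf-8", errors="replace")
    if suffix == ".epub":
        with zipfile.ZipFile(path) as zf:
            names = sorted(n for n in zf.namelist()
                           if n.lower().endswith(HTML_SUFFIXES))
            return "\n".join(zf.read(n).decode("utf-8", errors="replace")
                             for n in names)
    raise ValueError(f"unsupported extracted file type: {path.name}")


# Usage with the html2text approach above:
# tempdir, filepath = mobi.extract("test.mobi")
# print(html2text.html2text(read_extracted_html(filepath)))
```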
