How to extract text from an EPUB in Python (from a URL source)

I want to extract the content of an EPUB, but I don't know how to do it when the file comes from a URL. My code currently looks like this (using ebooklib):

import urllib.request
import ebooklib
from ebooklib import epub


myurl = "https://diegooli.s3.us-east-2.amazonaws.com/Cabana.epub"

with urllib.request.urlopen(myurl) as url:
    s = url.read()

book = epub.read_epub(s)

for image in book.get_items_of_type(ebooklib.ITEM_IMAGE):
    print(image)

The error, unsurprisingly, is:

    AttributeError: 'bytes' object has no attribute 'seek'

Can anyone shed some light on this?
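An EPUB is a ZIP archive, and ebooklib opens it with the standard-library `zipfile` module. `zipfile.ZipFile` treats any argument that is not a path string as a file object and calls `.seek()` on it, which plain `bytes` does not have. The sketch below reproduces the error with `zipfile` alone and shows that wrapping the bytes in `io.BytesIO` yields a seekable file object. Whether `epub.read_epub` itself accepts a `BytesIO` depends on the ebooklib version, so treat that part as an assumption to test.

```python
import io
import zipfile

# Build a tiny ZIP archive in memory as a stand-in for the downloaded EPUB bytes.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("mimetype", "application/epub+zip")
data = buf.getvalue()  # plain bytes, like url.read() returns

# Passing raw bytes fails the same way as in the question.
error = None
try:
    zipfile.ZipFile(data)
except AttributeError as exc:
    error = str(exc)
print(error)  # 'bytes' object has no attribute 'seek'

# A BytesIO wrapper is seekable, so zipfile can read the archive from memory.
with zipfile.ZipFile(io.BytesIO(data)) as zf:
    names = zf.namelist()
print(names)  # ['mimetype']
```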

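`epub.read_epub` expects a file name, not the raw bytes returned by `url.read()`. Save the EPUB file to disk first, then open that file with ebooklib: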

  • Download the e-book with urllib
  • Open the e-book with ebooklib and get the images

Here is the code:

import urllib.request
import ebooklib
from ebooklib import epub

myurl = "https://diegooli.s3.us-east-2.amazonaws.com/Cabana.epub"

with urllib.request.urlopen(myurl) as url:
    s = url.read()
    
with open(r"c:\tmp\test.epub", "wb") as f:
    f.write(s)

book = epub.read_epub(r"c:\tmp\test.epub")

for image in book.get_items_of_type(ebooklib.ITEM_IMAGE):
    print(image)
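The title asks for text, while the code above only prints image items. Below is a minimal sketch, assuming the chapters are exposed as `ebooklib.ITEM_DOCUMENT` items. It writes the download to a `tempfile` instead of the hard-coded `c:\tmp` path and strips the XHTML markup with the standard-library `html.parser`. The names `html_to_text`, `epub_text_from_url`, and `_TextCollector` are hypothetical. BeautifulSoup's `get_text()` would be a common alternative for stripping markup.

```python
import os
import tempfile
import urllib.request
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collects text nodes, skipping anything inside <head>, <script> or <style>."""

    SKIP = ("head", "script", "style")

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0  # nesting depth of skipped elements

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.parts.append(data.strip())


def html_to_text(html):
    """Hypothetical helper: return the visible text of one XHTML chapter."""
    parser = _TextCollector()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.parts)


def epub_text_from_url(url):
    """Hypothetical helper: download an EPUB and return the text of each chapter."""
    # Imported here so html_to_text() can be used without ebooklib installed.
    import ebooklib
    from ebooklib import epub

    with urllib.request.urlopen(url) as response:
        data = response.read()

    # read_epub() wants a file name, so write the bytes to a temporary file.
    fd, path = tempfile.mkstemp(suffix=".epub")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        book = epub.read_epub(path)
    finally:
        os.remove(path)

    return [
        html_to_text(item.get_content().decode("utf-8", errors="replace"))
        for item in book.get_items_of_type(ebooklib.ITEM_DOCUMENT)
    ]
```

With the question's URL, `for text in epub_text_from_url(myurl): print(text)` would print each chapter in turn. Deleting the temporary file right after `read_epub` assumes ebooklib has already read every item into memory by then, which is worth confirming for your installed version.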