How to extract text from an EPUB in Python (from a URL source)
I want to extract the content of an EPUB, but I don't know how to do it when the file comes from a URL. My code currently looks like this (using ebooklib):
import urllib.request
import ebooklib
from ebooklib import epub
myurl = "https://diegooli.s3.us-east-2.amazonaws.com/Cabana.epub"
with urllib.request.urlopen(myurl) as url:
    s = url.read()
book = epub.read_epub(s)
for image in book.get_items_of_type(ebooklib.ITEM_IMAGE):
    print(image)
The error is, obviously:
AttributeError: 'bytes' object has no attribute 'seek'
Can anyone shed some light on this?
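For context: an EPUB is a ZIP archive, and this traceback most likely comes from Python's zipfile module, which ebooklib uses to open the book. zipfile.ZipFile accepts a path or a seekable file object; given a plain bytes object (like s above), it treats it as a file and calls .seek() on it. The minimal stdlib-only sketch below reproduces the error and shows that io.BytesIO supplies the missing file interface. The helper name open_epub_bytes is hypothetical, and whether a given ebooklib version's epub.read_epub accepts a BytesIO is not assumed here.

```python
import io
import zipfile


def open_epub_bytes(data: bytes) -> zipfile.ZipFile:
    """Open in-memory EPUB bytes as a ZIP archive (hypothetical helper).

    zipfile.ZipFile accepts a path or a seekable file object. Passing raw
    bytes makes it call data.seek(...), which raises AttributeError.
    Wrapping the bytes in io.BytesIO provides the file interface it expects.
    """
    return zipfile.ZipFile(io.BytesIO(data))


if __name__ == "__main__":
    # Build a tiny stand-in archive in memory (a ZIP, not a full EPUB).
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("mimetype", "application/epub+zip")
    raw = buf.getvalue()

    try:
        zipfile.ZipFile(raw)  # raw bytes: reproduces the error from the question
    except AttributeError as exc:
        print("bytes:", exc)

    with open_epub_bytes(raw) as zf:  # BytesIO wrapper: opens fine
        print("BytesIO:", zf.namelist())
```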
Save the EPUB file first, then open the saved file with ebooklib:
- Download the ebook with urllib
- Open the ebook with ebooklib and get the images
Here is the code:
import urllib.request
import ebooklib
from ebooklib import epub
myurl = "https://diegooli.s3.us-east-2.amazonaws.com/Cabana.epub"
# Download the EPUB into memory as bytes
with urllib.request.urlopen(myurl) as url:
    s = url.read()
# Save it to disk (adjust the path for your system)
with open(r"c:\tmp\test.epub", "wb") as f:
    f.write(s)
# Pass the saved file's path to read_epub instead of the raw bytes
book = epub.read_epub(r"c:\tmp\test.epub")
for image in book.get_items_of_type(ebooklib.ITEM_IMAGE):
    print(image)
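The question title asks for text, while the code above prints images. The sketch below applies the same save-then-open approach to text, assuming that ebooklib's ITEM_DOCUMENT items are the book's XHTML content files and that item.get_content() returns their raw bytes. It swaps the hard-coded c:\tmp path for a temporary directory. The names html_to_text and epub_text_from_url are hypothetical, and the tag stripping is a plain html.parser pass, not something ebooklib provides. Documents come back in manifest order, which may differ from the reading order defined by the spine.

```python
import os
import tempfile
import urllib.request
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collects character data from (X)HTML, skipping <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep non-blank text runs that are not inside <script>/<style>.
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())


def html_to_text(content: bytes) -> str:
    """Turn one XHTML document (as bytes) into plain text, one text run per line."""
    parser = _TextCollector()
    parser.feed(content.decode("utf-8", errors="replace"))
    parser.close()
    return "\n".join(parser.parts)


def epub_text_from_url(url: str) -> list:
    """Download an EPUB, open it with ebooklib, and return the text of each document."""
    # Imported here so html_to_text() also works where ebooklib is not installed.
    import ebooklib
    from ebooklib import epub

    with urllib.request.urlopen(url) as response:
        data = response.read()

    # A throwaway directory instead of a hard-coded c:\tmp path.
    with tempfile.TemporaryDirectory() as tmpdir:
        path = os.path.join(tmpdir, "book.epub")
        with open(path, "wb") as f:
            f.write(data)
        book = epub.read_epub(path)
        # ITEM_DOCUMENT items are the XHTML content files, in manifest order.
        return [html_to_text(item.get_content())
                for item in book.get_items_of_type(ebooklib.ITEM_DOCUMENT)]
```

Usage, with the URL from the question: `for chapter in epub_text_from_url("https://diegooli.s3.us-east-2.amazonaws.com/Cabana.epub"): print(chapter[:200])`. If a real HTML-to-text converter such as BeautifulSoup is already in your project, it can replace the html.parser helper.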