How can I add all Wikipedia images to my docx file?

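I'm using the Wikipedia API and I want to put all of the photos from a page into a docx document. Right now I can only put one picture in the document, which isn't good enough. For some Wikipedia pages I get no photo at all, even though I can see photos on the site when I look the page up online. Here is my code: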

import wikipedia
import re
from docx import Document
from docx.enum.text import WD_ALIGN_PARAGRAPH
from docx.shared import Pt
from docx.shared import Mm
import requests
import io
from docx.shared import Inches

name = input("Introdu numele tau: ")
wikipedia.set_lang("ro")
hs = input("La ce liceu esti?\n")
cls = input("In ce clasa esti?\n")
date = input("Pe ce data trebuie facut proiectul?\n")
title = input("Despre ce vrei sa fie proiectul tau?\n")
while True:
    try:
        wiki = wikipedia.page(title)
        break
    except:
        print("Nume proiect invalid")
        title = input("Introdu alt nume de proiect: \n")
text = wiki.content
text = re.sub(r'==', '', text)
text = re.sub(r'=', '', text)
text = re.sub(r'\n', '\n    ', text)
split = text.split('Vezi și', 1)
text = split[0]
print(text)

document = Document()

section = document.sections[0]
section.page_height = Mm(297)
section.page_width = Mm(210)
section.left_margin = Mm(25.4)
section.right_margin = Mm(25.4)
section.top_margin = Mm(25.4)
section.bottom_margin = Mm(25.4)
section.header_distance = Mm(12.7)
section.footer_distance = Mm(12.7)

style = document.styles['Normal']
font = style.font
font.name = 'Times New Roman'
font.size = Pt(12)

url = wiki.images[1]
response = requests.get(url, stream=True)
image = io.BytesIO(response.content)
try:
    document.add_picture(image, width=Inches(1.5))
except:
    pass


paragraph = document.add_paragraph(date)
paragraph.alignment = WD_ALIGN_PARAGRAPH.RIGHT
paragraph = document.add_paragraph(name)
paragraph.alignment = WD_ALIGN_PARAGRAPH.LEFT
paragraph = document.add_paragraph('Clasa '+cls)
paragraph.alignment = WD_ALIGN_PARAGRAPH.LEFT
paragraph = document.add_paragraph(hs)
paragraph.alignment = WD_ALIGN_PARAGRAPH.LEFT
paragraph = document.add_heading(title, 0)
paragraph.alignment = WD_ALIGN_PARAGRAPH.CENTER
paragraph = document.add_paragraph('    ' + text)
paragraph.style = document.styles['Normal']
paragraph.alignment = WD_ALIGN_PARAGRAPH.LEFT


document.save(title + ".docx")
input()

I think the problem is here:

url = wiki.images[1]
response = requests.get(url, stream=True)
image = io.BytesIO(response.content)
try:
    document.add_picture(image, width=Inches(1.5))
except:
    pass

because only one picture shows up in the docx document.

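I suggest you explore loops and functions in Python. A loop lets you execute some code zero or more times, while a function lets you group a chunk of code together and access it by name. In more advanced terms, this is called abstraction.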

A loop for this Wikipedia purpose would look something like:

for image in wiki.images:
    document.add_picture(image, ...)

Then if wiki.images is empty, no pictures are added. If it has 5 images, all 5 are added.
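To make the empty-versus-five case concrete, here is a small stand-alone sketch. The two lists and the count_iterations helper are made up for illustration; they only stand in for wiki.images and for whatever the loop body does.

```python
def count_iterations(items):
    """Count how many times a for loop body runs over items."""
    runs = 0
    for _ in items:
        runs += 1
    return runs


# Hypothetical stand-ins for wiki.images
no_images = []
five_images = ["a.jpg", "b.jpg", "c.jpg", "d.jpg", "e.jpg"]

print(count_iterations(no_images))    # 0
print(count_iterations(five_images))  # 5
```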

A function might look like this:

def add_wiki_image(document, image_url):
    # Download the image and wrap its bytes in a file-like object for python-docx
    response = requests.get(image_url, stream=True)
    image = io.BytesIO(response.content)
    document.add_picture(image, width=Inches(1.5))

It can be called like this:

for image_url in wiki.images:
    add_wiki_image(document, image_url)

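Having add_wiki_image() as a function lets that code be referenced ("called") concisely wherever it is needed, and the details of how an image is added are neatly encapsulated in the function definition.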
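As a possible next step, and only as a sketch under stated assumptions: wiki.images may include SVG files or other formats that python-docx cannot embed, and the bare try/except in the question hides such failures. One option is to filter URLs by file extension and report anything that still fails. The names SUPPORTED_EXTENSIONS, is_supported_image and add_wiki_images are hypothetical, and the fetch parameter is an assumption that keeps the download step pluggable.

```python
import io
from urllib.parse import urlparse

# Assumption: raster formats python-docx can usually embed; SVG is not one of them.
SUPPORTED_EXTENSIONS = (".jpg", ".jpeg", ".png", ".gif", ".bmp", ".tif", ".tiff")


def is_supported_image(url):
    """Return True if the URL path ends in one of the supported extensions."""
    path = urlparse(url).path.lower()
    return path.endswith(SUPPORTED_EXTENSIONS)


def add_wiki_images(document, image_urls, fetch, **picture_kwargs):
    """Add every supported image to the document and return how many were added.

    fetch is any callable that takes a URL and returns the image bytes.
    Extra keyword arguments are passed through to document.add_picture().
    """
    added = 0
    for image_url in image_urls:
        if not is_supported_image(image_url):
            continue  # skip SVG and other formats not expected to work
        try:
            image = io.BytesIO(fetch(image_url))
            document.add_picture(image, **picture_kwargs)
        except Exception as error:
            # Report the failure instead of silently swallowing it
            print(f"Skipping {image_url}: {error}")
            continue
        added += 1
    return added
```

With the question's code, this could be called as add_wiki_images(document, wiki.images, lambda url: requests.get(url).content, width=Inches(1.5)). If the return value is 0, the page offered no image in a supported format, which may explain the pages that seem to have no photos.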