How to add pictures from a URL to a docx file in Python?
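I am having trouble with the python-docx library. I scraped images from a website and want to add them to a docx, but I cannot add the images directly. I keep getting this error: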
File "C:\Python27\lib\site-packages\docx\image\image.py", line 46, in from_file
  with open(path, 'rb') as f:
IOError: [Errno 22] invalid mode ('rb') or filename: 'http://upsats.com/Content/Product/img/Product/Thumb/PCB2x8-.jpg'
This is my code:
import urllib
import requests
from bs4 import BeautifulSoup
from docx import Document
from docx.shared import Inches
import os

document = Document()
document.add_heading("Megatronics Items Full Search", 0)

FullPage = ['New-Arrivals-2017-6', 'Big-Sales-click-here', 'Arduino-Development-boards',
            'Robotics-and-Copters', 'Breakout-Boards', 'RC-Wireless-communication', 'GSM,-GPS,-RFID,-Wifi',
            'Advance-Development-boards-and-starter-Kits', 'Sensors-and-IMU', 'Solenoid-valves,-Relays,--Switches',
            'Motors,-drivers,-wheels', 'Microcontrollers-and-Educational-items', 'Arduino-Shields',
            'Connectivity-Interfaces', 'Power-supplies,-Batteries-and-Chargers', 'Programmers-and-debuggers',
            'LCD,-LED,-Cameras', 'Discrete-components-IC', 'Science-Education-and-DIY', 'Consumer-Electronics-and-tools',
            'Mechanical-parts', '3D-Printing-and-CNC-machines', 'ATS', 'UPS', 'Internal-Battries-UPS',
            'External-Battries-UPS']

urlp1 = "http://www.arduinopak.com/Prd.aspx?Cat_Name="
URL = urlp1 + FullPage[0]

for n in FullPage:
    URL = urlp1 + n
    page = urllib.urlopen(URL)
    bsObj = BeautifulSoup(page, "lxml")
    panel = bsObj.findAll("div", {"class": "panel"})
    for div in panel:
        titleList = div.find('div', attrs={'class': 'panel-heading'})
        imageList = div.find('div', attrs={'class': 'pro-image'})
        descList = div.find('div', attrs={'class': 'pro-desc'})
        r = requests.get("http://upsats.com/", stream=True)
        data = r.text
        for link in imageList.find_all('img'):
            image = link.get("src")
            image_name = os.path.split(image)[1]
            r2 = requests.get(image)
            with open(image_name, "wb") as f:
                f.write(r2.content)
        print(titleList.get_text(separator=u' '))
        print(imageList.get_text(separator=u''))
        print(descList.get_text(separator=u' '))
        document.add_heading("%s \n" % titleList.get_text(separator=u' '))
        document.add_picture(image, width=Inches(1.5))
        document.add_paragraph("%s \n" % descList.get_text(separator=u' '))

document.save('megapy.docx')
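This is not all of it, just the main part. Now I am having trouble with the downloaded pictures: I want to copy them into the docx, but I do not know how to add them. How do I convert them? I think I have to format them somehow, but how?
All I know is that the problem lies in this line:
document.add_picture(image, width=Inches(1.0))
How can I make this image from the URL show up in the docx? What am I missing?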
Update
I tested with 10 images and got a docx. When loading many images I hit an error in one place, which I worked around by adding a try/except (see below). The resulting megapy.docx was 165 MB and took about 10 minutes to create.
Change:
with open(image_name, "wb") as f:
    f.write(r2.content)
to:
image = io.BytesIO(r2.content)
and add:
try:
    document.add_picture(image, width=Inches(1.5))
except:
    pass
Use the io library to create a file-like object.
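The traceback in the question suggests why this is needed: add_picture received a string, treated it as a local file path and passed it to open(). A minimal standard-library sketch of the difference follows; the bytes are placeholder data standing in for a real download, not an actual image.

```python
import io

url = "http://upsats.com/Content/Product/img/Product/Thumb/PCB2x8-.jpg"

# A URL string is not a local path, so open() fails just as in the traceback.
try:
    with open(url, "rb") as f:
        f.read()
    opened = True
except (IOError, OSError):
    opened = False

# Bytes already in memory (placeholder data here, standing in for r2.content)
# can be wrapped in a file-like object that supports read() and seek().
fake_download = b"\xff\xd8\xff placeholder, not a real JPEG"
stream = io.BytesIO(fake_download)
first_bytes = stream.read(3)
stream.seek(0)  # rewind so a later consumer starts at the beginning
```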
An example that works on Python 2 and 3:
import requests
import io
from docx import Document
from docx.shared import Inches
url = 'https://upload.wikimedia.org/wikipedia/commons/thumb/f/f3/Usain_Bolt_Rio_100m_final_2016k.jpg/200px-Usain_Bolt_Rio_100m_final_2016k.jpg'
response = requests.get(url, stream=True)
image = io.BytesIO(response.content)
document = Document()
document.add_picture(image, width=Inches(1.25))
document.save('demo.docx')