Blank PDF attachments when sending multiple emails, each with its relevant PDF attached

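I'm new to Python, and my task is to send multiple emails, each with its relevant attachment. In detail: a folder contains several PDF files, and each file contains some text that includes an email ID. I need to read the email ID from each PDF file and send that same file as an attachment to the mail ID found in it. Below is the code for reference: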

# Get the count of files in the folder
import os
import re
global str
import smtplib
from email.mime.text import MIMEText
from email.mime.multipart import MIMEMultipart
from email.mime.base import MIMEBase

cpt = sum([len(files) for r, d, files in
           os.walk(r"D:\MyOfficeDocuments\ADCB\PythonScripts\PdfFiles")])

# Reading Mail from each pdf file and send the same file as attachment to
# these mails
import PyPDF2
from os import listdir
from os.path import isfile, join
from PyPDF2 import PdfFileWriter, PdfFileReader
mypath=r'D:\MyOfficeDocuments\ADCB\PythonScripts\PdfFiles'
onlyfiles = [f for f in listdir(mypath) if isfile(join(mypath, f))]
for file in onlyfiles:

    count = 1
    while count <cpt:
        os.chdir(r'D:\MyOfficeDocuments\ADCB\PythonScripts\PdfFiles')
        pdfFileObj = open(file,'rb')
        # PdfFileReader/getPage/extractText are the pre-3.0 PyPDF2 API
        pdfReader = PyPDF2.PdfFileReader(pdfFileObj)
        pageObj = pdfReader.getPage(0)
        count +=1
        text = pageObj.extractText()
        email_user='madhugut82@gmail.com'
        eline = re.findall(r'\S+@\S+.com', text)
        email_send=eline
        print(file)
        password='harshi54537'
        subject='Python !'
        msg=MIMEMultipart()
        msg['From']=email_user
        msg['To']=', '.join(email_send)
        #listalink = " ".join(listalink)
        msg['Subject']=subject
        #print (email_send)
        body='Hi there, sending this email from python using python scripting'
        msg.attach(MIMEText(body,'plain'))
        filename=r'D:\MyOfficeDocuments\ADCB\PythonScripts\Destination\Document.txt'  # not used below
        attachment=open(file,'rb')
        #print(attachment)
        part=MIMEBase('application','pdf')
        part.set_payload(attachment.read())
        part.add_header('Content-Disposition',"attachment; filename="+file)
        msg.attach(part)
        #email.encoders.encode_base64(part)
        print('x')

        text=msg.as_string()

        #text=msg.encode("utf8")
        #text=msg.as_string().encode('utf-8','ignore')
        #text=msg.as_string().encode('ascii','ignore')
        server=smtplib.SMTP('smtp.gmail.com',587)
        server.starttls()
        server.login(email_user,password)
        server.sendmail(email_user,email_send,text)
        #server.sendmail(email_user,email_send,msg.encode("utf8"))
        server.quit()

With the code above, I get the following error:

msg = _fix_eols(msg).encode('ascii')

UnicodeEncodeError: 'ascii' codec can't encode characters in position 559-562: ordinal not in range(128)

But if I change the code to

text=msg.as_string().encode("UTF")

I don't get any error, but the attachment shows up blank.

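Please tell me where exactly the problem is and why I'm getting a blank PDF attachment.

If you have any code suggestions, please keep them specific to PDF files.

Thanks in advance, Madhu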

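Your problem is that you use a plain MIMEBase for a (binary) PDF file. Because MIMEBase is the parent class of all the more specific message types, it does not encode its payload, so your message contains raw 8-bit bytes. That is what sendmail trips over when it encodes the message string as ASCII. Encoding the string to UTF-8 yourself only hides the error: by then the PDF's non-ASCII bytes are no longer intact in the serialized message, so the attachment that arrives is corrupted and shows up blank.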
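A minimal stdlib-only check of this explanation (Python 3), under some assumptions: FAKE_PDF is a made-up byte string standing in for a real PDF, and build, is_ascii and received_bytes are hypothetical helper names. It serializes the same message with and without encode_base64, then re-parses the text roughly the way a receiving mail client would:

```python
import email
from email.encoders import encode_base64
from email.mime.base import MIMEBase
from email.mime.multipart import MIMEMultipart

# Made-up stand-in for a PDF file: PDF writers typically put a few bytes
# >= 0x80 in a comment on the second line, like the four here.
FAKE_PDF = b"%PDF-1.4\n%\xe2\xe3\xcf\xd3\n1 0 obj\n<< >>\nendobj\n%%EOF\n"


def build(encode: bool) -> MIMEMultipart:
    """Attach FAKE_PDF with a plain MIMEBase, as in the question."""
    msg = MIMEMultipart()
    part = MIMEBase("application", "pdf")
    part.set_payload(FAKE_PDF)  # raw bytes, no transfer encoding yet
    if encode:
        encode_base64(part)  # also sets Content-Transfer-Encoding: base64
    part.add_header("Content-Disposition", "attachment", filename="doc.pdf")
    msg.attach(part)
    return msg


def is_ascii(text: str) -> bool:
    """smtplib.sendmail encodes a str message as ASCII before sending it."""
    try:
        text.encode("ascii")
        return True
    except UnicodeEncodeError:
        return False


def received_bytes(wire_text: str) -> bytes:
    """Re-parse the serialized message and decode the attachment."""
    parsed = email.message_from_string(wire_text)
    return parsed.get_payload()[0].get_payload(decode=True)


raw_text = build(encode=False).as_string()
b64_text = build(encode=True).as_string()
print(is_ascii(raw_text), received_bytes(raw_text) == FAKE_PDF)  # False False
print(is_ascii(b64_text), received_bytes(b64_text) == FAKE_PDF)  # True True
```

In the un-encoded case the serialized text is not ASCII, which is the error from the question; encoding it to UTF-8 succeeds, but the attachment bytes recovered on the other side no longer match the file.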

Here are two possible fixes:

  1. Simply base64-encode the PDF file contents:

    ...
    from email.encoders import encode_base64
    ...
        part=MIMEBase('application','pdf')
        part.set_payload(attachment.read())
        part.add_header('Content-Disposition',"attachment; filename="+file)
        encode_base64(part)
        msg.attach(part)
    ...
    
  2. Use the more specialized MIMEApplication, which base64-encodes its data by default:

    ...
    from email.mime.application import MIMEApplication
    ...
        part=MIMEApplication(attachment.read(),'pdf')
        part.add_header('Content-Disposition',"attachment; filename="+file)
        msg.attach(part)
    ...
    

I'd recommend the second way, because the MIMEBase documentation says:

Ordinarily you won’t create instances specifically of MIMEBase, although you could. MIMEBase is provided primarily as a convenient base class for more specific MIME-aware subclasses.
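Putting the recommended MIMEApplication approach together with the rest of the task, here is a rough sketch rather than a drop-in replacement. Assumptions: the helper names find_addresses, build_message and send_all and the simplified address regex are hypothetical; PDF text extraction is passed in as the extract_text callable so any library can be plugged in (for example a small wrapper around the question's PyPDF2 getPage(0).extractText() calls); only *.pdf files are picked up; and the Gmail host, port, subject and body are copied from the question. It also logs in once for all files instead of once per message.

```python
from __future__ import annotations

import re
import smtplib
from email.mime.application import MIMEApplication
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText
from pathlib import Path
from typing import Callable

# Assumption: a simplified address pattern. Unlike '\S+@\S+.com' it escapes
# the dot and also matches non-.com domains; it is not a full RFC 5322 parser.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def find_addresses(text: str) -> list[str]:
    """Return the distinct addresses found in text, in order of appearance."""
    return list(dict.fromkeys(EMAIL_RE.findall(text)))


def build_message(pdf_bytes: bytes, filename: str, sender: str,
                  recipients: list[str], subject: str, body: str) -> MIMEMultipart:
    msg = MIMEMultipart()
    msg["From"] = sender
    msg["To"] = ", ".join(recipients)
    msg["Subject"] = subject
    msg.attach(MIMEText(body, "plain"))
    # MIMEApplication base64-encodes the PDF bytes by default.
    part = MIMEApplication(pdf_bytes, "pdf")
    # Passing filename as a keyword lets the email package quote it,
    # so names containing spaces are kept intact.
    part.add_header("Content-Disposition", "attachment", filename=filename)
    msg.attach(part)
    return msg


def send_all(folder: str, sender: str, password: str,
             extract_text: Callable[[Path], str],
             subject: str = "Python !",
             body: str = "Hi there, sending this email from python using python scripting",
             ) -> None:
    """Send every PDF in folder to the addresses found inside it."""
    with smtplib.SMTP("smtp.gmail.com", 587) as server:
        server.starttls()
        server.login(sender, password)
        for pdf in sorted(Path(folder).glob("*.pdf")):
            recipients = find_addresses(extract_text(pdf))
            if not recipients:
                print(f"No address found in {pdf.name}, skipping")
                continue
            msg = build_message(pdf.read_bytes(), pdf.name, sender,
                                recipients, subject, body)
            # send_message serializes to bytes and takes recipients from To.
            server.send_message(msg)
```

Using server.send_message(msg) lets smtplib serialize the message to bytes itself and take the envelope recipients from the To header, so the as_string()/encode step from the question is no longer needed.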