Downloading Excel Reports From a Secure Mail Center
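I'm a new programmer who has been writing scripts to automate parts of my job.
Problem scope:
I receive a bi-monthly Excel report by email from an outside vendor. The vendor encrypts its mail with ZixMail, which my company does not use. As a result, I have to log in to the secure mail center website with my username and password to read these emails. I am trying to connect to that server and download the attachments.
What I have tried:
Connecting to the "server" over IMAP (I am not sure the site is actually a mail server).
- Struck out many times, because I can never get a connection (if you have suggestions, please share).
Using a session to access the site over HTTP.
- I can connect to the site, but when I .get the file and .write it, my Excel file comes back blank and corrupted.
- On the mail center website, clicking the link/URL downloads the file automatically. I am not sure why this has to be so challenging.
The source of the page you download the file from looks like this:
a rel="external" href="/s/attachment?name=Random Letters and Numbers=emdeon" title="File Title.xlsx"
The href looks nothing like a normal URL, and unlike most examples I have seen it does not end in .xlsx or any other file extension.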
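An href like this is a site-relative URL: it is resolved against the address of the page it appears on, and the file name comes from the title attribute (or from the response headers) rather than from the path. The sketch below shows one way to pull such links out of a page with the standard library. The host mail.example.com, the sample anchor, and the helper names are made up for illustration; the real page may be structured differently.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class AttachmentLinkParser(HTMLParser):
    """Collects (absolute_url, title) for anchors whose title ends in .xlsx."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        href = attributes.get("href")
        title = attributes.get("title") or ""
        if href and title.lower().endswith(".xlsx"):
            # urljoin turns "/s/attachment?..." into a full URL on the same host
            self.links.append((urljoin(self.base_url, href), title))


def find_attachment_links(page_html, base_url):
    """Return every .xlsx attachment link found in page_html."""
    parser = AttachmentLinkParser(base_url)
    parser.feed(page_html)
    return parser.links


if __name__ == "__main__":
    # Hypothetical anchor modelled on the one quoted above
    sample = '<a rel="external" href="/s/attachment?name=abc123" title="File Title.xlsx">File</a>'
    print(find_attachment_links(sample, "https://mail.example.com/s/inbox"))
```

The resolved URL is what would be passed to a download call, and the title gives a sensible name for the saved file.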
I guess I am really just looking for any ideas, thoughts, or help toward a solution.
Here is my code for the HTTP connection:
    import os

    import requests

    # Fill in your details here to be posted to the login form.
    payload = {
        'em': 'Username',
        'passphrase': 'Password',
        'validationKey': 'Key'
    }

    # Reads the headers for the URL and reports whether it looks like a downloadable file
    def is_downloadable(URL_D):
        h = requests.head(URL_D, allow_redirects=True)
        header = h.headers
        content_type = header.get('content-type', '')
        if 'text' in content_type.lower():
            return False
        if 'html' in content_type.lower():
            return False
        return True

    # Streams the response body for the URL into FileName
    def download_file(URL_D):
        with requests.get(URL_D, stream=True) as r:
            r.raise_for_status()
            with open(FileName, 'wb') as f:
                for chunk in r.iter_content(chunk_size=None):
                    if chunk:
                        f.write(chunk)
        return FileName

    def Main():
        with requests.Session() as s:
            # Log in, then try the attachment URL
            p = s.post(URL, data=payload, allow_redirects=True)
            print(is_downloadable(URL_D))
            download_file(URL_D)

    if __name__ == '__main__':
        Path = "<path>"
        FileName = os.path.join(Path, "Testing File.xlsx")
        URL = 'login URL'
        URL_D = 'Attachment URL'
        Main()
is_downloadable(URL_D) returns False, and the Excel file is blank and corrupted.
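One possible cause, not confirmed against this site: the login is posted through the session `s`, but the HEAD and GET requests go through the module-level `requests.head` and `requests.get`, which do not carry the session's cookies. The server may then answer with a login page, which would explain both the text/html content type and the unreadable file. The sketch below routes the download through the logged-in session and checks that the body looks like a workbook before saving it. The helper names are hypothetical, and `session` is assumed to be the same `requests.Session()` object that posted the login form.

```python
XLSX_MAGIC = b"PK"  # .xlsx files are ZIP archives, and ZIP data starts with these bytes


def looks_like_web_page(content_type, first_bytes):
    """Heuristic: True when the response is HTML/text rather than a workbook."""
    ctype = (content_type or "").lower()
    if "html" in ctype or "text" in ctype:
        return True
    return not first_bytes.startswith(XLSX_MAGIC)


def download_with_session(session, url, dest_path):
    """Fetch url through the logged-in session and save the body to dest_path.

    `session` is anything with a requests-style .get() method; in practice
    this would be the requests.Session() that was used for the login POST.
    """
    response = session.get(url, stream=True)
    response.raise_for_status()
    body = b"".join(response.iter_content(chunk_size=8192))
    if looks_like_web_page(response.headers.get("content-type"), body[:4]):
        raise ValueError("Received a web page instead of an .xlsx file; is the session logged in?")
    with open(dest_path, "wb") as f:
        f.write(body)
    return dest_path
```

Inside `Main()` this would be called as `download_with_session(s, URL_D, FileName)` right after `s.post(...)`. If it still raises, printing `p.status_code` and the start of `p.text` from the login response is a quick way to see whether the login itself succeeded.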
Here is my code for the IMAP attempt:
    import email
    import imaplib
    import os

    class FetchEmail():
        connection = None
        error = None

        def __init__(self, mail_server, username, password):
            self.connection = imaplib.IMAP4_SSL(mail_server, port=993)
            self.connection.login(username, password)
            self.connection.select('inbox', readonly=False)  # so we can mark mails as read

        def close_connection(self):
            """
            Close the connection to the IMAP server
            """
            self.connection.close()

        def save_attachment(self, msg, download_folder):
            att_path = "No attachment found."
            for part in msg.walk():
                if part.get_content_maintype() == 'multipart':
                    continue
                if part.get('Content-Disposition') is None:
                    continue
                filename = part.get_filename()
                if not filename:
                    continue
                att_path = os.path.join(download_folder, filename)
                if not os.path.isfile(att_path):
                    with open(att_path, 'wb') as fp:
                        fp.write(part.get_payload(decode=True))
            return att_path

        def fetch_messages(self):
            emails = []
            (result, messages) = self.connection.search(None, "(ON 20-Nov-2020)")
            if result == "OK":
                # search() returns bytes, so split on whitespace rather than on the str ' '
                for message in messages[0].split():
                    try:
                        ret, data = self.connection.fetch(message, '(RFC822)')
                    except imaplib.IMAP4.error:
                        print("No emails to read for date.")
                        self.close_connection()
                        exit()
                    msg = email.message_from_bytes(data[0][1])
                    if not isinstance(msg, str):
                        emails.append(msg)
                    response, data = self.connection.store(message, '+FLAGS', '\\Seen')
                return emails
            self.error = "Failed to retrieve emails."
            return emails

    def Main():
        p = FetchEmail(mail_server, username, password)
        # fetch_messages() returns a list, so save the attachments one message at a time
        for msg in p.fetch_messages():
            p.save_attachment(msg, download_folder)
        p.close_connection()

    if __name__ == "__main__":
        mail_server = "Server"
        username = "username"
        password = "password"
        download_folder = "<path>"
        Main()
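Error message: TimeoutError: [WinError 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond
Even if I wrote the IMAP script incorrectly, I also tried an IMAP connection from the cmd prompt and got the same result.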
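WinError 10060 is raised while the TCP connection is still being opened, before any IMAP commands are exchanged, so the script's IMAP logic is not yet involved. A quick way to separate "nothing is listening on port 993" from "my script is wrong" is a bare socket check like the sketch below. The function name is hypothetical, and the check says nothing about whether the mail center offers IMAP at all; a web-only portal may simply not expose it.

```python
import socket


def port_is_open(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections, and DNS lookup failures
        return False
```

For example, `port_is_open(mail_server, 993)` returning False would suggest that the host is not reachable over IMAPS from this network, whatever the script does afterwards.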
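To summarize, what I am looking for is suggestions and ideas for solving this problem. Thank you!
For anyone who stumbles upon this because of a similar issue (probably no one, since I have a really weird habit of making simple things complicated):
I was able to solve the problem by using the selenium webdriver to log in to the website and navigate with the "click" mechanism. This was the only way I was able to successfully download the reports.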
    import time
    import os
    import re
    import datetime
    from selenium.webdriver import Chrome
    from selenium.webdriver.chrome.options import Options

    # Written against the Selenium 3 API (find_element_by_* and executable_path);
    # Selenium 4 replaces these with find_element(By.ID, ...) and a Service object.

    today = datetime.date.today()
    first = today.replace(day=1)
    year = today.strftime('%Y')
    month = today.strftime('%B')
    # Abbreviated name of the previous month, e.g. 'Nov'
    lastMonth = (first - datetime.timedelta(days=1)).strftime('%b')

    def Main():
        chrome_options = Options()
        chrome_options.add_experimental_option("detach", True)
        s = Chrome(executable_path="path to chrome extension", options=chrome_options)
        s.get("Website login page")
        s.find_element_by_id("loginname").send_keys('username')
        s.find_element_by_id("password").send_keys('password')
        s.find_element_by_class_name("button").click()
        for i in range(50):
            # Go back to the inbox and inspect message number i
            s.get("landing page post login")
            n = str(i)
            subject = "mailsubject" + n
            sent = "mailsent" + n
            title = s.find_element_by_id(subject).text
            date = s.find_element_by_id(sent).text
            regex = "Bi Monthly"
            regex_pr = "PR"
            match = re.search(regex, title)
            match_pr = re.search(regex_pr, title)
            if match and not match_pr:
                match_m = re.search(r"(\D{3})", date)
                match_d = re.search(r"(\d{1,2})", date)
                day = int(match_d.group())
                m = match_m.group(1)
                # Stop once we reach reports from the first half of last month
                if (day <= 15) and (m == lastMonth):
                    print("All up to date files have been downloaded")
                    break
                else:
                    name = "messageItem" + n
                    s.find_element_by_id(name).click()
                    s.find_element_by_partial_link_text("xlsx").click()
            else:
                continue
        # Give the last download time to finish
        time.sleep(45)

    if __name__ == "__main__":
        Main()
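The stopping rule in the loop (quit once a report dated on or before the 15th of the previous month is reached) can be pulled out into a small function and tested without a browser. This is a sketch under assumptions: the sent-date text is assumed to look like "Nov 12, 2020", the function names are hypothetical, and the month pattern is tightened to three letters, whereas the script above uses `\D{3}`, which matches any three non-digit characters.

```python
import datetime
import re


def previous_month_abbr(today):
    """Abbreviated name of the month before `today`, e.g. 'Nov' for a date in December."""
    first_of_month = today.replace(day=1)
    return (first_of_month - datetime.timedelta(days=1)).strftime("%b")


def reached_cutoff(sent_text, today):
    """True when sent_text is dated on or before the 15th of the previous month."""
    month = re.search(r"[A-Za-z]{3}", sent_text)
    day = re.search(r"\d{1,2}", sent_text)
    if not month or not day:
        return False  # unrecognised date text: keep going rather than stop early
    return int(day.group()) <= 15 and month.group() == previous_month_abbr(today)
```

In the loop this would replace the inline regex checks with `if reached_cutoff(date, today): break`. Note that `%b` depends on the active locale, so the comparison assumes English month abbreviations on the page.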