Python IMAP scraper hangs indefinitely
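I am trying to scrape data from a specific folder in a Gmail account that I have access to.
I recently tried running this code on Windows 7 with Python 2.7 while logged into the Gmail account in question. For some reason, it seems to run for a very long time (I left it for as long as 40 minutes) without finishing or raising an error.
As it stands, the folder I am targeting in the Gmail account holds only about 50 simple text emails, with no attachments, images, or anything else that would suggest the process should take a long time. Has anyone run into a problem like this before when doing something similar with IMAP?
Code for completeness: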
#!/usr/bin/env python
#
# Very simple Python script to dump all emails in an IMAP folder to files.
# This code is released into the public domain.
#
# RKI Nov 2013
#
import sys
import imaplib
import getpass
IMAP_SERVER = 'imap.gmail.com'
EMAIL_ACCOUNT = "notatallawhistleblowerIswear@gmail.com"
EMAIL_FOLDER = "Top Secret/PRISM Documents"
OUTPUT_DIRECTORY = 'C:/src/tmp'
PASSWORD = getpass.getpass()

def process_mailbox(M):
    """
    Dump all emails in the folder to files in output directory.
    """
    rv, data = M.search(None, "ALL")
    if rv != 'OK':
        print "No messages found!"
        return

    for num in data[0].split():
        rv, data = M.fetch(num, '(RFC822)')
        if rv != 'OK':
            print "ERROR getting message", num
            return
        print "Writing message ", num
        f = open('%s/%s.eml' %(OUTPUT_DIRECTORY, num), 'wb')
        f.write(data[0][1])
        f.close()

def main():
    M = imaplib.IMAP4_SSL(IMAP_SERVER)
    M.login(EMAIL_ACCOUNT, PASSWORD)
    rv, data = M.select(EMAIL_FOLDER)
    if rv == 'OK':
        print "Processing mailbox: ", EMAIL_FOLDER
        process_mailbox(M)
        M.close()
    else:
        print "ERROR: Unable to open mailbox ", rv
    M.logout()

if __name__ == "__main__":
    main()
The code works fine for me. Below, I added some debug prints (using pprint) to your code to view the attributes of the IMAP4_SSL object M. My Gmail account uses two-factor authentication, so I needed to set up a Gmail app password.
from pprint import pprint
# ....
M = imaplib.IMAP4_SSL(IMAP_SERVER)
print('---- Attributes of the IMAP4_SSL connection before login ----')
pprint(vars(M))
M.login(EMAIL_ACCOUNT, PASSWORD)
print('\n \n')
print('---- Attributes of the IMAP4_SSL connection after login ----')
pprint(vars(M))
# open specific folder
rv, data = M.select(EMAIL_FOLDER)
print('\n \n')
print('---- Data returned from select of folder = {}'.format(data))
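- Check the output of the first pprint(vars(M)). It should include entries such as:
  'welcome': '* OK Gimap ready for requests from ...
  'port': 993,
- Check the output of the second pprint(vars(M)). The _cmd_log entry should show a successful login:
  6: ('< PJIL1 OK **@gmail.com authenticated (Success)
The data returned from M.select(EMAIL_FOLDER) should be the number of emails available for download.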
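To make that last check concrete, here is a minimal sketch of turning the data part of the select reply into an integer. The helper name message_count is hypothetical (it is not part of imaplib or of the original script), and the sketch assumes the reply has the usual imaplib shape: a one-element list holding a digit string (bytes on Python 3, str on Python 2).

```python
def message_count(select_data):
    """Return the message count from the data part of an IMAP4.select() reply.

    Assumes the usual imaplib shape: a one-element list holding a digit
    string, e.g. ['50'] on Python 2 or [b'50'] on Python 3.
    message_count is a hypothetical helper, not an imaplib function.
    """
    # An empty list or a None entry is treated as "no messages".
    if not select_data or select_data[0] is None:
        return 0
    return int(select_data[0])


# Hypothetical usage inside main(), after M.login(...):
#   rv, data = M.select(EMAIL_FOLDER)
#   if rv == 'OK':
#       print('Folder holds {} messages'.format(message_count(data)))
```

Only call it when rv is 'OK'. If the select fails, data carries the server's error text instead of a number, and int() would raise ValueError. A count near the expected 50 would suggest that login and folder selection are working and that the hang happens later, in the search or fetch loop.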