PySpark - Send Email with CSV attached, entire CSV showing up on one line
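I have a script that produces a DataFrame. I convert the DF to CSV and then send it as an email attachment. The problem is that the header and the data all end up on the first line, so the resulting CSV has 60k columns and 1 row. What is going wrong?
Here is my code: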
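import smtplib
from email.mime.application import MIMEApplication
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

df.toPandas().to_csv("/dbfs/<path>/df.csv", mode='w+', encoding='utf-8')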
server = smtplib.SMTP('smtp.gmail.com:587')
server.ehlo()
server.starttls()
server.login("<email>", "<password>")
sender = "<email>"
recipient = "<email>"
msg = MIMEMultipart()
msg['Subject'] = 'I need help'
msg['From'] = sender
msg['To'] = recipient
filedata = sc.textFile("/dbfs/<path>/df.csv", use_unicode=False)
msg.attach(MIMEText('This is your test message with attachment...'))
part = MIMEApplication("".join(filedata.collect()), Name="df.csv")
part['Content-Disposition'] = 'attachment; filename="%s"' % 'df.csv'
msg.attach(part)
server.sendmail(sender, [recipient], msg.as_string())
server.close()
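sc.textFile returns one element per line with the trailing newline removed, so joining the collected lines with an empty string glues every row onto a single line. Just replace
"".join(filedata.collect())
with
"\n".join(filedata.collect())
or
sc.wholeTextFiles("/dbfs/<path>/df.csv").values().first()
Or, even better, skip the write-then-read routine completely: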
MIMEApplication(df.toPandas().to_csv(), Name="df.csv")
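To make the difference concrete, here is a minimal sketch that runs without Spark or an SMTP server. It assumes Python 3. The small CSV built with the csv module is only a stand-in for df.toPandas().to_csv(), and build_message and the example.com addresses are hypothetical names used for illustration. It shows how joining newline-stripped lines with "" collapses the rows, and that an attachment built from the full CSV string keeps them.

```python
import csv
import io
from email.mime.application import MIMEApplication
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def build_message(csv_text, sender, recipient, filename="df.csv"):
    """Return a multipart message carrying csv_text as an attachment."""
    msg = MIMEMultipart()
    msg["Subject"] = "I need help"
    msg["From"] = sender
    msg["To"] = recipient
    msg.attach(MIMEText("This is your test message with attachment..."))
    # Encode explicitly so the attachment payload is bytes (Python 3).
    part = MIMEApplication(csv_text.encode("utf-8"), Name=filename)
    part["Content-Disposition"] = 'attachment; filename="%s"' % filename
    msg.attach(part)
    return msg


# Stand-in for df.toPandas().to_csv(): a small CSV built in memory.
buf = io.StringIO()
writer = csv.writer(buf, lineterminator="\n")
writer.writerows([["id", "name"], [1, "a"], [2, "b"]])
csv_text = buf.getvalue()

# sc.textFile yields one element per line with the newline stripped;
# splitlines() imitates that here.
lines = csv_text.splitlines()
broken = "".join(lines)    # rows glued together on a single line
fixed = "\n".join(lines)   # row boundaries restored

# Build the message straight from the CSV string, with no file round trip.
msg = build_message(csv_text, "sender@example.com", "recipient@example.com")
attachment = msg.get_payload()[1].get_payload(decode=True).decode("utf-8")
```

In the real script, csv_text would come from df.toPandas().to_csv(), and the returned message would be passed to server.sendmail(sender, [recipient], msg.as_string()) as in the question.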