PySpark

Question

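I have a script that generates a DataFrame. I convert the DF to CSV and then send it as an email attachment. The problem is that the header and the data all end up on the first line, so the resulting CSV has 60k columns and 1 row. What is going wrong?

Here is my code: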

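import smtplib
from email.mime.application import MIMEApplication
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

df.toPandas().to_csv("/dbfs/<path>/df.csv", mode='w+', encoding='utf-8')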
server = smtplib.SMTP('smtp.gmail.com:587')
server.ehlo()
server.starttls()
server.login("<email>", "<password>")

sender = "<email>"
recipient = "<email>"
msg = MIMEMultipart()
msg['Subject'] = 'I need help'
msg['From'] = sender
msg['To'] = recipient

filedata = sc.textFile("/dbfs/<path>/df.csv", use_unicode=False)
msg.attach(MIMEText('This is your test message with attachment...'))
part = MIMEApplication("".join(filedata.collect()), Name="df.csv")
part['Content-Disposition'] = 'attachment; filename="%s"' % 'df.csv'
msg.attach(part)
server.sendmail(sender, [recipient], msg.as_string())
server.close()

Answer 1

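sc.textFile returns the file as individual lines with the line breaks stripped, so joining them with an empty string puts the header and every row on a single line. Just replace

"".join(filedata.collect())

with

"\n".join(filedata.collect())

or

sc.wholeTextFiles("/dbfs/<path>/df.csv").values().first()

or, even better, skip the write-then-read round trip entirely: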

MIMEApplication(df.toPandas().to_csv(), Name="df.csv")
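
The last suggestion can be sketched without Spark or an SMTP server, assuming the CSV text is already available as a Python string (for example from df.toPandas().to_csv()). The helper name build_message, the sample rows, and the example.com addresses are illustrative assumptions, not part of the original post. The sketch also imitates, with splitlines(), why joining on an empty string collapses the rows.

```python
from email import message_from_string
from email.mime.application import MIMEApplication
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def build_message(csv_text, sender, recipient, filename="df.csv"):
    """Build a multipart email carrying csv_text as an attachment (hypothetical helper)."""
    msg = MIMEMultipart()
    msg["Subject"] = "I need help"
    msg["From"] = sender
    msg["To"] = recipient
    msg.attach(MIMEText("This is your test message with attachment..."))
    # Encode explicitly so non-ASCII characters survive the base64 step.
    part = MIMEApplication(csv_text.encode("utf-8"), Name=filename)
    part["Content-Disposition"] = 'attachment; filename="%s"' % filename
    msg.attach(part)
    return msg


# Stand-in for df.toPandas().to_csv(index=False); the sample rows are made up.
csv_text = "id,name\n1,alice\n2,bob\n"

# sc.textFile yields lines without their terminators, which is what
# splitlines() imitates here.
lines = csv_text.splitlines()
collapsed = "".join(lines)    # the bug: every row ends up on one line
restored = "\n".join(lines)   # the fix: line breaks are put back

msg = build_message(csv_text, "sender@example.com", "recipient@example.com")

# Round-trip through the wire format to check what the recipient would get.
parsed = message_from_string(msg.as_string())
attachment = parsed.get_payload()[1].get_payload(decode=True).decode("utf-8")
```

Passing the string straight to MIMEApplication, as in the answer, keeps the line breaks because the text never goes through a line-splitting reader. Sending would then be server.sendmail(sender, [recipient], msg.as_string()), as in the question.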

PySpark - Send Email with CSV attached, entire CSV showing up on one line

python

csv

export-to-csv

mime