How to Create Spark or Pandas Dataframe from str output in Apache Spark on Databricks

I have assigned a string to the variable 'myoutput' as follows:

myoutput = result.content

myoutput looks like this:

Out[10]: 'Company A Invoice\nInvoice For:\nAddress:\n567 Main St.\nRedmond, WA\n555-555-5555\nBilbo Baggins\n123 Hobbit Lane\nRedmond, WA\n555-555-5555\nSubtotal: 300.00\nTax: 30.00\nTip: 100.00\nTotal: 430.00\nSignature: ____Bilbo Baggins__________\nItem\nQuantity\nPrice\nA\n1\n10.99\nB\n2\n14.67\nC\n4\n15.66\nD\n1\n12.00\nE\n4\n10.00\nF\n6\n12.00\nG\n8\n22.00'

I would like to create a Spark DataFrame or a pandas DataFrame from 'myoutput'.

Any thoughts?

 import pandas as pd
 str_output = 'Company A Invoice\nInvoice For:\nAddress:\n567 Main St.\nRedmond, WA\n555-555-5555\nBilbo Baggins\n123 Hobbit Lane\nRedmond, WA\n555-555-5555\nSubtotal: 300.00\nTax: 30.00\nTip: 100.00\nTotal: 430.00\nSignature: ____Bilbo Baggins__________\nItem\nQuantity\nPrice\nA\n1\n10.99\nB\n2\n14.67\nC\n4\n15.66\nD\n1\n12.00\nE\n4\n10.00\nF\n6\n12.00\nG\n8\n22.00'
 # splitlines() breaks the string at every newline, so each line becomes one row
 df_data = pd.DataFrame({'ColumnA': str_output.splitlines()})
 df_data

Reference: How to split a Python string on new line characters
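The single-column frame above keeps every line of the invoice, including the item table flattened into one value per row. If the goal is a table of items, one possible follow-up is to regroup the lines after the Item / Quantity / Price header into rows of three. This is only a sketch: the helper name `parse_items` is hypothetical, it assumes those three header lines appear consecutively and that every line after them belongs to the table, and it uses a shortened sample string rather than the full output.

```python
HEADER = ["Item", "Quantity", "Price"]

def parse_items(text):
    """Regroup the lines that follow the Item/Quantity/Price header into rows.

    Hypothetical helper: assumes the header lines are consecutive and that
    everything after them is item data in groups of three lines.
    """
    lines = text.splitlines()
    # Locate the first position where the three header lines appear in order.
    start = next(
        i for i in range(len(lines) - len(HEADER) + 1)
        if lines[i:i + len(HEADER)] == HEADER
    )
    values = lines[start + len(HEADER):]
    rows = []
    # Take the values three at a time; an incomplete trailing group is skipped.
    for i in range(0, len(values) - len(HEADER) + 1, len(HEADER)):
        item, quantity, price = values[i:i + len(HEADER)]
        rows.append({"Item": item, "Quantity": int(quantity), "Price": float(price)})
    return rows

# Shortened sample in the same shape as the output in the question.
sample = "Total: 430.00\nItem\nQuantity\nPrice\nA\n1\n10.99\nB\n2\n14.67"
rows = parse_items(sample)
print(rows)

# With pandas available, pd.DataFrame(rows) gives a three-column frame.
# On Databricks, where a SparkSession named `spark` is predefined in notebooks,
# spark.createDataFrame(pd.DataFrame(rows)) converts it to a Spark DataFrame.
```

The same conversion applies to the single-column `df_data` from the answer: passing it to `spark.createDataFrame` should yield a Spark DataFrame with one string column.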