Convert String to PySpark DataFrame
I have a string inside a list, like this:
ListofString = ['Column1,Column2,Column3,\nCol1Value1,Col2Value1,Col3Value1,\nCol1Value2,Col2Value2,Col3Value2']
How can I convert this string into a PySpark DataFrame like the one shown below?
Here '\n' marks a new line.
Column1 Column2 Column3
-----------------------------------------
Col1Value1 Col2Value1 Col3Value1
Col1Value2 Col2Value2 Col3Value2
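You just need to convert the list of strings into the proper format, like this. Note that this works only as long as the values themselves contain no spaces, because the commas are replaced with spaces before splitting:
# convert the list of strings into the proper format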
>>> l = ' '.join(ListofString)
>>> l = l.replace(',',' ')
>>> l = [x.strip().split(' ') for x in l.split('\n')]
>>> print(l)
[['Column1', 'Column2', 'Column3'], ['Col1Value1', 'Col2Value1', 'Col3Value1'], ['Col1Value2', 'Col2Value2', 'Col3Value2']]
>>> df = spark.createDataFrame(l[1:],l[0])
>>> df.show()
+----------+----------+----------+
|   Column1|   Column2|   Column3|
+----------+----------+----------+
|Col1Value1|Col2Value1|Col3Value1|
|Col1Value2|Col2Value2|Col3Value2|
+----------+----------+----------+
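If the values might contain spaces, the replace-and-split approach above would break them apart. A minimal sketch of an alternative, assuming the strings are ordinary comma-separated text, is to let Python's standard `csv` module do the splitting. The helper names `parse_csv_strings` and `to_dataframe` are hypothetical, and dropping a trailing empty field assumes it comes from a trailing comma rather than from a genuinely empty last column.

```python
import csv
import io


def parse_csv_strings(list_of_strings):
    """Return (header, rows) parsed from a list of CSV-formatted strings.

    Hypothetical helper: assumes the first record is the header and that
    an empty final field is caused by a trailing comma.
    """
    text = "\n".join(list_of_strings)
    records = []
    for record in csv.reader(io.StringIO(text)):
        # A trailing comma produces one empty final field; drop it.
        if record and record[-1] == "":
            record = record[:-1]
        if record:  # skip blank lines
            records.append(record)
    return records[0], records[1:]


def to_dataframe(spark, list_of_strings):
    """Build a DataFrame using an existing SparkSession passed in as `spark`."""
    header, rows = parse_csv_strings(list_of_strings)
    return spark.createDataFrame(rows, header)
```

With an active session, `to_dataframe(spark, ListofString).show()` should print the same table as above. All columns come out as strings, so cast them afterwards if other types are needed.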