PySpark: how to transform a DataFrame to remove the bracket characters

Matricule (type array)
[TKI1]
[TKI4]

I would like to get this DataFrame:

Matricule (type string)
TKI1
TKI4

Since your Matricule column is an ArrayType to begin with, you can use getItem directly, as follows:

Data Preparation

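import pandas as pd
from pyspark.sql import SparkSession, functions as F

sql = SparkSession.builder.getOrCreate()  # session handle used below

# Each row holds a single-element array of strings
df = pd.DataFrame({
    'Matricule': [['TKI1'], ['TKI4']],
})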

sparkDF = sql.createDataFrame(df)

sparkDF.show()

+---------+
|Matricule|
+---------+
|   [TKI1]|
|   [TKI4]|
+---------+

sparkDF.printSchema()

root
 |-- Matricule: array (nullable = true)
 |    |-- element: string (containsNull = true)

getItem

# getItem(0) extracts the first element of the array, giving a string column
sparkDF = sparkDF.withColumn('Matricule_string', F.col('Matricule').getItem(0))

sparkDF.show()

+---------+----------------+
|Matricule|Matricule_string|
+---------+----------------+
|   [TKI1]|            TKI1|
|   [TKI4]|            TKI4|
+---------+----------------+