Derive StructType schema from a list of column names in PySpark
In PySpark, I don't want to hard-code the schema definition; I want to derive it from the variable below.
mySchema = [("id", "IntegerType()", True),
            ("name", "StringType()", True),
            ("InsertDate", "TimestampType()", True)]
result = mySchema.map(lambda l: StructField(l[0], l[1], l[2]))
How do I implement the logic to generate structTypeSchema from mySchema?
Expected output:
structTypeSchema = StructType(fields=[
    StructField("id", IntegerType(), True),
    StructField("name", StringType(), True),
    StructField("InsertDate", TimestampType(), True)])
You can try the following:
from pyspark.sql import types as T
structTypeSchema = T.StructType(
    [T.StructField(f[0], eval(f'T.{f[1]}'), f[2]) for f in mySchema]
)
or:
from pyspark.sql.types import *
structTypeSchema = StructType(
    [StructField(f[0], eval(f[1]), f[2]) for f in mySchema]
)