How to use a Spark DataFrame as a table in a SQL statement
I have a Spark DataFrame in Python. How can I use it in a SparkSQL statement?
For example:
from pyspark.sql.types import StructType, StructField, StringType, TimestampType

df = spark.createDataFrame(
    data=array_of_table_and_time_tuples,
    schema=StructType([
        StructField('table_name', StringType(), True),
        StructField('update_time', TimestampType(), True),
    ]))

# something needs to be added here to make the DataFrame referenceable by the SQL below
spark.sql(f"""MERGE INTO {load_tracking_table} t
USING update_datetimes s
ON t.table_name = s.table_name
WHEN MATCHED THEN UPDATE SET t.valid_as_of_date = s.update_time""")
Register the DataFrame as a temporary view; the name you give it is the name SQL statements can reference:
df.createOrReplaceTempView("the_name_of_the_view")
So for the example above:
df = spark.createDataFrame(
    data=array_of_table_and_time_tuples,
    schema=StructType([
        StructField('table_name', StringType(), True),
        StructField('update_time', TimestampType(), True),
    ]))
df.createOrReplaceTempView("update_datetimes")
spark.sql(f"""MERGE INTO {load_tracking_table} t
USING update_datetimes s
ON t.table_name = s.table_name
WHEN MATCHED THEN UPDATE SET t.valid_as_of_date = s.update_time""")