Save a PySpark DataFrame inside a function
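I'm trying to save a PySpark DataFrame to an HDFS folder. The code works fine outside a function, but as soon as I put it inside a function I get an error. It may be the way I'm referencing the function argument. Thanks for your help.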
def save_file(df):
    start_time = time.time()
    df.createOrReplaceTempView("df")
    hc.sql("create table hdfs_folder.{} as select * from {}".format(df,df))
    print("{} saved in hdfs_folder".format(df))
    print("**********************************")
    print("--- %s seconds ---" % (time.time() - start_time))

save_file(py_df)
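I think what you want is the string 'df' rather than the variable df. Inside the function, "{}".format(df) inserts the DataFrame's string representation (something like DataFrame[id: bigint, name: string]), not the name of the variable, so the generated SQL is invalid. Use the name of the temporary view instead, as follows: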
import time

def save_file(df):
    start_time = time.time()
    # Register the DataFrame under the fixed view name "df"
    df.createOrReplaceTempView("df")
    # Pass the view name as a string, not the DataFrame object
    hc.sql("create table hdfs_folder.{} as select * from {}".format('df','df'))
    print("{} saved in hdfs_folder".format('df'))
    print("**********************************")
    print("--- %s seconds ---" % (time.time() - start_time))

save_file(py_df)
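A small illustration of why the original version fails, runnable without Spark. FakeDataFrame below is a hypothetical stand-in whose repr imitates what a PySpark DataFrame prints; ctas_sql is likewise a hypothetical helper that only builds the SQL string. The point it shows: str.format calls str() on whatever object it receives, so it can never recover the variable name py_df.

```python
class FakeDataFrame:
    """Hypothetical stand-in for a PySpark DataFrame; only its repr matters here."""

    def __repr__(self):
        return "DataFrame[id: bigint, name: string]"


def ctas_sql(table_name, view_name):
    # str.format inserts str(arg) for each argument; it never sees variable names.
    return "create table hdfs_folder.{} as select * from {}".format(table_name, view_name)


py_df = FakeDataFrame()

# Passing the object: the DataFrame's repr ends up inside the SQL text.
print(ctas_sql(py_df, py_df))
# create table hdfs_folder.DataFrame[id: bigint, name: string] as select * from DataFrame[id: bigint, name: string]

# Passing strings: the SQL contains the intended identifiers.
print(ctas_sql("py_df", "df"))
# create table hdfs_folder.py_df as select * from df
```

The first statement is not valid SQL, which is why hc.sql raises an error once the object itself is passed in.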
Edit: passing the table name in as a separate argument:
import time

def save_file(df, name):
    start_time = time.time()
    # The temp view keeps the fixed name "df"; `name` is only used for the target table
    df.createOrReplaceTempView("df")
    hc.sql("create table hdfs_folder.{} as select * from {}".format(name,'df'))
    print("{} saved in hdfs_folder".format(name))
    print("**********************************")
    print("--- %s seconds ---" % (time.time() - start_time))

# Pass the table name explicitly as a string
save_file(py_df, 'py_df')
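An alternative sketch that skips the temporary view and the SQL string entirely by using DataFrameWriter.saveAsTable, which takes the table name directly. The function name save_file_as_table and its database/mode parameters are hypothetical. It assumes df is a PySpark DataFrame and that hdfs_folder is an existing database in the metastore. Note that saveAsTable uses Spark's default data source (spark.sql.sources.default, usually Parquet), so the storage format may differ from what the Hive "create table ... as select" statement produces.

```python
import time


def save_file_as_table(df, name, database="hdfs_folder", mode="error"):
    """Save df as the table <database>.<name> via DataFrameWriter.saveAsTable.

    Hypothetical helper; df is assumed to be a PySpark DataFrame.
    """
    if not isinstance(name, str):
        # Catches the original mistake: passing the DataFrame instead of its name.
        raise TypeError("name must be a str, got {}".format(type(name).__name__))
    start_time = time.time()
    table = "{}.{}".format(database, name)
    # mode="error" fails if the table already exists, like a plain CREATE TABLE.
    df.write.mode(mode).saveAsTable(table)
    print("{} saved in {}".format(table, database))
    print("**********************************")
    print("--- %s seconds ---" % (time.time() - start_time))
    return table
```

Usage would be save_file_as_table(py_df, 'py_df'); pass mode="overwrite" to replace an existing table instead of failing.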