Save a PySpark DataFrame inside a function

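I am trying to save a PySpark DataFrame to an HDFS folder. The code works fine outside a function, but as soon as I put it inside a function I get an error. It may be the way I am referencing the function argument. Thanks for your help.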

def save_file(df):

    start_time = time.time()

    df.createOrReplaceTempView("df") 
    hc.sql("create table hdfs_folder.{} as select * from {}".format(df,df))

    print("{} saved in hdfs_folder".format(df))

    print("**********************************")    
    print("--- %s seconds ---" % (time.time() - start_time))

save_file(py_df)

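I think what you want is to use the string 'df' rather than the variable df. Formatting the DataFrame object itself into the SQL text inserts its string representation (something like DataFrame[col: type, ...]) where a table name is expected. Use the string instead, as follows: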

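import time

# hc is assumed to be an existing HiveContext (or SparkSession) created elsewhere.
def save_file(df):
    start_time = time.time()

    # Register the DataFrame under the view name "df" so SQL can refer to it.
    df.createOrReplaceTempView("df")
    # Pass the view name as a string, not the DataFrame object.
    hc.sql("create table hdfs_folder.{} as select * from {}".format('df', 'df'))

    print("{} saved in hdfs_folder".format('df'))
    print("**********************************")
    print("--- %s seconds ---" % (time.time() - start_time))

save_file(py_df)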
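
To see why the original call failed, here is a minimal sketch that runs without Spark. FakeDataFrame is a hypothetical stand-in that only imitates the way a PySpark DataFrame prints itself; it is not part of any library. The point is that str.format() inserts the object's string representation into the SQL text.

```python
class FakeDataFrame:
    """Hypothetical stand-in for a PySpark DataFrame; it only mimics the repr."""

    def __init__(self, dtypes):
        self.dtypes = dtypes  # list of (column name, type name) pairs

    def __repr__(self):
        cols = ", ".join("{}: {}".format(name, dtype) for name, dtype in self.dtypes)
        return "DataFrame[{}]".format(cols)


template = "create table hdfs_folder.{} as select * from {}"
py_df = FakeDataFrame([("id", "bigint"), ("name", "string")])

# Passing the object: str.format() inserts its string representation,
# which is not a valid table name.
broken_sql = template.format(py_df, py_df)

# Passing the view name as a plain string gives a usable statement.
working_sql = template.format("df", "df")

print(broken_sql)
print(working_sql)
```

The first statement contains "DataFrame[id: bigint, name: string]" where the table name should be, which is the kind of text the SQL parser would reject.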

Edit: passing the table name as a parameter:

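import time

# hc is assumed to be an existing HiveContext (or SparkSession) created elsewhere.
def save_file(df, name):
    start_time = time.time()

    # The temp view is always called "df"; only the target table name varies.
    df.createOrReplaceTempView("df")
    hc.sql("create table hdfs_folder.{} as select * from {}".format(name, 'df'))

    print("{} saved in hdfs_folder".format(name))
    print("**********************************")
    print("--- %s seconds ---" % (time.time() - start_time))

save_file(py_df, 'py_df')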
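
Since the table name now goes straight into the SQL text, it may be worth checking it before building the statement. The sketch below is one possible way to do that; build_create_table_sql and its identifier rule are hypothetical helpers for illustration, not part of PySpark, and the rule (letters, digits, underscores) is an assumption that may be stricter than what Hive accepts.

```python
import re

# Assumed rule for a "simple" identifier: letters, digits and underscores only.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def build_create_table_sql(name, view="df", database="hdfs_folder"):
    """Return the CREATE TABLE ... AS SELECT text, rejecting non-string names."""
    for value in (name, view, database):
        if not isinstance(value, str) or not _IDENTIFIER.match(value):
            raise ValueError("not a simple identifier: {!r}".format(value))
    return "create table {}.{} as select * from {}".format(database, name, view)


sql = build_create_table_sql("py_df")
print(sql)
```

Inside save_file, the resulting string could be passed to hc.sql(...). Passing a DataFrame object by mistake then raises a ValueError immediately instead of producing a malformed statement.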