How to Export Results of a SQL Query from Databricks to Azure Data Lake Store
I am trying to export the results of a spark.sql query in Databricks to a folder in Azure Data Lake Store (ADLS).
The table I am querying is also in ADLS.
I have accessed the files in ADLS from Databricks with the following command:
base = spark.read.csv("adl://carlslake.azuredatalakestore.net/landing/",inferSchema=True,header=True)
base.createOrReplaceTempView('basetable')
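(One note that is not from the original post: reading an adl:// path only works once ADLS Gen1 OAuth credentials are configured on the cluster or session. A minimal sketch assuming a service principal, with placeholder values:)

# Assumed ADLS Gen1 (adl://) service-principal configuration; the ids and
# secret below are placeholders, not values from the post.
spark.conf.set("dfs.adls.oauth2.access.token.provider.type", "ClientCredential")
spark.conf.set("dfs.adls.oauth2.client.id", "<application-id>")
spark.conf.set("dfs.adls.oauth2.credential", "<client-secret>")
spark.conf.set("dfs.adls.oauth2.refresh.url", "https://login.microsoftonline.com/<tenant-id>/oauth2/token")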
I am querying the table using the following:
try:
    dataframe = spark.sql("select * from basetable where LOAD_ID = 1199")
except:
    print("Exception occurred 1166")
else:
    print("Table Load_id 1166")
I then tried to export the results to a folder in Azure using the following:
try:
    dataframe.coalesce(1).write.option("header","true").mode("overwrite").csv("adl://carlslake.azuredatalakestore.net/jfolder2/outputfiles/")
    rename_file("adl://carlslake.azuredatalakestore.net/jfolder2/outputfiles", "adl://carlslake.azuredatalakestore.net/landing/RAW", "csv", "Delta_LoyaltyAccount_merged")
except:
    print("Exception Occurred 1166")
else:
    print("Delta File Created")
Here are two strange issues:

1. I specified LOAD_ID = 1199 in the query; even though there is no LOAD_ID = 1199, the query still succeeds.
2. If the first "try" failed, I would expect the second "try" statement to fail too, but the second try statement runs regardless of the first.

Can anyone tell me where I have gone wrong?
The table can be viewed here: thetable
Just wanted to share the answer with you;
try:
    dataframe = spark.sql("select * from basetable where LOAD_ID = 1166")
except:
    print("Exception occurred 1166")
# count() is an action: it actually runs the query, so an empty result is caught here
if dataframe.count() == 0:
    print("No data rows 1166")
else:
    dataframe.coalesce(1).write.option("header","true").mode("overwrite").csv("adl://carlslake.azuredatalakestore.net/jfolder2/outputfiles/")
    rename_file("adl://carlslake.azuredatalakestore.net/jfolder2/outputfiles", "adl://carlslake.azuredatalakestore.net/landing/RAW", "csv", "Delta_LoyaltyAccount_merged")
Hope it works for you too.