How to saveAsTable to s3?
It looks like this will fail:
df.write()
.option("mode", "DROPMALFORMED")
.option("compression", "snappy")
.mode("overwrite")
.bucketBy(32,"column")
.sortBy("column")
.parquet("s3://....");
with the error:
Exception in thread "main" org.apache.spark.sql.AnalysisException: 'save' does not support bucketing right now;
    at org.apache.spark.sql.DataFrameWriter.assertNotBucketed(DataFrameWriter.scala:314)
I see that saveAsTable("myfile") is still supported, but it only writes locally. Once the job finishes, how do I take the saveAsTable(...) output and put it on S3?
You can do it like below:
df.write()
    .option("mode", "DROPMALFORMED")
    .option("compression", "snappy")
    // the path option is what places the table data on S3
    .option("path", "s3://....")
    .mode("overwrite")
    .format("parquet")
    .bucketBy(32, "column")
    .sortBy("column")
    .saveAsTable("tableName");
This will create an EXTERNAL table that points to the S3 location; .option("path","s3://....") is the trick here. Bucketing metadata has to live in the metastore, which is why the path-based write via parquet(...) is rejected while saveAsTable(...) works.
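Once the write finishes, the table can be used like any other catalog table. Below is a minimal sketch, assuming a table named tableName created by the write above and a Spark build with Hive support; the class name and app name are made up for illustration:

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class ReadBucketedTable {
    public static void main(String[] args) {
        // Bucketed tables need a persistent catalog, hence enableHiveSupport().
        SparkSession spark = SparkSession.builder()
                .appName("read-bucketed-table")  // hypothetical app name
                .enableHiveSupport()
                .getOrCreate();

        // Read back through the catalog; Spark resolves the S3 location
        // from the table metadata, so no path is needed here.
        Dataset<Row> df = spark.table("tableName");
        df.show();

        // "Type: EXTERNAL" and "Location: s3://..." in this output confirm
        // that the data files sit on S3 while only metadata is in the metastore.
        spark.sql("DESCRIBE FORMATTED tableName").show(100, false);

        spark.stop();
    }
}

Reading through spark.table(...) also preserves the bucketing, so joins and aggregations on the bucket column can avoid a shuffle.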