Can we create a new table from an existing table, with its data, in PySpark?

The Teradata syntax for creating a table from an existing one is:

create table <DBname>.<Tablename>
as
select * from <DBname>.<Tablename>
with data;

Similarly, how can we create a table in Spark SQL?

It is almost the same in Spark SQL.

Example:

CREATE TABLE tablename 
    STORED AS PARQUET LOCATION 'some/location/incase/of/external/table' 
AS
SELECT *
    FROM source_table
WHERE 1=1
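
From PySpark, a statement like the one above can be submitted through `spark.sql`. The following is a minimal sketch, not a definitive implementation. It assumes a SparkSession built with Hive support and an existing table called `source_table`. The helpers `build_ctas`, `copy_table`, and `get_spark`, and the name `new_table`, are hypothetical and used only for illustration.

```python
def build_ctas(target, source, file_format="PARQUET", location=None):
    """Build a CREATE TABLE ... AS SELECT statement (hypothetical helper)."""
    parts = [f"CREATE TABLE {target}", f"STORED AS {file_format}"]
    if location is not None:
        # LOCATION is only needed when the table should live at an external path.
        parts.append(f"LOCATION '{location}'")
    parts.append(f"AS SELECT * FROM {source}")
    return "\n".join(parts)


def copy_table(spark, target, source, **kwargs):
    """Run the CTAS statement through an existing SparkSession."""
    spark.sql(build_ctas(target, source, **kwargs))


def get_spark():
    """Create a SparkSession; pyspark is imported lazily, only when called."""
    from pyspark.sql import SparkSession

    return (
        SparkSession.builder
        .appName("ctas-example")
        # Assumption: STORED AS creates a Hive-format table, so Hive support is enabled.
        .enableHiveSupport()
        .getOrCreate()
    )
```

With a running Spark installation, `copy_table(get_spark(), "new_table", "source_table")` would create `new_table` and fill it with the rows of `source_table`. The table names are interpolated directly into the SQL string, so they should come from trusted input only.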

General syntax (advanced):

CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name
  [(col_name data_type [COMMENT col_comment], ...)]
  [COMMENT table_comment]
  [
   [ROW FORMAT row_format] 
   [STORED AS file_format]
  ]
  [LOCATION path_to_save]
  [AS select_statement]

By the way, Spark supports more of Hive's syntax and features. For details, refer to the CTAS documentation.
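
Since the question asks about PySpark specifically, the same copy can also be done without SQL, through the DataFrame API: read the existing table with `spark.table` and write it out with `saveAsTable`. This is a sketch under assumptions rather than the only way to do it. The function name `copy_table_df` and its parameters are hypothetical, and `spark` is assumed to be an existing SparkSession.

```python
def copy_table_df(spark, target, source, file_format="parquet", path=None):
    """Copy `source` into a new table `target` using the DataFrame API."""
    writer = spark.table(source).write.format(file_format)
    if path is not None:
        # Supplying a path makes the result an external (unmanaged) table.
        writer = writer.option("path", path)
    # The default save mode raises an error if `target` already exists.
    writer.saveAsTable(target)
```

For example, `copy_table_df(spark, "new_table", "source_table")` would create a managed Parquet table with the same schema and rows as `source_table`.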