Hive - how to set Parquet/ORC as the default output format

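Hive uses text as its default file format. To get Parquet or ORC files, you have to add an extra "STORED AS PARQUET" or "STORED AS ORC" clause to every CREATE TABLE statement.
How can I set Parquet/ORC as the default output format instead?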

hive.default.fileformat

Default Value: TextFile
Added In: Hive 0.2.0

Default file format for CREATE TABLE statement. Options are TextFile, SequenceFile, RCfile, ORC, and Parquet. Users can explicitly say CREATE TABLE ... STORED AS TEXTFILE|SEQUENCEFILE|RCFILE|ORC|AVRO|INPUTFORMAT...OUTPUTFORMAT... to override. (RCFILE was added in Hive 0.6.0, ORC in 0.11.0, AVRO in 0.14.0, and Parquet in 2.3.0) See Row Format, Storage Format, and SerDe for details.


hive.default.fileformat.managed

Default Value: none
Added In: Hive 1.2.0 with HIVE-9915

Default file format for CREATE TABLE statement applied to managed tables only. External tables will be created with format specified by hive.default.fileformat. Options are none, TextFile, SequenceFile, RCfile, ORC, and Parquet (as of Hive 2.3.0). Leaving this null will result in using hive.default.fileformat for all native tables. For non-native tables the file format is determined by the storage handler, as shown below (see the StorageHandlers section for more information on managed/external and native/non-native terminology).


+----------+---------------------------------------------------------------------------+-------------------------------------+
|          |                                  Native                                   |             Non-Native              |
+----------+---------------------------------------------------------------------------+-------------------------------------+
| Managed  | hive.default.fileformat.managed (or fall back to hive.default.fileformat) | Not covered by default file-formats |
| External | hive.default.fileformat                                                   | Not covered by default file-formats |
+----------+---------------------------------------------------------------------------+-------------------------------------+

https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-FileFormats

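For external tables (and for managed tables as well, as long as hive.default.fileformat.managed is left unset), run:

set hive.default.fileformat=Parquet;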

For managed tables, run:

set hive.default.fileformat.managed=Parquet;
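
To check that the settings took effect, a session along the following lines may help. It is only a sketch: the table names demo_default_format and demo_text_format are hypothetical, and it assumes a Hive version that accepts Parquet as a value for these properties (2.3.0 or later according to the documentation quoted above; ORC can be substituted on older versions).

```sql
-- Session-level defaults for external and managed tables.
SET hive.default.fileformat=Parquet;
SET hive.default.fileformat.managed=Parquet;

-- No STORED AS clause, so the table should pick up the default format.
-- (demo_default_format is a hypothetical name used for illustration.)
CREATE TABLE demo_default_format (id INT, name STRING);

-- Inspect the SerDe Library, InputFormat and OutputFormat rows of the output
-- to confirm which file format the table was created with.
DESCRIBE FORMATTED demo_default_format;

-- An explicit STORED AS clause still overrides the default.
CREATE TABLE demo_text_format (id INT, name STRING) STORED AS TEXTFILE;
```

If DESCRIBE FORMATTED still reports the text input and output formats for the first table, the properties were most likely not applied in that session.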

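These commands apply to the current session only. To make the setting permanent for the whole Hive installation, set the same properties in hive-site.xml and restart the Hive services.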
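
For the permanent variant, the hive-site.xml entries might look like the fragment below. This is a sketch that assumes the usual Hadoop-style layout, with the property elements placed inside the file's existing configuration root element. The value Parquet mirrors the session commands above; use ORC if that is the format you want.

```xml
<!-- Place inside the existing <configuration> element of hive-site.xml. -->
<property>
  <name>hive.default.fileformat</name>
  <value>Parquet</value>
</property>
<property>
  <!-- Optional: only needed if managed tables should differ from the general default. -->
  <name>hive.default.fileformat.managed</name>
  <value>Parquet</value>
</property>
```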