Different Metastore/Data Catalog options with Apache Spark?

Which metastore/data catalog options can be used with Apache Spark?

In the simplest case I can use the Hive Metastore, which works well with Hive, Spark, and Presto. Are there any other data catalog options I can use here?

From the documentation at https://spark.apache.org/docs/latest/sql-data-sources-hive-tables.html:

Spark SQL also supports reading and writing data stored in Apache Hive. However, since Hive has a large number of dependencies, these dependencies are not included in the default Spark distribution. If Hive dependencies can be found on the classpath, Spark will load them automatically. Note that these Hive dependencies must also be present on all of the worker nodes, as they will need access to the Hive serialization and deserialization libraries (SerDes) in order to access data stored in Hive.

Configuration of Hive is done by placing your hive-site.xml, core-site.xml (for security configuration), and hdfs-site.xml (for HDFS configuration) file in conf/.

When working with Hive, one must instantiate SparkSession with Hive support, including connectivity to a persistent Hive metastore, support for Hive serdes, and Hive user-defined functions. Users who do not have an existing Hive deployment can still enable Hive support. When not configured by the hive-site.xml, the context automatically creates metastore_db in the current directory and creates a directory configured by spark.sql.warehouse.dir, which defaults to the directory spark-warehouse in the current directory that the Spark application is started. Note that the hive.metastore.warehouse.dir property in hive-site.xml is deprecated since Spark 2.0.0. Instead, use spark.sql.warehouse.dir to specify the default location of database in warehouse. You may need to grant write privilege to the user who starts the Spark application.
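As a rough illustration of the quoted passage, here is a minimal PySpark sketch of a session with Hive support. The helper names `hive_session_conf` and `build_hive_session`, the warehouse path, and the Thrift URI are hypothetical and not taken from the documentation. Passing `hive.metastore.uris` through the builder is a common pattern, used here on the assumption that no `hive-site.xml` is on the classpath.

```python
from typing import Dict, Optional


def hive_session_conf(warehouse_dir: str,
                      metastore_uri: Optional[str] = None) -> Dict[str, str]:
    """Collect Spark settings for a Hive-backed catalog (hypothetical helper)."""
    # spark.sql.warehouse.dir replaces the deprecated hive.metastore.warehouse.dir.
    conf = {"spark.sql.warehouse.dir": warehouse_dir}
    if metastore_uri is not None:
        # Assumption: a remote metastore reachable over Thrift. Without this
        # key (and without hive-site.xml) Spark creates a local metastore_db.
        conf["hive.metastore.uris"] = metastore_uri
    return conf


def build_hive_session(app_name: str, conf: Dict[str, str]):
    """Create a SparkSession with Hive support; needs pyspark plus Hive classes."""
    # Imported lazily so hive_session_conf stays usable without Spark installed.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName(app_name).enableHiveSupport()
    for key, value in conf.items():
        builder = builder.config(key, value)
    return builder.getOrCreate()


# Example usage (placeholder values, requires a working Spark installation):
#   conf = hive_session_conf("/tmp/spark-warehouse", "thrift://metastore-host:9083")
#   spark = build_hive_session("catalog-demo", conf)
#   spark.sql("SHOW DATABASES").show()
```

Keeping the settings in a plain dictionary makes it easy to see which keys point the session at a metastore, independent of how the session itself is created.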

But perhaps you mean something else? A simpler case?

hive

apache-spark

apache-spark-sql

hive-metastore