Persist option in Apache Spark

Question

Hi, I am new to Apache Spark. I am querying Hive tables with Apache Spark SQL from Java.

This is my code:

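    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaSparkContext;
    import org.apache.spark.sql.hive.HiveContext;

    SparkConf sparkConf = new SparkConf().setAppName("Hive").setMaster("local");
    JavaSparkContext ctx = new JavaSparkContext(sparkConf);
    HiveContext sqlContext = new org.apache.spark.sql.hive.HiveContext(ctx.sc());
    // First query against the Hive table
    org.apache.spark.sql.Row[] results = sqlContext.sql("Select * from Tablename where Column='Value'").collect();
    // Second query against the same table (renamed so the variable is not declared twice)
    org.apache.spark.sql.Row[] results1 = sqlContext.sql("Select * from Tablename where Column='Value1'").collect();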

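I also tried running two different queries in the same application, and I see that it connects to the Hive metastore each time. How can I solve this, and how can I use the persist option effectively?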

Answer 1

Calling sqlContext.cacheTable("Tablename") before running the two queries may help.

According to the documentation, it does what you need:

Caches the specified table in-memory.
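
As a rough illustration of that suggestion, here is a minimal sketch that assumes the Spark 1.x API used in the question (HiveContext, with collect() returning Row[]). The class name CacheTableSketch and the helper queryFor are hypothetical names introduced only for this example; Tablename, Column, and the values are the question's placeholders. Whether caching also reduces the metastore connections is not something the documentation quote confirms, so treat that part as untested.

```java
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.hive.HiveContext;

public class CacheTableSketch {

    // Hypothetical helper: builds the filter query from the question.
    // "Tablename" and "Column" are the question's placeholder names.
    static String queryFor(String value) {
        return "SELECT * FROM Tablename WHERE Column='" + value + "'";
    }

    public static void main(String[] args) {
        SparkConf sparkConf = new SparkConf().setAppName("Hive").setMaster("local");
        JavaSparkContext ctx = new JavaSparkContext(sparkConf);
        HiveContext sqlContext = new HiveContext(ctx.sc());

        // Ask Spark SQL to keep the table in memory before querying it.
        sqlContext.cacheTable("Tablename");

        // Both queries read the same table, so the second one can
        // reuse the cached data instead of reading it from Hive again.
        Row[] results = sqlContext.sql(queryFor("Value")).collect();
        Row[] results1 = sqlContext.sql(queryFor("Value1")).collect();

        System.out.println(results.length + " and " + results1.length + " rows");

        // Release the cached table and stop the context when done.
        sqlContext.uncacheTable("Tablename");
        ctx.stop();
    }
}
```

Reusing the same sqlContext for both queries matters here: the cache belongs to that context, so a second context would not see it.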

java

hadoop

apache-spark-sql