Hive table not accessible from Spark 2 SQL on HDP
I am running the following job on HDP:
export SPARK_MAJOR_VERSION=2
spark-submit --class com.spark.sparkexamples.Audit --master yarn --deploy-mode cluster \
  --files /bigdata/datalake/app/config/metadata.csv \
  BRNSAUDIT_v4.jar dl_raw.ACC /bigdatahdfs/landing/AUDIT/BW/2017/02/27/ACC_hash_total_and_count_20170227.dat TH 20170227
It fails with the following error:
Table or view not found: `dl_raw`.`ACC`; line 1 pos 94;
'Aggregate [count(1) AS rec_cnt#58L, 'count('BRCH_NUM) AS hashcount#59, 'sum('ACC_NUM) AS hashsum#60]
+- 'Filter (('trim('country_code) = trim(TH)) && ('from_unixtime('unix_timestamp('substr('bus_date, 0, 11), MM/dd/yyyy), yyyyMMdd) = 20170227))
   +- 'UnresolvedRelation `dl_raw`.`ACC`
However, the table exists in Hive and is accessible from spark-shell.
Here is the code for the Spark session:
import org.apache.spark.sql.SparkSession

val sparkSession = SparkSession.builder
  .appName("spark session example")
  .enableHiveSupport()
  .getOrCreate()
sparkSession.conf.set("spark.sql.crossJoin.enabled", "true")
val df_table_stats = sparkSession.sql("""select count(*) as rec_cnt,count(distinct BRCH_NUM) as hashcount, sum(ACC_NUM) as hashsum
from dl_raw.ACC
where trim(country_code) = trim('BW')
and from_unixtime(unix_timestamp(substr(bus_date,0,11),'MM/dd/yyyy'),'yyyyMMdd')='20170227'
""")
Include hive-site.xml in the --files parameter when submitting the Spark job.
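As a rough sketch of what that could look like for the job in the question: --files accepts a comma-separated list, so hive-site.xml can be appended to the existing metadata.csv entry. The path /etc/hive/conf/hive-site.xml is an assumption based on a typical HDP layout, and the variable names are only for illustration. A plausible reason this helps is that in cluster mode the driver runs inside a YARN container, and files shipped with --files are placed in that container's working directory, where the Hive client can pick them up.

```shell
# Assumed location of the Hive client config (typical on HDP; adjust for your cluster).
HIVE_SITE="/etc/hive/conf/hive-site.xml"
# The file the job already ships, taken from the question.
APP_FILES="/bigdata/datalake/app/config/metadata.csv"

# --files takes a comma-separated list, so append hive-site.xml to it.
FILES_ARG="${APP_FILES},${HIVE_SITE}"

# Dry run: print the command for review. Remove the leading "echo" to submit.
echo spark-submit --class com.spark.sparkexamples.Audit --master yarn --deploy-mode cluster \
  --files "$FILES_ARG" \
  BRNSAUDIT_v4.jar dl_raw.ACC /bigdatahdfs/landing/AUDIT/BW/2017/02/27/ACC_hash_total_and_count_20170227.dat TH 20170227
```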
Alternatively, copy the hive-site.xml configuration file from the Hive conf directory to the Spark conf directory. This should resolve the issue:
cp /etc/hive/conf/hive-site.xml /etc/spark2/conf
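After copying, it may be worth confirming that the Spark conf directory really contains a hive-site.xml that points at the metastore. The check below is only a sketch: has_metastore_config is a hypothetical helper, and it assumes the cluster uses a remote metastore configured through the hive.metastore.uris property. Without a usable hive-site.xml, Spark with Hive support typically falls back to a local embedded metastore, which would not know about dl_raw.ACC.

```shell
# Hypothetical helper: succeeds if <dir>/hive-site.xml exists and
# mentions the hive.metastore.uris property.
has_metastore_config() {
  conf_dir="$1"
  [ -f "$conf_dir/hive-site.xml" ] && grep -q "hive.metastore.uris" "$conf_dir/hive-site.xml"
}

# /etc/spark2/conf is the Spark 2 conf directory used in the cp command above.
if has_metastore_config /etc/spark2/conf; then
  echo "hive-site.xml with a metastore URI found in the Spark conf directory"
else
  echo "hive-site.xml missing or has no hive.metastore.uris entry"
fi
```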