Create external table from S3 on Spark Beeline
I made the following changes to /etc/dse/spark/hive-site.xml on every node of a 4-node cluster.
<property>
<name>fs.s3.awsAccessKeyId</name>
<value>****</value>
</property>
<property>
<name>fs.s3.awsSecretAccessKey</name>
<value>****</value>
</property>
<property>
<name>fs.s3n.awsAccessKeyId</name>
<value>****</value>
</property>
<property>
<name>fs.s3n.awsSecretAccessKey</name>
<value>****</value>
</property>
<property>
<name>fs.s3a.awsAccessKeyId</name>
<value>****</value>
</property>
<property>
<name>fs.s3a.awsSecretAccessKey</name>
<value>****</value>
</property>
I set the following environment variables on the nodes running the Spark Thrift server and the spark-beeline client:
export AWS_SECRET_ACCESS_KEY=****
export AWS_ACCESS_KEY_ID=*****
I started the Spark Thrift server as follows:
dse -u cassandra -p ***** spark-sql-thriftserver start --conf spark.cores.max=2 --conf spark.executor.memory=2G --conf spark.driver.maxResultSize=1G --conf spark.kryoserializer.buffer.max=512M --conf spark.sql.thriftServer.incrementalCollect=true
I then tried to create a table from Spark Beeline, using an S3 bucket as the source:
dse -u cassandra -p ***** spark-beeline --total-executor-cores 2 --executor-memory 2G
The log file is at /home/ubuntu/.spark-beeline.log
Beeline version 1.2.1.2_dse_spark by Apache Hive
beeline> !connect jdbc:hive2://localhost:10000 cassandra
Connecting to jdbc:hive2://localhost:10000
Enter password for jdbc:hive2://localhost:10000: ****************
Connected to: Spark SQL (version 1.6.3)
Driver: Hive JDBC (version 1.2.1.2_dse_spark)
Transaction isolation: TRANSACTION_REPEATABLE_READ
0: jdbc:hive2://localhost:10000> CREATE EXTERNAL TABLE test_table (name string,phone string) PARTITIONED BY(day date)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LOCATION 's3a://hive-getsimpl/test';
I get the following error:
Error: org.apache.spark.sql.execution.QueryExecutionException: FAILED:
Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask.MetaException (message:com.amazonaws.services.s3.model.AmazonS3Exception:
Status Code: 403, AWS Service: Amazon S3, AWS Request ID: 29991E2338CC6B49, AWS Error Code: null,
AWS Error Message: Forbidden, S3 Extended Request ID: kidxZNQI73PBsluGoLQlB4+VEdIx0t82Y/J/q69NA18k8MnSILEyo5riCuj3QcEiGiFRqB4rAbc=) (state=,code=0)
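Note: the AWS keys are valid and have been used successfully with other Python scripts.
s3a uses different configuration names; you need to set fs.s3a.access.key and fs.s3a.secret.key. The fs.s3a.awsAccessKeyId and fs.s3a.awsSecretAccessKey entries in the question follow the s3/s3n naming pattern, which the s3a connector does not read. Don't touch the environment variables, as they only add confusion.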
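As a rough sketch of what that change might look like, the two s3a entries in /etc/dse/spark/hive-site.xml could be replaced with the properties below. This assumes hive-site.xml is still the file the Thrift server reads its Hadoop settings from, as in the question; the **** values are placeholders for the real keys, and the existing fs.s3 and fs.s3n entries can stay as they are.

```xml
<!-- s3a credentials: note the property names differ from the s3/s3n pattern -->
<property>
<name>fs.s3a.access.key</name>
<value>****</value>
</property>
<property>
<name>fs.s3a.secret.key</name>
<value>****</value>
</property>
```

The Thrift server most likely needs a restart after the edit so that it picks up the new values.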