EMR Step command-runner hive-script
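I am trying to run a Hive script that is stored on S3 on an EMR cluster.
When I connect to the EMR cluster via SSH, typing
"hive -f s3://..."
works. However, I want to automate this, so I wrote a Python script and tried adding a step to the cluster. I cannot get this step to run, even when I add it manually through the AWS console.
For the jar file I specified "command-runner.jar", but no matter which arguments I pass after that (I tried "hive -f s3://..." as suggested in another thread, which did not work), the step always fails immediately.
When I switch to script-runner.jar instead of command-runner, I can get it to work to some extent with the command
/usr/share/aws/emr/scripts/hive-script --run-hive-script --args -f s3://...
However, after a few seconds I get an error, which I find in stderr: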
Logging initialized using configuration in jar:file:/home/hadoop/.versions/hive-0.13.1-amzn-3/lib/hive-common-0.13.1-amzn-3.jar!/hive-log4j.properties
Exception in thread "main" java.lang.RuntimeException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:346)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:692)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:636)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
Caused by: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient
at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1420)
at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.<init>(RetryingMetaStoreClient.java:62)
at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:72)
at org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:2483)
at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:2495)
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:340)
... 7 more
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1418)
... 12 more
Caused by: javax.jdo.JDOFatalDataStoreException: Unable to open a test connection to the given database. JDBC url = jdbc:mysql://localhost:3306/hive?createDatabaseIfNotExist=true, username = hive. Terminating connection pool (set lazyInit to true if you expect to start your database after your app). Original Exception: ------
java.sql.SQLException: Access denied for user 'hive'@'localhost' (using password: YES)
at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:1084)
Ideally, I would like to solve this using command-runner.jar. Can anyone tell me the correct arguments to use for this (in the AWS console and/or in the Boto3 arguments for add_job_flow)?
OK, for anyone who runs into the same problem: it turned out I had forgotten to add "Hive" to the applications of the EMR cluster...
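To catch this before submitting a step, one option is to look at the application list that EMR reports for the cluster. The following is only a minimal sketch, assuming boto3 is installed and AWS credentials are configured; `has_application` and `cluster_has_hive` are hypothetical helper names, not part of boto3.

```python
def has_application(describe_cluster_response, app_name="Hive"):
    """Return True if app_name appears in a describe_cluster response.

    The response is expected to look like
    {"Cluster": {"Applications": [{"Name": "Hive", ...}, ...]}}.
    """
    apps = describe_cluster_response.get("Cluster", {}).get("Applications", [])
    # Compare case-insensitively, since the name may be reported as "Hive" or "hive".
    return any(app.get("Name", "").lower() == app_name.lower() for app in apps)


def cluster_has_hive(cluster_id):
    """Ask EMR whether Hive is installed on the given cluster."""
    import boto3  # imported here so has_application can be used without boto3

    emr = boto3.client("emr")
    return has_application(emr.describe_cluster(ClusterId=cluster_id))
```

If `cluster_has_hive(...)` returns False, a Hive step will most likely fail in a similar way to the one described above, so the cluster would need to be created with Hive among its applications.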
To run a Hive script from S3:
hadoop jar /var/lib/aws/emr/step-runner/hadoop-jars/command-runner.jar hive-script --run-hive-script --args -f s3://path/scripts/hive_show_table.hql
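The same arguments can probably be passed as a step through Boto3. This is a rough sketch rather than a verified setup: it assumes boto3 is installed, credentials are configured, and Hive is among the cluster's applications. `build_hive_step` and `submit_hive_step` are hypothetical helper names, and the `ActionOnFailure` value is just one possible choice.

```python
def build_hive_step(script_uri, name="Run Hive script"):
    """Build a step definition that runs a Hive script via command-runner.jar."""
    return {
        "Name": name,
        "ActionOnFailure": "CONTINUE",  # assumption: keep the cluster running on failure
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            # Same arguments as the hadoop jar command line above.
            "Args": ["hive-script", "--run-hive-script", "--args", "-f", script_uri],
        },
    }


def submit_hive_step(cluster_id, script_uri):
    """Add the Hive step to a running cluster and return the new step ID."""
    import boto3  # imported here so build_hive_step can be used without boto3

    emr = boto3.client("emr")
    response = emr.add_job_flow_steps(
        JobFlowId=cluster_id,
        Steps=[build_hive_step(script_uri)],
    )
    return response["StepIds"][0]
```

Usage would look something like `submit_hive_step("j-XXXXXXXXXXXXX", "s3://path/scripts/hive_show_table.hql")`, where the cluster ID is a placeholder for your own.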