Passing parameters to a Glue job from AWS Lambda
We need to pass 4 parameters from AWS Lambda to an AWS Glue job when triggering it:
response = client.start_job_run(
    JobName='my_test_Job',
    Arguments={
        '--yr_partition_val': '2017',
        '--mon_partition_val': '05',
        '--date_partition_val': '25',
        '--hour_partition_val': '07'})
The Glue job needs to capture these 4 parameters for further processing in the PySpark Glue code.
I tried the following code in the Glue script to capture the parameters:
import sys
from awsglue.utils import getResolvedOptions
args = getResolvedOptions(sys.argv,
                          ['JOB_NAME',
                           'yr_partition_val',
                           'mon_partition_val',
                           'date_partition_val',
                           'hour_partition_val'])
But I get this error:
self.error(_('argument %s is required') % name)
awsglue.utils.GlueArgumentError: argument --JobName is required
Can anyone help?
AWS says '--JOB_NAME' is internal to Glue and should not be set. Also, the arguments are case-sensitive.
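To see why case matters, it helps to know that Glue puts each entry of the Arguments map onto the script's command line as a `--name value` pair, which `getResolvedOptions` then parses. A minimal sketch of that behavior, using `argparse` as a stand-in for `awsglue.utils.getResolvedOptions` (the function and variable names here are illustrative, not Glue internals):

```python
import argparse

def resolve_options(argv, option_names):
    """Rough stand-in for awsglue.utils.getResolvedOptions:
    each requested option must appear on argv as '--name value',
    and the names are matched case-sensitively."""
    parser = argparse.ArgumentParser()
    for name in option_names:
        parser.add_argument('--' + name, required=True)
    parsed, _ = parser.parse_known_args(argv[1:])
    return vars(parsed)

# Simulated sys.argv as Glue would build it from the Lambda Arguments dict
# (all values arrive as strings):
argv = ['script.py', '--yr_partition_val', '2017', '--mon_partition_val', '05']
args = resolve_options(argv, ['yr_partition_val', 'mon_partition_val'])
print(args['yr_partition_val'])  # '2017'
```

If a requested name is missing from argv (or differs only in case), the parser raises a "required" error, which is exactly the shape of the `GlueArgumentError` above.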
When calling from:

- the Glue API, Name='job_name_value' needs to be specified as the first parameter;
- the Lambda (boto3) API, JobName='job_name_value' needs to be specified as the first parameter.

See the example below:
import os
import boto3

current_year_full = '2019'
current_month = '01'
current_day = '21'
current_hour = '01'
int_bucket_name = 'datascience-ca-input'
glue_job_name = os.getenv("job_name")

gl = boto3.client('glue')
response = gl.start_job_run(
    JobName=glue_job_name,
    Arguments={
        '--intermediate_bucket_name': int_bucket_name,
        '--year_partition_value': current_year_full,
        '--month_partition_value': current_month,
        '--date_partition_value': current_day,
        '--hour_partition_value': current_hour})
import json
import boto3
from urllib.parse import unquote_plus

glueJobName = 'job_01'

def lambda_handler(event, context):
    client = boto3.client('glue')
    for record in event['Records']:
        print(record['s3']['object']['key'])
        file_name = unquote_plus(record['s3']['object']['key'])
        print("Job is going to start")
        arguments = {
            '--source': 'abc',
            '--bucket_name': 'test',
            '--folder_name': 'abc-2',
            '--file_name': file_name,
        }
        response = client.start_job_run(JobName=glueJobName, Arguments=arguments)
    return {
        'statusCode': 200,
        'body': json.dumps('Glue Trigger Lambda!')
    }
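One more detail worth noting: the Arguments map is string-to-string, so numeric partition values like month and hour must be formatted as zero-padded strings before the call (integer literals such as `05` are not even valid Python 3). A minimal sketch of building such a map (the fixed timestamp is only for illustration):

```python
from datetime import datetime

# Arguments values must be strings; zero-pad month/day/hour so they match
# partition naming like '05' rather than '5'.
now = datetime(2017, 5, 25, 7)  # fixed timestamp for illustration
arguments = {
    '--yr_partition_val': str(now.year),
    '--mon_partition_val': f'{now.month:02d}',
    '--date_partition_val': f'{now.day:02d}',
    '--hour_partition_val': f'{now.hour:02d}',
}
print(arguments['--mon_partition_val'])  # '05'
```

This dict can then be passed directly as the `Arguments` parameter of `start_job_run`.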