Airflow delay between catchup instances
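I have the following DAG set up to run catchup from 2015. For each execution date, the task instance completes in under a minute. However, the next day's task only starts at 5-minute intervals, e.g. 10:00 AM, 10:05 AM, 10:10 AM, and so on. I don't see a 5-minute interval specified anywhere for the task instances. How can I modify the DAG so that the next instance is triggered as soon as the previous one finishes?
I am using Airflow version 1.9.0.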
from datetime import datetime, timedelta

from airflow import DAG

# jira_failure_ticket is a user-defined failure callback (its definition is not shown here).
default_args = {
    'owner': 'ssnehalatha',
    'email': ['ssnehalatha@metromile.com'],
    'depends_on_past': False,
    'start_date': datetime(2015, 1, 1),
    'on_failure_callback': jira_failure_ticket,
    'trigger_rule': 'all_done',
    'retries': 1,
    'pool': 'python_sql_pool'
}

dag = DAG('daily_dag',
          schedule_interval='15 1 * * 0,1,2,3,4,5',
          default_args=default_args,
          dagrun_timeout=timedelta(hours=24),
          catchup=True)
If I remember correctly, this is related to the scheduler settings in airflow.cfg.
[scheduler]
# The scheduler constantly tries to trigger new tasks (look at the
# scheduler section in the docs for more information). This defines
# how often the scheduler should run (in seconds).
scheduler_heartbeat_sec = 60
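To see which of these intervals are actually set in your own configuration, a small script along the following lines may help. It is only a sketch: the `SAMPLE_CFG` string and the `scheduler_intervals` helper are made up for illustration, and it reads the values with Python's standard `configparser` rather than through Airflow itself. Whether any of these settings accounts for the 5-minute gap is not something this check can confirm.

```python
import configparser

# Hypothetical excerpt of airflow.cfg; the value 60 is copied from the snippet above.
SAMPLE_CFG = """
[scheduler]
# how often the scheduler should run (in seconds)
scheduler_heartbeat_sec = 60
"""

# Option names taken from this thread; any of them may be absent from a given file.
SCHEDULER_KEYS = (
    "scheduler_heartbeat_sec",
    "min_file_process_interval",
    "dag_dir_list_interval",
)


def scheduler_intervals(cfg_text, keys=SCHEDULER_KEYS):
    """Return the [scheduler] intervals (in seconds) that are present in cfg_text."""
    # interpolation=None keeps literal '%' characters elsewhere in the file from raising errors
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)
    if not parser.has_section("scheduler"):
        return {}
    section = parser["scheduler"]
    return {key: int(section[key]) for key in keys if key in section}


if __name__ == "__main__":
    for name, seconds in scheduler_intervals(SAMPLE_CFG).items():
        print(f"{name} = {seconds}s")
```

To run it against a real installation, replace `SAMPLE_CFG` with the contents of your airflow.cfg (for example, read from the file under your Airflow home directory).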
Edit
Documentation for the two parameters you mentioned (from https://github.com/apache/incubator-airflow/blob/master/UPDATING.md):
min_file_process_interval: After how much time should an updated DAG be picked up from the filesystem.
dag_dir_list_interval: The frequency with which the scheduler should relist the contents of the DAG directory. If while developing dags, they are not being picked up, have a look at this number and decrease it when necessary.
In my opinion, they are more relevant to detecting changed and new DAG files than to executing tasks.