Here is a small post on a very specific use case of Airflow. While it's not very difficult to do, I found no straightforward documentation on it.

Use case

A hive/spark query that runs daily in Airflow. How can we make this query also run over a range of dates, so that we don't fire the daily trigger 30 times to cover all the days in a month? This is particularly helpful in our case, where we can use the same script for historic builds over past years, or for batch correction jobs that recompute the same data over a month.

Note, we already have airflow-backfill, which has range run capability by specifying start and end dates, but internally it just triggers the dag once for every single date. I was looking for a single trigger that can run over the entire date range, just by modifying the query that runs over the data.

Airflow allows us to trigger a job on the command line with custom configs. These configs can be used to populate our query with date ranges selectively.

The daily builds always run for a single day (typically today or yesterday).
The range run (which will not be very frequent) can be run manually from a command line trigger with custom configs.
Alternatively, the dates for the range run can be set in Airflow Variables via the web ui, and the next run should pick them up.

Note, it's very important to understand that the trigger still considers this a normal daily run, and the web interface for job monitoring will not have any information on whether we had a month's worth run or a daily run. The airflow web interface currently doesn't show the configs passed to the job.

Also, a month's worth run will take longer, so we change the config for the dag to have only a single instance running at a time (max_active_runs=1).
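To make the idea concrete, here is a minimal sketch of the date selection logic in plain Python. It is an illustration rather than the exact code from our dag: the config keys start_date and end_date, the table name events and the partition column dt are all hypothetical names picked for this example. The function falls back to the run's own date when no custom config is passed, which is the daily build case.

```python
from datetime import date, datetime

DATE_FMT = "%Y-%m-%d"


def resolve_date_range(conf, execution_date):
    """Return (start, end) as YYYY-MM-DD strings.

    conf: the custom config passed with the trigger; None or empty for a
          normal daily run. The keys "start_date" and "end_date" are
          hypothetical names chosen for this sketch.
    execution_date: the date of the run, used as the single-day default.
    """
    conf = conf or {}
    default_day = execution_date.strftime(DATE_FMT)
    start = conf.get("start_date", default_day)
    # If only a start date is given, treat it as a single-day run.
    end = conf.get("end_date", start)

    # Parse both values so that malformed dates fail early.
    start_d = datetime.strptime(start, DATE_FMT).date()
    end_d = datetime.strptime(end, DATE_FMT).date()
    if start_d > end_d:
        raise ValueError("start_date must not be after end_date")
    return start_d.isoformat(), end_d.isoformat()


def build_query(start, end):
    # "events" and "dt" are placeholder names, not from the real job.
    return (
        "SELECT * FROM events "
        f"WHERE dt BETWEEN '{start}' AND '{end}'"
    )


if __name__ == "__main__":
    # Daily run: no custom config, so the range is a single day.
    print(build_query(*resolve_date_range(None, date(2020, 2, 1))))
    # Range run: the custom config carries the month to recompute.
    month = {"start_date": "2020-01-01", "end_date": "2020-01-31"}
    print(build_query(*resolve_date_range(month, date(2020, 2, 1))))
```

Inside a task, the config passed with the trigger is available on the dag run object as dag_run.conf, so a callable could pass that dict and the run's date into resolve_date_range. On the Airflow 1.x command line the config goes in as a JSON string via the -c/--conf option of airflow trigger_dag; in Airflow 2.x the equivalent command is airflow dags trigger --conf. For the Airflow Variables alternative, the same function could be fed a dict built from the variable values instead.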