Launch Apache Airflow on AWS without provisioning or managing servers, in under 30 minutes!
(This article assumes you have a working knowledge of Apache Airflow. Keep a sample, hello-world style Airflow DAG ready before proceeding with the setup.)
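If you don't have a DAG handy, something like the following minimal sketch works; the file name, `dag_id`, schedule, and dates here are arbitrary choices for illustration, not anything MWAA requires.

```python
# hello_dag.py - a minimal "hello world" DAG to verify the MWAA setup.
# The dag_id, start_date, and schedule below are arbitrary example values.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def say_hello():
    print("Hello from MWAA!")


with DAG(
    dag_id="hello_world",
    start_date=datetime(2023, 1, 1),
    schedule_interval="@daily",  # or None to trigger runs manually
    catchup=False,               # don't backfill runs since start_date
) as dag:
    hello = PythonOperator(task_id="say_hello", python_callable=say_hello)
```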
While Apache Airflow is perhaps the most widely used orchestration tool in the data engineering landscape, Data Engineers (DEs) often find it a challenge to install and configure, more so when it has to be done in a multi-node setup.
In this regard, AWS Managed Apache Airflow comes as a great shot in the arm. For DEs and start-ups who would like to quickly launch Airflow and build pipelines, Managed Workflows for Apache Airflow (MWAA) is increasingly becoming the go-to tool.
In this article, I will share the steps for launching and configuring Managed Workflows for Apache Airflow (MWAA) on AWS:
1. Once you log in to the AWS console, type ‘Airflow’ under the services section; you should be able to find the Managed Apache Airflow icon.
2. Click on ‘create environment’
3. We will now start the three-step process of launching Apache Airflow.
4. In the S3 bucket section, specify the bucket and the folder where the DAG (.py) files will be kept.
5. For the advanced settings configuration, we will go with the default options where available, select the ‘Create MWAA VPC’ option, and change the web server access from Private Network to Public Network.
6. The VPC stack creation will take about 5–6 minutes, one can see the different components being created,
7. Airflow cluster: for trials and PoCs, ‘mw1.small’ with a worker count of 1 should be enough.
8. We can enable logs as shown below,
9. Create a new role for permissions.
10. Review the steps and click on ‘Create environment’; you will get this screen. It usually takes about 20–30 minutes for the Airflow environment to be created.
11. Once the Airflow environment is created, the status changes to ‘Available’.
12. Click on ‘Open Airflow UI’ to access the Airflow environment.
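As an aside, everything we just clicked through in the console can also be expressed through the MWAA API, which is handy once you move past PoCs. The sketch below maps the console steps onto boto3's `create_environment` call; every name, ARN, subnet, and security group ID is a placeholder you would substitute with your own values.

```python
# Sketch: the console walkthrough above, expressed via boto3's MWAA API.
# All names, ARNs, and IDs below are placeholders - substitute your own.
import boto3

mwaa = boto3.client("mwaa", region_name="us-east-1")

response = mwaa.create_environment(
    Name="my-mwaa-poc",
    ExecutionRoleArn="arn:aws:iam::123456789012:role/my-mwaa-role",  # step 9
    SourceBucketArn="arn:aws:s3:::my-mwaa-bucket",   # step 4: S3 bucket
    DagS3Path="dags",                                # step 4: dags folder
    NetworkConfiguration={                           # step 5: the MWAA VPC
        "SubnetIds": ["subnet-aaaa1111", "subnet-bbbb2222"],
        "SecurityGroupIds": ["sg-cccc3333"],
    },
    WebserverAccessMode="PUBLIC_ONLY",               # step 5: public web server
    EnvironmentClass="mw1.small",                    # step 7: smallest class
    MaxWorkers=1,                                    # step 7: 1 worker
    LoggingConfiguration={                           # step 8: enable logs
        "TaskLogs": {"Enabled": True, "LogLevel": "INFO"},
    },
)
print(response["Arn"])
```

Creation takes the same 20–30 minutes either way; the API route just makes the setup repeatable.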
As and when you save the .py files in the dags folder under the S3 bucket configured during the setting up of the environment, dags start to appear here in the UI.
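Publishing a DAG is just an S3 upload to that folder; a minimal sketch (bucket and file names are placeholders, and assume the DAG file from earlier):

```python
# Sketch: publishing a DAG file to the configured S3 dags folder.
# Bucket name, key, and local file name are placeholders.
import boto3

s3 = boto3.client("s3")
s3.upload_file("hello_dag.py", "my-mwaa-bucket", "dags/hello_dag.py")
```

Within a few minutes of the upload, the MWAA scheduler picks the file up and the DAG appears in the UI.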
Please make sure you delete the environment as soon as your PoC or case study is complete. Also delete the NAT Gateways and VPCs created in the process for the Airflow environment. I strongly suggest that you have your Airflow code ready before you try this.
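The environment itself can be torn down from the console or, as sketched below, with boto3 (the environment name is a placeholder). Note that deleting the environment does not remove the VPC or the NAT Gateways; those still have to be cleaned up separately, for example by deleting the CloudFormation stack that created them.

```python
# Sketch: tearing the MWAA environment down once the PoC is done.
# The environment name is a placeholder; VPC/NAT Gateway cleanup is separate.
import boto3

mwaa = boto3.client("mwaa", region_name="us-east-1")
mwaa.delete_environment(Name="my-mwaa-poc")
```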
In the next article in this series, I will discuss how we can launch AWS EMR from MWAA and run a Spark job on top of it.