Spark on EMR Data Pipeline with Apache Airflow

Large enterprises with a big on-premises footprint of Hadoop clusters powering Spark prefer Amazon EMR as their cloud migration solution. This lift-and-shift approach ensures that developers and the end-user community are not hassled by new ways of doing things…

Launch Apache Airflow without provisioning or managing servers on AWS in less than 30 mins!

(This article assumes you have a working knowledge of Apache Airflow. Keep a sample "hello world" Airflow DAG ready before proceeding with the setup.)

Amazon Managed Workflows for Apache Airflow

While Apache Airflow is perhaps the most used…

Best Practices for designing & developing Data Pipelines using Apache Airflow

Best Practices for Apache Airflow

Apache Airflow is a thing of beauty when it comes to designing and developing data pipelines, especially for big data loads. Data engineering has now evolved into DataOps, and this space has more customized tool offerings for just…

(This article assumes readers have a working knowledge of Apache Airflow and Docker.)

When it comes to building data pipelines, Apache Airflow works like a charm. With its robust set of operators, data engineers can now integrate a wide variety of external data sources with their internal data systems. …

Renaissance in Indian IT Education

Every now and then we come across studies and reports which find that only about 10% of Indian engineering graduates have employable skills. …

Data Build Tool (DBT): A niche SQL-based Data Transformation tool for the Modern Data Warehouse

The data engineering landscape today is like a kaleidoscope, with a multitude of tools vying for space. The Big 3 cloud players, AWS, Azure and GCP, have been pitching their specific products…

Which one?

When I started my career in BI more than a decade ago, the reporting market consisted of two major players: SAP Business Objects and IBM Cognos. ‘Reporting’ was the general phrase, and ‘Dashboards’ and ‘Stories’ weren't much in use. …

Recently I implemented an in-house real-time data alert solution, which would provide operators with real-time insights into the oil fields. I used AWS Big Data services to develop this solution.

While exploring the solution design and approach, I realized that the solution, with a few tweaks, could…

How to clear the Tableau Desktop Certified Associate Exam: Tips & Alerts

I recently cleared my Tableau Desktop Certified Associate Exam on 9 May 2020, and I just managed to scrape through with a passing score of 76%. Had I cleared the exam with a good…

Will Federated queries make these data pipelines obsolete?

After reading and reviewing multiple articles and architecture reviews about AWS data pipelines involving Redshift, I could see a common need across all of them: the ability to stage data from S3, DynamoDB or RDS in Redshift and then build the warehouse tables on top of it.

Generally…

Ravi Manjunatha

Data enthusiast with a passion for building enterprise data solutions
