Ravi Manjunatha

Published in Google Cloud - Community

·3 days ago

Serverless Spark ML pipeline in GCP

In the previous article in the Serverless Spark series, we described how a sample ETL pipeline can be developed. In this article, we extend this to see how an ML pipeline can be developed and orchestrated the Serverless Spark way. We will build a simple Regression…

Serverless Spark

6 min read

Published in Google Cloud - Community

·5 days ago

Serverless Spark ETL Pipeline Orchestrated by Airflow on GCP

A Big Data Spark engineer spends, on average, only 40% of their time on actual data or ML pipeline development. Most of their time is devoted to managing clusters or optimizing Spark application variables. An alternative to this monolithic cluster setup is the cloud-based…

Google Dataproc

5 min read

Published in Google Cloud - Community

·Mar 27

Multi-Cloud Analytics with BigQuery Omni: No time to load!

With an increasing number of organizations adopting a multi-cloud strategy, data movement has been a point of contention for analytics teams. Data sprawl and the egress costs of the host cloud providers are some serious side effects of this. Independent Software Vendors (ISVs) who develop a multi-tenant Data Platform for…

Google Cloud Platform

7 min read

Mar 4

Ways to Backup data in BigQuery

Disclaimer: Views, thoughts, and opinions expressed in the blog belong solely to the author, and not necessarily to the author’s employer, organisation, committee or other group or individual. Organizations increasingly look at BigQuery as a single source of truth for their data. BigQuery is increasingly becoming the metaphor for a…

Bigquery

3 min read

Feb 9

Merge on BigQuery tables with Nested & Repeated fields

Disclaimer: Views, thoughts, and opinions expressed in the blog belong solely to the author, and not necessarily to the author’s employer, organisation, committee or other group or individual. Denormalization is a key feature of BigQuery. It reduces table joins and speeds up query execution. Some call it a table…

Bigquery

4 min read

Oct 18, 2021

Spark on EMR Data Pipeline with Apache Airflow

Large enterprises with a big on-premises footprint of Hadoop clusters powering Spark prefer Amazon EMR as their cloud migration solution. This lift-and-shift approach ensures that developers and the end-user community do not get hassled by new ways of doing things…

Airflow

6 min read

Oct 13, 2021

Launch Apache Airflow without provisioning or managing servers on AWS in less than 30 mins!

(This article assumes you have a working knowledge of Apache Airflow. Keep a sample "hello world" Airflow DAG ready before proceeding with the setup.) While Apache Airflow is perhaps the most used…

Airflow

4 min read

Jul 3, 2021

Best Practices for designing & developing Data Pipelines using Apache Airflow

Apache Airflow is a thing of beauty when it comes to designing and developing data pipelines, especially for Big Data loads. Data engineering has now evolved into DataOps, and this space has more customized tool offerings for just…

Apache Airflow

4 min read


Apr 17, 2021

Dockerize Airflow Data Pipelines with Azure Containers

(This article assumes readers have a working knowledge of Apache Airflow and Docker.) When it comes to building data pipelines, Apache Airflow works like a charm. With its robust set of operators, data engineers can now integrate a wide variety of external data sources into their internal data systems. …

Towards Data Science

3 min read

Apr 13, 2021

Renaissance in Indian Technical Education

Renaissance in Indian IT Education: Every now and then, we come across studies and reports that find only about 10% of Indian engineering graduates to have employable skills. …

Upgrad

5 min read

Ravi Manjunatha

Data Analytics Specialist, Google
