Ways to Back Up Data in BigQuery

Ravi Manjunatha
3 min read · Mar 4, 2022


Disclaimer: Views, thoughts, and opinions expressed in this blog belong solely to the author, and not necessarily to the author's employer, organisation, committee, or any other group or individual.

Organizations increasingly look to BigQuery as the single source of truth for their data, and it has become almost synonymous with the modern data warehouse. Data from different lines of business finds its way into BigQuery to be harnessed and monetized, making it something of a 'data bank'. And as with any bank, which operates under strict governance and backup mechanisms, BigQuery offers ready-to-use backup options. In this article I will walk through several common scenarios and the options available for backing up data in each of them.

Scenario 1: A few key tables used by end users for analytics need to be backed up. This typically arises when a DML statement (INSERT, UPDATE, DELETE, MERGE) has to be run for an ad hoc request and developers just want to be insured!

Solution: Use the table snapshot option in BigQuery (snapshots are read-only, no writes). More info on table snapshots here.

Alternatively, use the table clone option in BigQuery (clones support both reads and writes). A minimal sketch of both options is shown below.
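For illustration, here is a sketch of both statements against the demo table used later in this article; the snapshot and clone target names, and the optional 7-day expiry, are placeholders of my own.

-- Read-only snapshot of the table, with an optional expiry (target name is a placeholder)
CREATE SNAPSHOT TABLE `ascendant-shade-245307.bqml_tutorial.analytics_session_data_snapshot`
CLONE `ascendant-shade-245307.bqml_tutorial.analytics_session_data`
OPTIONS (
  expiration_timestamp = TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL 7 DAY)
);

-- Writable clone of the same table: an independent copy that accepts reads and writes
CREATE TABLE `ascendant-shade-245307.bqml_tutorial.analytics_session_data_backup`
CLONE `ascendant-shade-245307.bqml_tutorial.analytics_session_data`;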

Scenario 2: When a developer runs that DELETE statement without a WHERE clause! (In general, when a DML statement was run that should not have been run at all :-)

Solution: BigQuery's time travel feature lets you recover data as of any point within the past 7 days.

-- Restore the table to its state as of 1 day ago using time travel
CREATE OR REPLACE TABLE `ascendant-shade-245307.bqml_tutorial.analytics_session_data`
AS
SELECT *
FROM `ascendant-shade-245307.bqml_tutorial.analytics_session_data`
  FOR SYSTEM_TIME AS OF TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 DAY)
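Before overwriting anything, it can help to preview the historical state with a plain query first; this is just a sanity-check sketch against the same table.

-- Inspect the table as it was 1 day ago before restoring it
SELECT *
FROM `ascendant-shade-245307.bqml_tutorial.analytics_session_data`
  FOR SYSTEM_TIME AS OF TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 DAY)
LIMIT 100;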

Scenario 3: A large number of tables that change regularly need to be backed up as part of a recurring backup plan, and archived beyond a certain retention period.

Solution: Back the tables up to Google Cloud Storage and archive them there using object lifecycle management. We can use Cloud Composer (managed Apache Airflow) with the bigquery_to_gcs operator for this: the tables to be backed up can be passed as a list, the operator called for each table sequentially or in parallel, and the DAG scheduled to run automatically. More details on the operator can be found here. A sketch of the per-table export it performs is shown below.
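For reference, the per-table export such a DAG performs boils down to a BigQuery export to GCS; here is a minimal SQL sketch of a single table backup using the EXPORT DATA statement, where the bucket name and path are placeholders.

-- Export one table to GCS as Avro files (bucket and path are placeholders)
EXPORT DATA
  OPTIONS (
    uri = 'gs://my-backup-bucket/analytics_session_data/backup-*.avro',
    format = 'AVRO',
    overwrite = true
  )
AS
SELECT *
FROM `ascendant-shade-245307.bqml_tutorial.analytics_session_data`;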

I look forward to updating this page as I stumble upon more such scenarios.

