Ways to Back Up Data in BigQuery
Disclaimer: Views, thoughts, and opinions expressed in the blog belong solely to the author, and not necessarily to the author’s employer, organisation, committee or other group or individual.
Organizations increasingly look at BigQuery as the single source of truth for their data. BigQuery is fast becoming synonymous with the data warehouse: data from different lines of business finds its way into BigQuery to be harnessed and monetized. BigQuery is like a ‘Data Bank’, and just as banks have strict governance and backup mechanisms, BigQuery offers ready-to-use backup solutions. In this article I will explore several common scenarios and the options available for backing up data in each of them.
Scenario 1: A few key tables used by end users for analytics need to be backed up. Typically this arises when a DML statement (INSERT, UPDATE, DELETE, MERGE) needs to be run for an ad hoc request and developers just want some insurance!
Solution: Use the table snapshot option in BigQuery (snapshots support reads only, not writes). More info on table snapshots here.

Alternatively, use the table clone option in BigQuery (clones support both reads and writes). Both options are sketched below.
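
A minimal sketch of both options, reusing the table from the time-travel example further down; the snapshot and clone table names and the seven-day expiration are illustrative:

-- Read-only, point-in-time snapshot of the table
CREATE SNAPSHOT TABLE `ascendant-shade-245307.bqml_tutorial.analytics_session_data_snapshot`
CLONE `ascendant-shade-245307.bqml_tutorial.analytics_session_data`
OPTIONS (
  expiration_timestamp = TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL 7 DAY)
);

-- Lightweight, writable copy of the table
CREATE TABLE `ascendant-shade-245307.bqml_tutorial.analytics_session_data_clone`
CLONE `ascendant-shade-245307.bqml_tutorial.analytics_session_data`;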

Scenario 2: When developers run that ‘DELETE’ statement without a WHERE clause! (In general, when a DML statement was run that should not have been run at all :-)
Solution: BigQuery provides time travel to recover data as of any point within the past 7 days (the default time travel window). For example, to restore the table to its state one day ago:
CREATE OR REPLACE TABLE `ascendant-shade-245307.bqml_tutorial.analytics_session_data`
AS
SELECT *
FROM `ascendant-shade-245307.bqml_tutorial.analytics_session_data`
FOR SYSTEM_TIME AS OF
  TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL -1 DAY)
Scenario 3: A large number of tables that change regularly need to be backed up as part of a regular backup plan. These tables also need to be archived beyond a certain retention period.
Solution: Back the tables up to Google Cloud Storage and archive them there using Object Lifecycle Management. We can use Cloud Composer (managed Apache Airflow) with the bigquery_to_gcs operator for this: the tables to back up are passed as a list, the operator is invoked for each table either sequentially or in parallel, and the DAG is scheduled to run automatically, as sketched below. More details on the operator can be found here.
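
A minimal DAG sketch, assuming the Google provider package for Airflow is installed in the Composer environment; the bucket name, table list, schedule and export format are illustrative:

from datetime import datetime

from airflow import DAG
from airflow.providers.google.cloud.transfers.bigquery_to_gcs import BigQueryToGCSOperator

# Hypothetical list of tables to back up and a hypothetical backup bucket
TABLES_TO_BACKUP = ["bqml_tutorial.analytics_session_data", "bqml_tutorial.orders"]
BACKUP_BUCKET = "my-bq-backup-bucket"

with DAG(
    dag_id="bigquery_backup_to_gcs",
    schedule_interval="@daily",       # run the backup once a day
    start_date=datetime(2023, 1, 1),
    catchup=False,
) as dag:
    for table in TABLES_TO_BACKUP:
        # One export task per table; tasks run in parallel by default.
        # Chain them (task_a >> task_b) if a sequential run is preferred.
        BigQueryToGCSOperator(
            task_id=f"backup_{table.replace('.', '_')}",
            source_project_dataset_table=table,
            destination_cloud_storage_uris=[
                f"gs://{BACKUP_BUCKET}/{table}/{{{{ ds }}}}/export-*.avro"
            ],
            export_format="AVRO",
            compression="SNAPPY",
        )

A lifecycle rule on the backup bucket (for example, moving objects older than 90 days to the Archive storage class and deleting them after a year) then takes care of the archival without any further code.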

Looking forward to updating this page as I stumble upon more such scenarios.