Merge on BigQuery tables with Nested & Repeated fields

Ravi Manjunatha
4 min readFeb 9, 2022

Disclaimer: Views, thoughts, and opinions expressed in the blog belong solely to the author, and not necessarily to the author’s employer, organisation, committee or other group or individual

Nested & Repeated !

Denormalization is a key feature of BigQuery. It reduces table joins and speeds us the query execution. Some call it a table within a table. A denormalized BigQuery table looks like below,

The metadata for the table looks like below, there are numerous articles explaining the ‘RECORD’ type fields in BigQuery and their properties.

So, i will not particularly dwell on that in this article. I will discuss the scenarios when new records need to be added to such a table or existing records need to be updated which are of type ‘RECORD’. Performing these merge operations require a bit of ‘unlearning the RDBMS way’ :-)

I have considered a scenario of a library which maintains a table of its users, with the books they have read and the promos they are eligible for.

The base table can be called as ‘libraryuser’. We can create this table from the BigQuery Console, sharing the create table statement below,

CREATE TABLE `ascendant-shade-245307.demo.libraryuser`
(
id INT64,
Name STRING,
Book ARRAY<STRUCT<id INT64, Name STRING, Author STRING>>,
Promo ARRAY<STRUCT<id INT64, Code STRING, Name STRING>>
);

To insert test values to the table, please use the below scripts,

insert into `ascendant-shade-245307.demo.libraryuser` values
(
1000001 ,
‘Ram’,
[
STRUCT(1 AS id,’Ramayan’ AS name, ‘Valmiki’ AS Author) ,
STRUCT(2 AS id,’Mahabharatha’ AS name, ‘Vyasa’ AS Author)
]
,
[STRUCT(1 as id,’app50' AS code, ‘diwali’ AS name),
STRUCT(2 as id,’app10' AS code, ‘dasara’ AS name)]
)
insert into `ascendant-shade-245307.demo.libraryuser` values
(
1000002 ,
‘Raj’,
[
STRUCT(3 as id,’Harrypoter’ AS name, ‘JK’ AS Author) ,
STRUCT(4 as id,’Junglebook’ AS name, ‘Kipling’ AS Author)
]
,
[STRUCT(1 as id,’app50' AS code, ‘diwali’ AS name),
STRUCT(2 as id,’app10' AS code, ‘dasara’ AS name)]
)

The table output looks like below,

libraryuser

Now let us consider the following scenarios for altering the table contents,

Scenario1 : New user, say ‘Shyam’ needs to be added to the table.

Scenario2: ‘Ram’ the existing user has started reading a new book ‘Harrypoter’.

Scenario3:Author name for the book ‘Harrypoter’ has been changed from ‘JK’ to ‘Rowling’.

As can be observed, a new row has to be inserted for Scenario1, an additional record under ‘Ram’ has to be added for Scenario2 and 2 existing records need to be updated for ‘author’ field from ‘JK’ to ‘Rowling’.

These delta records need to be loaded to a ‘temp’ table. Let us create the temp table with ,

create or replace table `ascendant-shade-245307.demo.libraryuser_temp`as select * from `ascendant-shade-245307.demo.libraryuser` where 1 = 2.

Let us insert the records to be inserted/updated in this table,

insert into `ascendant-shade-245307.demo.libraryuser_temp` values
(
1000003 ,
‘Shyam’,
[STRUCT(6 as id,’Alchemy’ AS name, ‘Paulo’ AS Author) ,
STRUCT(7 as id,’Habits’ AS name, ‘Stephen’ AS Author)]
,
[STRUCT(1 as id,’app50' AS code, ‘diwali’ AS name),
STRUCT(2 as id,’app10' AS code, ‘dasara’ AS name)]
)
insert into `ascendant-shade-45307.demo.libraryuser_temp`(id,Name,Book) values
(
1000001 ,
‘Ram’,
[STRUCT(3 AS id,’HarryPorter’ AS name, ‘Rowling’ AS Author)]
)
insert into `ascendant-shade-45307.demo.libraryuser_temp`(id,Name,Book) values
(
1000002 ,
‘Raj’,
[STRUCT(3 as id,’Harrypoter’ AS name, ‘Rowling’ AS Author)]
)

libraryuser_temp

To insert/update the parent table ‘libraryuser’ with ‘libraryuser_temp’ table, we will follow a 2 step process.

Step 1: Insert the new records

INSERT INTO `ascendant-shade-245307.demo.libraryuser`
SELECT p.ID,p.NAME,p.BOOK,p.PROMO FROM
`ascendant-shade-245307.demo.libraryuser` s right OUTER JOIN
`ascendant-shade-245307.demo.libraryuser_temp` p
ON P.ID = S.ID WHERE s.id is null

Step 2 : Update the existing records with new books and new authors. This will require un nesting the tables and then performing joins and merge or update on the parent table.

MERGE `ascendant-shade-245307.demo.libraryuser` BP
USING
(
select
ifnull(r.id,l.id) AS id,
ifnull(R.name,l.name) as name,
ARRAY_AGG(STRUCT(ifnull(r.book_id,l.book_id),ifnull(R.BOOK_NAME,l.book_name),ifnull(R.
author,l.author))) AS BOOK
from
(
select
p.id,
p.Name,
b.id as Book_id,
b.Name as Book_Name,
b.Author
from
`ascendant-shade-245307.demo.libraryuser` p,
unnest(Book)b
) L
full join
(select
p.id,
p.Name,
b.id as Book_id,
b.Name as Book_Name,
b.Author
from
`ascendant-shade-245307.demo.libraryuser_temp` p,
unnest(Book)b
) R on l.id = R.id and l.book_id = R.Book_id
GROUP BY ifnull(r.id,l.id), ifnull(R.NAME,l.name)
) BS ON BP.id = BS.id WHEN MATCHED THEN UPDATE SET BP.book = BS.book

While there can be numerous other scenarios for table updates involving Nested & Repeated fields, bottomline is to use the Array and Struct functions on the nested fields along with unnesting them. For more details on array functions please check.

PS : Thank you Prasanna ,Sourabh & Shailesh for the guidance

--

--