S.Mohsen sh S.Mohsen sh - 2 months ago 10
SQL Question

Using ARRAY_AGG() with all null values in input

I want to select distinct values of columns for each user in my table (in Google BigQuery). I've thought about using

ARRAY_AGG()
like:

SELECT user_id, ARRAY_AGG(DISTINCT field1) AS f1, ARRAY_AGG(DISTINCT field2) AS f2
FROM t GROUP BY user_id


But because for some
user_id
s all values in
field1
or
field2
are null, I get this error message:
Array 'f1' cannot have a null element


I wonder if there is a workaround to avoid this error or maybe a different way to achieve the result without using
ARRAY_AGG()

Answer

From https://cloud.google.com/bigquery/sql-reference/data-types#array-type

BigQuery raises an error if query result has ARRAYs which contain NULL elements, although such ARRAYs can be used inside the query.

your query is ok in a temporary query but not ok in end query result; so a workaround is you may define your query as a tmp table and filter out NULL values before giving end results:

WITH tmp AS (SELECT user_id,
                    ARRAY_AGG(DISTINCT field1) AS f1,
                    ARRAY_AGG(DISTINCT field2) AS f2
FROM t GROUP BY user_id)

SELECT user_id,
    ARRAY(SELECT el FROM UNNEST(f1) AS el WHERE el IS NOT NULL) AS f1,
    ARRAY(SELECT el FROM UNNEST(f2) AS el WHERE el IS NOT NULL) AS f2
 FROM tmp

I've met same problems when porting over some Postgres SQL into BigQuery, a more elegant solution is FILTER clause on aggregation functions,

https://www.postgresql.org/docs/current/static/sql-expressions.html

like ARRAY_AGG(f1 FILTER WHEN f1 IS NOT NULL) which is not available in BigQuery which I really hope they can implement it