S.Mohsen sh - 2 months ago
SQL Question

Count the number of occurrences of a field for each ID

Consider a table consisting of two columns, device_id and connection_types, like:

device_id | connection_types
-----------|------------------
111 | wifi
111 | LTE
222 | wifi
111 | LTE
111 | LTE
111 | cable
....


For each device, I want to get the values of the connection_types field along with the number of times each value appears. For example, the output for two devices would look like:

device_id | connection_types | connection_count
-----------|------------------|-------------------
111 | wifi | 3
| LTE | 5
| cable | 2
-----------|------------------|-------------------
222 | wifi | 4
| cable | 6


This indicates that device 111 is connected via wifi in 3 of the rows in which it appears, via LTE in 5, and via cable in 2.
How can I calculate this using SQL in Google BigQuery (using either its legacy SQL or standard SQL dialect)?

It is worth pointing out that (as also suggested by @kbball) grouping by both device_id and connection_types does something similar, but I can't convert that output to the sample answer I provided above.

I've considered using nest() or array_agg(), but neither works. I can't use nest() on both columns, connection_types and connection_count, simultaneously, and because there is no data for some devices, array_agg() results in an error (that its output cannot be empty).

Answer

If you want, for every device ID, an array of (connection_types, connection_count) pairs, the following Standard SQL query will do it:

SELECT device_id, ARRAY_AGG(STRUCT(connection_types, connection_count))
FROM (
  -- Inner query: one row per (device, connection type) pair with its row count
  SELECT device_id, connection_types, COUNT(*) AS connection_count
  FROM t
  GROUP BY device_id, connection_types
)
GROUP BY device_id
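
To try the query without a real table, here is a minimal self-contained sketch in Standard SQL. It builds the six sample rows from the question in a CTE named t, assumes device_id is an integer column (the question does not say), and uses the illustrative names counts and connections, which are not part of the original query. The ORDER BY inside ARRAY_AGG is optional and only makes the order of pairs within each array predictable.

```sql
-- Sample rows copied from the question; the real table may have a different name and types.
WITH t AS (
  SELECT 111 AS device_id, 'wifi' AS connection_types UNION ALL
  SELECT 111, 'LTE' UNION ALL
  SELECT 222, 'wifi' UNION ALL
  SELECT 111, 'LTE' UNION ALL
  SELECT 111, 'LTE' UNION ALL
  SELECT 111, 'cable'
),
counts AS (
  -- Step 1: one row per (device, connection type) pair with its row count.
  SELECT device_id, connection_types, COUNT(*) AS connection_count
  FROM t
  GROUP BY device_id, connection_types
)
-- Step 2: fold each device's rows into a single array of structs.
SELECT
  device_id,
  ARRAY_AGG(
    STRUCT(connection_types, connection_count)
    ORDER BY connection_count DESC, connection_types  -- optional: most frequent first
  ) AS connections                                    -- illustrative column name
FROM counts
GROUP BY device_id
ORDER BY device_id
```

With these six rows, device 111 should come back with the pairs (LTE, 3), (cable, 1), (wifi, 1) and device 222 with (wifi, 1). BigQuery raises an error when an array in the final result contains a NULL element; because each element here is a STRUCT rather than a bare column value, a NULL connection_types would end up as a NULL field inside a non-NULL struct. Whether that is the error described in the question is an assumption, since the question does not quote the message.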