d-_-b d-_-b - 6 months ago 32
SQL Question

BigQuery select alias using regex_extract_all in standard mode

I'm unable to reference a

SELECT
alias in BigQuery (standard mode).

Trying to do this query:

SELECT
REGEXP_EXTRACT_ALL(text,
r"(<div \w+>)") AS matches
FROM
regex.test
WHERE
matches IS NOT NULL


Here are steps to reproduce.

bq mk regex
bq mk -t regex.test id:integer,text:string
echo '{"id":1, "text":"<div a>"}' | bq insert regex.test
echo '{"id":2, "text":"<div b>"}' | bq insert regex.test
echo '{"id":3, "text":"<div>"}' | bq insert regex.test

bq query --use_legacy_sql=false "select REGEXP_EXTRACT_ALL(text, r\"(<div \w+>)\") AS matches FROM regex.test WHERE id IS NOT NULL"

+--------------+
| matches |
+--------------+
| [u'<div b>'] |
| [] |
| [u'<div a>'] |
+--------------+


When I try to reference the
matches
alias, I see an error:

bq query --use_legacy_sql=false "select REGEXP_EXTRACT_ALL(text, r\"(<div \w+>)\") AS matches FROM regex.test WHERE matches IS NOT NULL"
Error in query string: Error processing job 'myname': Unrecognized name:
matches


I am unable to reference the alias
matches
, and am unable to filter those results
WHERE matches IS NOT NULL
.

Does anyone know what I'm doing incorrectly here?

Thanks!

Answer

Even in BQ, you can't use a column alias in the where clause. Just use a subquery:

SELECT t.*
FROM (SELECT REGEXP_EXTRACT_ALL(text, r"(<div \w+>)") AS matches
      FROM regex.test
     ) t
WHERE ARRAY_LENGTH(matches) > 0