Shehabix Shehabix - 4 months ago 12
SQL Question

How to select matches in the same table in BigQuery without Join

Please notice this is not MySQL I am talking about BigQuery

I need to perform the following operation on Google Big-Query

SELECT * FROM searchable.keyword AS kw1,
(SELECT keyword From searchable.keyword
GROUP BY keyword ORDER BY keyword DESC) AS kw2
WHERE kw1.keyword CONTAINS(kw2.keyword)
# or even kw1.keyword = kw2.keyword <-- I don't need this now but it doesn't work as well

It's giving error that kw2 is unknown and kw1 as well

Notice: I can perform this query with JOIN then filter, but the problem is that the table has more than 450K entries, Such join will results in a huge amount 450K square, as the join condition is the parent ID, and the 450K have the same parent ID, and BigQuery doesn't allow you to add Anything in the join condition except Table1.field = Table2.field


There are two issues... the first is that bigquery only supports equi-join -- that is, you can only join on exact equality. The second is that comma is UNION ALL not JOIN (this is unfortunate legacy behavior) To do this, you'll need to compute the full cross join:

(SELECT keyword, 1 as cross FROM searchable.keyword) AS kw1 
    (SELECT keyword, 1 as cross FROM searchable.keyword GROUP BY keyword) 
ON kw1.cross = kw2.cross
WHERE kw1.keyword CONTAINS(kw2.keyword)

Of course, this cross join is going to explode for a reasonably large-sized table.