agondiken agondiken - 1 year ago 118
JSON Question

How to use Bigquery JSON functions in Google Cloud Datalab

I am calling Bigquery within Google Cloud Datalab notebooks and I want to use the JSON functions that can be used in BigQuery. However it turns out, because JSON functions use "$" as a reference to the segment within string, while Cloud Datalab uses "$" to refer to global variables; they conflict and give errors.

Example (not reproducible as I couldn't find any JSON-like strings in samples)

%%sql --module events_query
SELECT JSON_EXTRACT_SCALAR(eventTypeParams, '$.restaurant-name") as str
FROM [foodit-prod:analytics.analytics_event]

When I run the thing:

events_run = bq.Query(events_query)
events = events_run.to_dataframe()

This is the error I receive:

ExceptionTraceback (most recent call last)
in ()
----> 1 events_run = bq.Query(events_query)
2 events = events_run.to_dataframe()

/usr/local/lib/python2.7/dist-packages/gcp/bigquery/_query.pyc in
init(self, sql, context, values, udfs, data_sources, **kwargs)
90 values = kwargs
---> 92 self._sql =, values, udfs)
94 # We need to take care not to include the same UDF code twice so we use sets.

/usr/local/lib/python2.7/dist-packages/gcp/data/_sql_module.pyc in
expand(sql, args, udfs)
127 """
128 sql, args = SqlModule.get_sql_statement_with_environment(sql, args)
--> 129 return _sql_statement.SqlStatement.format(sql._sql, args, udfs)

/usr/local/lib/python2.7/dist-packages/gcp/data/_sql_statement.pyc in
format(sql, args, udfs)
137 code = []
138 SqlStatement._find_recursive_dependencies(sql, args, code=code,
--> 139 resolved_vars=resolved_vars)
141 # Rebuild the SQL string, substituting just '$' for escaped $ occurrences,

/usr/local/lib/python2.7/dist-packages/gcp/data/_sql_statement.pyc in
_find_recursive_dependencies(sql, values, code, resolved_vars, resolving_vars)
81 # Get the set of $var references in this SQL.
---> 82 dependencies = SqlStatement._get_dependencies(sql)
83 for dependency in dependencies:
84 # Now we check each dependency. If it is in complete - i.e., we have an expansion

/usr/local/lib/python2.7/dist-packages/gcp/data/_sql_statement.pyc in
202 dependencies.append(variable)
203 elif dollar:
--> 204 raise Exception('Invalid sql; $ with no following $ or identifier: %s.' % sql)
205 return dependencies

Exception: Invalid sql; $ with no following $ or identifier: SELECT
JSON_EXTRACT_SCALAR(eventTypeParams, "'\$'.restaurant-name") as str
FROM [foodit-prod:analytics.analytics_event].

I tried putting the $ sign in different quotes or escaping etc. None worked. Any solutions?

Answer Source

Please can you try the following?

    "{'book': { 
        'title':'Harry Potter'}}", 

or from your example,

%%sql --module events_query
SELECT JSON_EXTRACT_SCALAR(eventTypeParams, '$$.restaurant-name') as str
FROM [foodit-prod:analytics.analytics_event]
Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download