Richard Richard - 1 month ago
Python Question

Python BigQuery API: how to get data asynchronously?

I am getting started with the BigQuery API in Python, following the documentation.

This is my code, adapted from an example:

from googleapiclient.discovery import build
from googleapiclient.errors import HttpError
from oauth2client.client import GoogleCredentials

credentials = GoogleCredentials.get_application_default()
bigquery_service = build('bigquery', 'v2', credentials=credentials)

try:
    query_request = bigquery_service.jobs()
    query_data = {
        'query': 'SELECT * FROM [mytable] LIMIT 10;'
    }
    query_response = query_request.query(
        projectId=project_id,
        body=query_data).execute()
    for row in query_response['rows']:
        print('\t'.join(field['v'] for field in row['f']))
except HttpError as err:
    print('Error: {}'.format(err))


The problem I'm having is that I keep getting the response:

{u'kind': u'bigquery#queryResponse',
u'jobComplete': False,
u'jobReference': {u'projectId': 'myproject', u'jobId': u'xxxx'}}


So it has no rows field. Looking at the docs, I guess I need to take the jobId field and use it to check when the job is complete, and then get the data.

The problem I'm having is that the docs are a bit scattered and confusing, and I don't know how to do this.

I think I need to use this method to check the status of the job, but how do I adapt it for Python? And how often should I check / how long should I wait?

Could anyone give me an example?

Answer

There is code to do what you want here.

If you want more background on what it is doing, check out Google BigQuery Analytics chapter 7 (the relevant snippet is available here.)

TL;DR:

Your initial jobs.query() call is returning before the query completes; to wait for the job to be done you'll need to poll on jobs.getQueryResults(). You can then page through the results of that call.
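As a rough sketch of that polling-and-paging loop, here is a helper built on the same bigquery_service.jobs() collection the question uses. The function name wait_for_query, the poll interval, and the try limit are my own choices, not from the API; timeoutMs tells the server how long getQueryResults may block before returning, so each poll doubles as a long poll.

```python
import time

def wait_for_query(jobs, project_id, job_id, poll_interval=1.0, max_tries=30):
    """Poll jobs.getQueryResults() until jobComplete is True, then
    page through all result rows. `jobs` is bigquery_service.jobs()."""
    for _ in range(max_tries):
        response = jobs.getQueryResults(
            projectId=project_id, jobId=job_id, timeoutMs=10000).execute()
        if response.get('jobComplete'):
            break
        time.sleep(poll_interval)
    else:
        raise RuntimeError('Query did not complete in time')

    rows = list(response.get('rows', []))
    # Large results come back in pages; follow pageToken until exhausted.
    page_token = response.get('pageToken')
    while page_token:
        response = jobs.getQueryResults(
            projectId=project_id, jobId=job_id, pageToken=page_token).execute()
        rows.extend(response.get('rows', []))
        page_token = response.get('pageToken')
    return rows
```

You would call it with the jobId from your initial response, e.g. wait_for_query(bigquery_service.jobs(), project_id, query_response['jobReference']['jobId']), then print rows exactly as in your existing loop.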