W. Stephens - 7 months ago

Upload to BigQuery from Python

I have a Python script that downloads data from Firebase, manipulates it and then dumps it into a JSON file. I can upload that file to BigQuery from the command line, but now I want to add some code to the Python script so it all happens in one step.

Here is the code I have so far.

import json
from firebase import firebase

firebase = firebase.FirebaseApplication('<redacted>')
result = firebase.get('/connection_info', None)
id_keys = map(str, result.keys())

# with open('result.json', 'r') as w:
#     connection = json.load(w)

# Write one JSON object per line (newline-delimited JSON).
with open("w.json", "w") as outfile:
    for id in id_keys:
        json.dump(result[id], outfile, indent=None)
        outfile.write("\n")

Answer Source

To load a JSON file with the google-cloud-bigquery Python library, use the Client.load_table_from_file() method.

from google.cloud import bigquery

bigquery_client = bigquery.Client()
dataset = bigquery_client.dataset('mydataset')
table = dataset.table('mytable')

with open(source_file_name, 'rb') as source_file:
    # This example uses JSON, but you can use other formats.
    # See https://cloud.google.com/bigquery/loading-data
    job_config = bigquery.LoadJobConfig()
    job_config.source_format = 'NEWLINE_DELIMITED_JSON'
    job = bigquery_client.load_table_from_file(
        source_file, table, job_config=job_config)

From the code example at: https://github.com/GoogleCloudPlatform/google-cloud-python/blob/5f059f006b655970b1ef12977146c64bc9b60894/docs/bigquery/snippets.py#L379-L392
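load_table_from_file() starts the load as an asynchronous job. If you want the script to block until the load finishes and raise on any load errors, you can wait on the returned job; a minimal sketch, assuming the 0.28.0+ API used above:

# Wait for the load job to complete; this raises if the load failed.
job.result()
print('Loaded {} rows into mydataset:mytable.'.format(job.output_rows))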

Edit: the way you upload to a table has changed since version 0.28.0 of the Python library. Below is the way to do it in 0.27 and earlier.

To load a JSON file with the google-cloud-bigquery Python library, use the Table.upload_from_file() method.

from google.cloud import bigquery

bigquery_client = bigquery.Client()
dataset = bigquery_client.dataset('mydataset')
table = dataset.table('mytable')

# Reload the table to get the schema.
table.reload()

with open(source_file_name, 'rb') as source_file:
    # This example uses JSON, but you can use other formats.
    # See https://cloud.google.com/bigquery/loading-data
    job = table.upload_from_file(
        source_file, source_format='NEWLINE_DELIMITED_JSON')

From the code example at: https://github.com/GoogleCloudPlatform/python-docs-samples/blob/master/bigquery/cloud-client/load_data_from_file.py

Note: you'll have to create the table first and specify the schema (this can also be done with the Python library). Unfortunately, the schema autodetect feature is not yet supported by the client libraries: https://github.com/GoogleCloudPlatform/google-cloud-python/issues/2926
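For example, creating the table with an explicit schema from Python might look like the sketch below (0.28.0+ API); the field names and types here are hypothetical placeholders for whatever your Firebase records actually contain:

from google.cloud import bigquery

bigquery_client = bigquery.Client()
dataset_ref = bigquery_client.dataset('mydataset')

# Hypothetical schema -- replace with the fields present in your JSON records.
schema = [
    bigquery.SchemaField('device_id', 'STRING'),
    bigquery.SchemaField('connected_at', 'TIMESTAMP'),
]
table = bigquery.Table(dataset_ref.table('mytable'), schema=schema)
table = bigquery_client.create_table(table)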
