I have a Python script that downloads data from Firebase, manipulates it, and then dumps it into a JSON file. I can upload that file to BigQuery through the command line, but now I want to put some code into the Python script so it is all done in one step.
Here is the code I have so far.
import json
from firebase import firebase

firebase = firebase.FirebaseApplication('<redacted>')
result = firebase.get('/connection_info', None)
id_keys = map(str, result.keys())

#with open('result.json', 'r') as w:
#    connection = json.load(w)

with open("w.json", "w") as outfile:
    for id in id_keys:
        json.dump(result[id], outfile, indent=None)
        outfile.write("\n")
To load a JSON file with the google-cloud-bigquery
Python library, use the Client.load_table_from_file() method.
from google.cloud import bigquery

bigquery_client = bigquery.Client()
dataset = bigquery_client.dataset('mydataset')
table = dataset.table('mytable')

with open(source_file_name, 'rb') as source_file:
    # This example uses JSON, but you can use other formats.
    # See https://cloud.google.com/bigquery/loading-data
    job_config = bigquery.LoadJobConfig()
    job_config.source_format = 'NEWLINE_DELIMITED_JSON'
    job = bigquery_client.load_table_from_file(
        source_file, table, job_config=job_config)
From the code example at: https://github.com/GoogleCloudPlatform/google-cloud-python/blob/5f059f006b655970b1ef12977146c64bc9b60894/docs/bigquery/snippets.py#L379-L392
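As a follow-up, in 0.28.0 the returned job object acts as a future, so the script can block until the load finishes and surface any errors. A minimal sketch, reusing the job variable from the example above (error handling beyond the raised exception is up to you):

job.result()  # Block until the load job completes; raises on failure.
print('Loaded {} rows into mydataset:mytable.'.format(job.output_rows))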
Edit: the way you upload to a table has changed since version 0.28.0 of the Python library. The example above is for 0.28.0 and later; below is the way to do it in 0.27 and earlier.
To load a JSON file with the google-cloud-bigquery
Python library, use the Table.upload_from_file()
method.
from google.cloud import bigquery

bigquery_client = bigquery.Client()
dataset = bigquery_client.dataset('mydataset')
table = dataset.table('mytable')

# Reload the table to get the schema.
table.reload()

with open(source_file_name, 'rb') as source_file:
    # This example uses JSON, but you can use other formats.
    # See https://cloud.google.com/bigquery/loading-data
    job = table.upload_from_file(
        source_file, source_format='NEWLINE_DELIMITED_JSON')
From the code example at: https://github.com/GoogleCloudPlatform/python-docs-samples/blob/master/bigquery/cloud-client/load_data_from_file.py
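In 0.27 and earlier the job is not a future, so the usual pattern (as in the linked sample) is to poll it yourself. A rough sketch, assuming the job variable from the example above:

import time

while True:
    job.reload()  # Refresh the job state from the API.
    if job.state == 'DONE':
        if job.error_result:
            raise RuntimeError(job.errors)
        break
    time.sleep(1)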
Note: You'll have to create the table first and specify the schema (this can also be done with the Python library). Unfortunately, the schema autodetect feature is not yet supported by the client libraries: https://github.com/GoogleCloudPlatform/google-cloud-python/issues/2926
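For reference, here is a minimal sketch of creating the table with an explicit schema using the 0.28.0 API. The field names and types are made up for illustration; adjust them to match the keys in your JSON records.

from google.cloud import bigquery

bigquery_client = bigquery.Client()
dataset_ref = bigquery_client.dataset('mydataset')

# Hypothetical schema -- replace with the fields in your JSON records.
schema = [
    bigquery.SchemaField('id', 'STRING'),
    bigquery.SchemaField('timestamp', 'TIMESTAMP'),
]
table = bigquery.Table(dataset_ref.table('mytable'), schema=schema)
table = bigquery_client.create_table(table)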