user7070824 user7070824 - 1 month ago 13
Python Question

export json data to csv from mongodb

I am having the problem with missing field name in python script when exported data to csv from mongodb. type field name exists in first record, but it does not appear in the rest of records. how to write python script to give null value for type field if it does not exist.

the sample of Mongodb collection:

"stages": [
{
"interview": false,
"hmNotification": false,
"hmStage": false,
"type": "new",
"isEditable": false,
"order": 0,
"name": {
"en": "New"
},
"stageId": "51d1a2f4c0d9887b214f3694"
},
{
"interview": false,
"hmNotification": true,
"isEditable": true,
"order": 1,
"hmStage": true,
"name": {
"en": "Pre-Screen"
},
"stageId": "51f0078d7297363f62059699"
},
{
"interview": false,
"hmNotification": false,
"hmStage": false,
"isEditable": true,
"order": 2,
"name": {
"en": "Phone Screen"
},
"stageId": "51d1a326c0d9887721778eae"
}]


the sample of Python script:

import csv
cursor = db.workflows.find( {}, {'_id': 1, 'stages.interview': 1, 'stages.hmNotification': 1, 'stages.hmStage': 1, 'stages.type':1, 'stages.isEditable':1, 'stages.order':1,
'stages.name':1, 'stages.stageId':1 })
flattened_records = []
for stages_record in cursor:
stages_record_id = stages_record['_id']
for stage_record in stages_record['stages']:
flattened_record = {
'_id': stages_record_id,
'stages.interview': stage_record['interview'],
'stages.hmNotification': stage_record['hmNotification'],
'stages.hmStage': stage_record['hmStage'],
'stages.type': stage_record['type'],
'stages.isEditable': stage_record['isEditable'],
'stages.order': stage_record['order'],
'stages.name': stage_record['name'],
'stages.stageId': stage_record['stageId']}
flattened_records.append(flattened_record)


when run the python script, it shows keyerror:"type". please help me how to add the missing field name in the script.

Answer

When you're trying to fetch values that might not exist in a Python dictionary, you can use the .get() method of the dict class.

For instance, let's say you have a dictionary like this:

my_dict = {'a': 1, 
           'b': 2,
           'c': 3}

You can use the get method to get one of the keys that exist:

>>> print(my_dict.get('a'))
1

But if you try to get a key that doesn't exist (such as does_not_exist), you will get None by default:

>>> print(my_dict.get("does_not_exist"))
None

As mentioned in the documentation, you can also provide a default value that will be returned when the key doesn't exist:

>>> print(my_dict.get("does_not_exist", "default_value"))
default_value

But this default value won't be used if the key does exist in the dictionary (if the key does exist, you'll get its value):

>>> print(my_dict.get("a", "default_value"))
1

Knowing that, when you build your flattened_record you can do:

'stages.hmStage': stage_record['hmStage'],
'stages.type': stage_record.get('type', ""),  
'stages.isEditable': stage_record['isEditable'],

So if the stage_record dictionary doesn't contain a key type, get('type') will return an empty string.

You can also try with just:

'stages.hmStage': stage_record['hmStage'],
'stages.type': stage_record.get('type'),  
'stages.isEditable': stage_record['isEditable'],

and then stage_record.get('type') will return None when that stage_record doesn't contain a type key.

Or you could make the default "UNKNOWN"

'stages.type': stage_record.get('type', "UNKNOWN"),  
Comments