nishant kumar - 2 months ago
Python Question

Get all unique keys and values

I have several collections in MongoDB. An example of the structure of the stored data is as follows:

{u'_id': ObjectId('581453c6aeddbf0f04fa017b'),
 u'pdpData': {u'taxEntry': {u'taxPercentage': 5}, u'fashionType': u'Core'}}
{u'_id': ObjectId('581453c7aeddbf0f04fa017c'),
 u'pdpData': {u'taxEntry': {u'taxPercentage': 5}, u'fashionType': u'Fashion'},
 u'catalogAddDate': 1467297611}

I want the union of all the keys and values in CSV format.
An example of the result is as follows:

objectID, pdpData.taxEntry.taxPercentage, pdpData.fashionType, catalogAddDate
581453c6aeddbf0f04fa017b, 5, Core, NA
581453c7aeddbf0f04fa017c, 5, Fashion, 1467297611

I have tried several methods, but unfortunately I am unable to get the column names in the required format:

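from bson.code import Code

# Emit every top-level key of each document; the values are not needed.
mapper = Code("""function() { for (var key in this) { emit(key, null); } }""")
reducer = Code("""function(key, stuff) { return null; }""")

# Run the map-reduce inline and print the full response.
distinctThingFields = db.women.map_reduce(mapper, reducer, out={'inline': 1}, full_response=True)
print(distinctThingFields)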

Here I only get the top-level keys as column names:

objectID, pdpdata

and not the inner keys.


several collection in mongodb. The example of structure of data present in mongodb is as follows

Assuming that you mean several documents rather than several collections, you could use the MongoDB Aggregation Pipeline.

Using PyMongo, and based on your example data, you could group by objectId, taxPercentage and fashionType as follows:

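pipeline = [
    {"$group": {"_id": {"objectId": "$_id", "taxPercentage": "$pdpData.taxEntry.taxPercentage", "fashionType": "$pdpData.fashionType"}}}
]
cursor = db.collection.aggregate(pipeline)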

Iterating the cursor should return:

{u'_id': {u'taxPercentage': 5.0, u'objectId': ObjectId('...'), u'fashionType': u'Fashion'}}
{u'_id': {u'taxPercentage': 5.0, u'objectId': ObjectId('...'), u'fashionType': u'Core'}}

You can then use Python's csv module to export the results to CSV.
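As a minimal sketch of that export step, assuming the cursor yields documents shaped like the output above: the `rows_to_csv` helper is a hypothetical name, and a list of plain dicts with string IDs stands in for the real cursor and its ObjectId values.

```python
import csv
import io


def rows_to_csv(results, out):
    """Write aggregation results of the form {'_id': {...}} to a CSV stream."""
    fields = ["objectId", "taxPercentage", "fashionType"]
    # restval fills in "NA" for any field missing from a row.
    writer = csv.DictWriter(out, fieldnames=fields, restval="NA", lineterminator="\n")
    writer.writeheader()
    for doc in results:
        group = doc["_id"]
        writer.writerow({name: group[name] for name in fields if name in group})


# Stand-in for the aggregation cursor (assumption: IDs shown as plain strings).
sample = [
    {"_id": {"taxPercentage": 5.0, "objectId": "581453c7aeddbf0f04fa017c", "fashionType": "Fashion"}},
    {"_id": {"taxPercentage": 5.0, "objectId": "581453c6aeddbf0f04fa017b", "fashionType": "Core"}},
]

buffer = io.StringIO()
rows_to_csv(sample, buffer)
csv_text = buffer.getvalue()
```

To write to disk instead, pass a file opened with `open(path, "w", newline="")` in place of the `StringIO` buffer. The csv writer converts non-string values with `str()`, so real ObjectId values should come out as their hex string.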

If these documents span multiple collections, then:

a) If the documents share the same data structure: generally, documents with the same structure should live in the same collection. See also Data Modeling for more info.

b) If the documents have different data structures: you can run the aggregation per collection and then combine the results in your Python script (client side). If this is a frequently used query or report, you should reconsider your data structure.
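For case (b), one possible way to do the client-side combination is to flatten each document into dotted keys and use the union of those keys as the CSV header, filling gaps with NA. This is only a sketch: `flatten` and `union_rows` are hypothetical helpers, and plain dicts with string IDs stand in for documents fetched from each collection.

```python
def flatten(doc, prefix=""):
    """Flatten nested dicts into a single dict keyed by dotted paths."""
    flat = {}
    for key, value in doc.items():
        path = prefix + "." + key if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


def union_rows(docs, missing="NA"):
    """Return (header, rows), where header is the union of all dotted keys."""
    flat_docs = [flatten(d) for d in docs]
    header = []
    for flat in flat_docs:
        for key in flat:
            if key not in header:
                header.append(key)  # keep first-seen order
    rows = [[flat.get(key, missing) for key in header] for flat in flat_docs]
    return header, rows


# Stand-ins for documents gathered from one or more collections.
docs = [
    {"_id": "581453c6aeddbf0f04fa017b",
     "pdpData": {"taxEntry": {"taxPercentage": 5}, "fashionType": "Core"}},
    {"_id": "581453c7aeddbf0f04fa017c",
     "pdpData": {"taxEntry": {"taxPercentage": 5}, "fashionType": "Fashion"},
     "catalogAddDate": 1467297611},
]

header, rows = union_rows(docs)
```

The resulting `header` and `rows` can be passed to `csv.writer` via `writerow(header)` and `writerows(rows)`. Because every document is held in memory to compute the header, this approach suits modest result sets rather than very large collections.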