CoreyMSchafer CoreyMSchafer - 6 months ago 25
Python Question

Get the max of distinct documents and return all fields

I have some documents similar to this structure:

[
{"file_base": "file_1", "version": 100, "file_name": "file_1_100.txt"},
{"file_base": "file_1", "version": 200, "file_name": "file_1_200.txt"},
{"file_base": "file_1", "version": 300, "file_name": "file_1_300.txt"},
{"file_base": "file_2", "version": 100, "file_name": "file_2_100.txt"},
{"file_base": "file_2", "version": 200, "file_name": "file_2_200.txt"},
{"file_base": "file_2", "version": 300, "file_name": "file_2_300.txt"}
]


I need to create a query that gets the latest version of every file and returns all of the fields. So far, I have this:

pipeline = [
{'$sort': {'version': -1}},
{'$match': {}},
{'$group': {
'_id': '$file_base',
'highest_version': {'$first': '$version'}
}
}
]

results = files_collection.aggregate(pipeline)


This almost returns what I need. It returns:

{'_id': 'file_2', 'highest_version': 300}
{'_id': 'file_1', 'highest_version': 300}


But I need all of the fields. In the real-life version of this problem, there are many more fields for each file. So in the context of this example, I would need something like:

{'_id': 'file_2', 'highest_version': 300, 'file_name': 'file_2_300.txt'}
{'_id': 'file_1', 'highest_version': 300, 'file_name': 'file_1_300.txt'}


This question is similar, but it only returns the grouped_by field and the max field. I need them all. Any help greatly appreciated.

Answer Source

I guess this is what you need :

db.collection.aggregate([
    {'$sort': {'version': -1}},
    {'$group': {
        '_id': '$file_base',
        'highest_version': { '$first' : '$version'},
        'file_name': { '$first' : '$file_name'},
        }
    }
])