fguillen fguillen - 2 months ago 14
JSON Question

MongoDB, grouping inner Hash key value by another key value

I have these 4 elements in my collection:

/* 1 */
{
"demographics": [
{
"key": "country",
"value": "ES"
},
{
"key": "city",
"value": "Sevilla"
},
{
"key": "region",
"value": "Andalucía"
}
]
}

/* 2 */
{
"demographics": [
{
"key": "city",
"value": "Cádiz"
},
{
"key": "country",
"value": "ES"
},
{
"key": "region",
"value": "Andalucía"
}
]
}

/* 3 */
{
"demographics": [
{
"key": "country",
"value": "GB"
},
{
"key": "region",
"value": "Greater London"
},
{
"key": "city",
"value": "London"
}
]
}

/* 4 */
{
"demographics": [
{
"key": "country",
"value": "ES"
},
{
"key": "region",
"value": "Andalucía"
},
{
"key": "city",
"value": "Sevilla"
}
]
}


I would like to group them by:


  • demographic.value
    when
    demographic.key = "country"

  • demographic.value
    when
    demographic.key = "region"

  • demographic.value
    when
    demographic.key = "city"



Having a result like this:

{ "values": ["ES", "Andalucía", "Sevilla"], "count": 2 }
{ "values": ["ES", "Andalucía", "Cádiz"], "count": 1 }
{ "values": ["GB", "Greater London", "London"], "count": 1 }


Attention: beware the order of the
demographics
array elements might be not always the same.

I have tried



db.getCollection('test').aggregate(
[
{ "$unwind": "$demographics" },
{
"$project" :{
"_id": 0,
"demographics.key": 1,
"demographics.value": 1
}
},
{
"$group" : {
"_id": {
"key": "$demographics.key",
"value": "$demographics.value"
},
"count": { "$sum": 1 }
}
},
{
"$group" : {
"_id": "$_id.key",
"values": { "$push": { "value": "$_id.value", "count": "$count" } }
}
}
]
)


This gives me this result:

/* 1 */
{
"_id": "country",
"values": [
{
"value": "GB",
"count": 1.0
},
{
"value": "ES",
"count": 3.0
}
]
}

/* 2 */
{
"_id": "region",
"values": [
{
"value": "Greater London",
"count": 1.0
},
{
"value": "Andalucía",
"count": 3.0
}
]
}

/* 3 */
{
"_id": "city",
"values": [
{
"value": "London",
"count": 1.0
},
{
"value": "Cádiz",
"count": 1.0
},
{
"value": "Sevilla",
"count": 2.0
}
]
}


But this is not the groups I am looking for

Answer

You can try running the following pipeline:

db.test.aggregate([
    { "$unwind": "$demographics" },
    { "$sort": { "demographics.key": 1, "demographics.value": 1 } },
    {
        "$group": {
            "_id": "$_id",
            "values": { "$push": "$demographics.value" }
        }
    },
    {  
        "$group": {
            "_id": "$values",
            "count": { "$sum": 1 }
        }
    },
    {
        "$project": {
            "_id": 0, "values": "$_id", "count": 1
        }
    }
])

Sample Output

/* 1 */
{
    "count" : 2,
    "values" : [ 
        "Sevilla", 
        "ES", 
        "Andalucía"
    ]
}

/* 2 */
{
    "count" : 1,
    "values" : [ 
        "London", 
        "GB", 
        "Greater London"
    ]
}

/* 3 */
{
    "count" : 1,
    "values" : [ 
        "Cádiz", 
        "ES", 
        "Andalucía"
    ]
}