strife25 strife25 - 3 months ago 39
JSON Question

How to query an elasticsearch aggregation with a term and sum on different nested objects?

I have the following object whose

value
attribute is a nested object type:

{
"metadata": {
"tenant": "home",
"timestamp": "2016-03-24T23:59:38Z"
},
"value": {
{ "key": "foo", "int_value": 100 },
{ "key": "bar", "str_value": "taco" }
}
}


This type of object has the following mapping:

{
"my_index": {
"mappings": {
"my_doctype": {
"properties": {
"metadata": {
"properties": {
"tenant": {
"type": "string",
"index": "not_analyzed"
},
"timestamp": {
"type": "date",
"format": "dateOptionalTime"
}
}
},
"value": {
"type": "nested",
"properties": {
"str_value": {
"type": "string",
"index": "not_analyzed"
},
"int_value": {
"type": "long"
},
"key": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
}
}
}
}


With this setup, I would like to perform an aggregation that performs the following result:


  • Perform a
    term
    aggregation on the
    str_value
    attribute of objects where the
    key
    is set to
    "bar"

  • In each bucket created from the above aggregation, calculate the
    sum
    of the
    int_value
    attributes where the
    key
    is set to
    "foo"

  • Have the results laid out in a
    date_histogram
    for a given time range.



With this goal in mind, I have been able to get the
term
and
date_histogram
aggregations to work on my nested objects, but have not had luck performing the second level of calculation. Here is the current query I am attempting to get working:

{
"query": {
"match_all": {}
},
"aggs": {
"filters": {
"filter": {
"bool": {
"must": [
{
"term": {
"metadata.org": "gw"
}
},
{
"range": {
"metadata.timestamp": {
"gte": "2016-03-24T00:00:00.000Z",
"lte": "2016-03-24T23:59:59.999Z"
}
}
}
]
}
},
"aggs": {
"intervals": {
"date_histogram": {
"field": "metadata.timestamp",
"interval": "1d",
"min_doc_count": 0,
"extended_bounds": {
"min": "2016-03-24T00:00:00Z",
"max": "2016-03-24T23:59:59Z"
},
"format": "yyyy-MM-dd'T'HH:mm:ss'Z'"
},
"aggs": {
"nested_type": {
"nested": {
"path": "value"
},
"aggs": {
"key_filter": {
"filter": {
"term": {
"value.key": "bar"
}
},
"aggs": {
"groupBy": {
"terms": {
"field": "value.str_value"
},
"aggs": {
"other_nested": {
"reverse_nested": {
"path": "value"
},
"aggs": {
"key_filter": {
"filter": {
"term": {
"value.key": "foo"
}
},
"aggs": {
"amount_sum": {
"sum": {
"field": "value.int_value"
}
}
}
}
}
}
}
}
}
}
}
}
}
}
}
}
}
}


The result I am expecting to receive from Elasticsearch would look like the following:

{
"took": 1,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 7,
"max_score": 0.0,
"hits": []
},
"aggregations": {
"filters": {
"doc_count": 2,
"intervals": {
"buckets": [
{
"key_as_string": "2016-03-24T00:00:00Z",
"key": 1458777600000,
"doc_count": 2,
"nested_type": {
"doc_count": 5,
"key_filter": {
"doc_count": 2,
"groupBy": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "taco",
"doc_count": 1,
"other_nested": {
"doc_count": 1,
"key_filter": {
"doc_count": 1,
"amount_sum": {
"value": 100.0
}
}
}
}
]
}
}
}
}
]
}
}
}
}


However, the innermost object (
...groupBy.buckets.key_filter.amount_sum
) is having its
value
return
0.0
instead of
100.0
.

I think this is due to the fact that nested objects are indexed as separate documents, so filtering by one
key
attribute's value is not allowing me to query to against another
key
.

Would anyone have any idea on how to get this type of query to work?

For a bit more context, the reason for this document structure is because I do not control the content of the JSON documents that get indexed, so different
tenant
s may have conflicting key names with different values (e.g.
{"tenant": "abc", "value": {"foo": "a"} }
vs.
{"tenant": "xyz", "value": {"foo": 1} }
. The method I am trying to use is the one laid out by this Elasticsearch Blog Post, where it recommends to transform objects that you don't control into a structure that you do and to use nested objects to help with this (specifically the
Nested fields for each data type
section of the article). I would also be open to learn of a better way to handle this situation of not controlling the document's JSON structure if there is one so that I can perform aggregations.

Thank you!

EDIT: I am using Elasticsearch 1.5.

Answer

Solved this situation by utilizing the reverse_nested aggregation in the correct way as described here: http://www.shayne.me/blog/2015/2015-05-18-elasticsearch-nested-docs/

Comments