chaimp chaimp - 1 month ago 20x
Groovy Question

How to change cast of elasticsearch field in scripted metric from Long to Double?

In the aggregations for an Elasticsearch query, I am trying to use a field that happens to be mapped as type long, called 'my_field' for this example:

"test-metric": {
  "scripted_metric": {
    "init_script": "_agg['test'] = []",
    "map_script": "_agg['test'].add(doc['my_field'].value)"
  }
}

In some instances, 'my_field' actually contains a double value, such as 1.5. When outputting the actual hits, I can see that 1.5 is being output.

My expectation would be that doc['my_field'].value will contain that same value of 1.5. However, it is immediately being cast as type Long and all calculations thereafter are wrong.

I have tried modifying the schema and that appears to "fix" this, however in my current setup, it is not a viable solution to modify the schema for every instance where this comes up.

Instead, I am looking for a solution that will give me values in the scripted metrics that are consistent with what I am getting from the matched hits.

I have tried casting like this:

(org.elasticsearch.index.fielddata.ScriptDocValues$Doubles) doc.get('averageSessionsPerWeek')

But, I get this error:

GroovyScriptExecutionException[GroovyCastException[Cannot cast object '[2]' with class 'org.elasticsearch.index.fielddata.ScriptDocValues$Longs' to class 'org.elasticsearch.index.fielddata.ScriptDocValues$Doubles' due to: groovy.lang.GroovyRuntimeException: Could not find matching constructor for: org.elasticsearch.index.fielddata.ScriptDocValues$Doubles(java.lang.Long)]];

It seems like the cast is applied at the time that the object is created.

Is this even possible?



When you reference doc['my_field'].value, as you're doing here, you are accessing the doc values, which are the per-field values that Elasticsearch builds at index time according to the field's mapping. Because this field is mapped as a long (integer) instead of a floating point number, the value held there is actually an integer.

When outputting the actual hits, I can see that 1.5 is being output.

When you examine the hits, you are looking at the _source for that document, which is the JSON object (stored as a string) that you indexed into Elasticsearch. This still contains the exact JSON that you indexed, and does not show any coercion from this initial value into what was stored in the doc values.
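To make the difference concrete, here is roughly what a single hit might look like for such a document. The index, type, and id names are made up for illustration; the point is only that the JSON returned with the hit still carries 1.5, even though the field is mapped as long and a script reading doc['my_field'].value would see an integer.

```json
{
  "_index": "my_index",
  "_type": "my_type",
  "_id": "1",
  "_source": {
    "my_field": 1.5
  }
}
```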

This is easiest to understand when you think of an analyzed string field in Elasticsearch.

{ "sentence": "The quick brown fox jumps over the lazy dog" }

In the _source, this would still be the string "The quick brown fox jumps over the lazy dog", but in the inverted index in Elasticsearch it would be stored as the individual tokens: brown, dog, fox, jump, lazy, over, quick (the actual index values would differ depending on the analyzer used).

As far as the doc values are concerned, your original value isn't even there.
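If changing the mapping really is not an option, one possible workaround is to read the original value from the stored JSON inside the script instead of from the doc values. The following is only a sketch, assuming the Groovy scripting of the Elasticsearch 1.x/2.x era, where _source is exposed to search scripts, and assuming every matched document actually has 'my_field' set. The surrounding "aggs" wrapper is added here just to make the fragment complete.

```json
{
  "aggs": {
    "test-metric": {
      "scripted_metric": {
        "init_script": "_agg['test'] = []",
        "map_script": "_agg['test'].add(_source.my_field as double)"
      }
    }
  }
}
```

Keep in mind that reading _source means loading and parsing the stored document for every hit, so this is likely to be noticeably slower than doc values on large result sets. Fixing the mapping remains the cleaner fix where it is feasible.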