Surabhi Agarwal - 4 months ago
Java Question

Elasticsearch: RemoteTransportException in paginated search for more than 10000 results

I am using Elasticsearch to run paginated searches on indices through a query set in my Java program. I have two cases, described below:

Searching without the ES Scroll API.

For example, the total search result is 10 010 and the page size is 100, so the results are split into pages of 100 records each. Every page within the first 10 000 records is returned correctly. But when I request the last page, i.e. the records from 10 001 to 10 010, I get the error below:


RemoteTransportException[[James Jaspers][127.0.0.1:9300][indices:data/read/search[phase/query+fetch]]]; nested: QueryPhaseExecutionException[Result window is too large, from + size must be less than or equal to: [10000] but was [10010].

Caused by: QueryPhaseExecutionException[Result window is too large, from + size must be less than or equal to: [10000] but was [10010]. See the scroll api for a more efficient way to request large data sets. This limit can be set by changing the [index.max_result_window] index level parameter
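To see why only the last page fails, here is a minimal sketch of the check named in the error message: from + size must not exceed index.max_result_window (10000 by default). The helper names are hypothetical, not Elasticsearch API. With 100 records per page, the last page starts at from = 10000, so any size at all (100, or just the 10 remaining records, which gives the 10010 in the error) pushes the window past the limit.

```java
public class ResultWindowCheck {
    // Default of index.max_result_window, as quoted in the error message.
    static final int MAX_RESULT_WINDOW = 10_000;

    // Hypothetical helper: offset ("from") of a 1-based page number.
    static int fromForPage(int page, int pageSize) {
        return (page - 1) * pageSize;
    }

    // Hypothetical helper: number of pages needed to show totalHits results.
    static long pageCount(long totalHits, int pageSize) {
        return (totalHits + pageSize - 1) / pageSize;
    }

    // The condition named in the error: from + size <= max_result_window.
    static boolean withinWindow(int from, int size) {
        return from + size <= MAX_RESULT_WINDOW;
    }

    public static void main(String[] args) {
        long totalHits = 10_010;
        int pageSize = 100;
        long pages = pageCount(totalHits, pageSize); // 101 pages, the last one holding 10 records
        for (int page = 1; page <= pages; page++) {
            int from = fromForPage(page, pageSize);
            if (!withinWindow(from, pageSize)) {
                // prints: first rejected page: 101 (from=10000)
                System.out.println("first rejected page: " + page + " (from=" + from + ")");
                break;
            }
        }
    }
}
```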


Below is the code snippet. Here the search page size is passed as 100, and DEFAULT_SEARCH_PAGE_SIZE is 1000.

// from + size must stay <= index.max_result_window (10000 by default)
if (searchPage != null) {
    builder.setFrom((int) searchPage.getPageStart());
    builder.setSize((int) searchPage.getPageSize());
} else {
    builder.setFrom(0);
    builder.setSize(DEFAULT_SEARCH_PAGE_SIZE);
}

builder.setTypes(getType());
// wait at most 60 seconds for the search response
SearchResponse response = builder.execute().actionGet(60000);
SearchHits hits = response.getHits();
if (hits.getTotalHits() > 0) {
    for (SearchHit hit : hits) {
        // process my hits and add them to list
    }
}
// return the list


As suggested by the error message, I tried using the Elasticsearch scroll API in the code below. With it the error no longer occurs, but every page returns the same results, i.e. the first 100 records are shown on every page.

if (searchPage != null) {
    builder.setFrom((int) searchPage.getPageStart());
    builder.setSize((int) searchPage.getPageSize());
} else {
    builder.setFrom(0);
    builder.setSize(DEFAULT_SEARCH_PAGE_SIZE);
}
// starts a new scroll context on every call; response.getScrollId() is never used for the next page
builder.setTypes(getType()).setScroll(new TimeValue(60000));

SearchResponse response = builder.execute().actionGet(60000);
SearchHits hits = response.getHits();
if (hits.getTotalHits() > 0) {
    for (SearchHit hit : hits) {
        // process my hits and add them to list
    }
}
// return the result


I know that with the Elasticsearch scroll API I have to send the scrollId with each request to fetch the next result set, and that doing so should give correct results on each page when I move sequentially through the search, i.e. pages 1, 2, 3, 4, etc. But then I also want to jump directly to some page, e.g. I am on page 1 and want to move to page 5. How will the scroll API handle this?

Update

As adityasinghraghav explained:


Although you are requesting only a hundred results (only 10 exist in your case), i.e. 10000-10010, under the hood Elasticsearch has to fetch all 10010 results, sort them, and then discard the first 10000.


I have read about the max_result_window parameter. It defaults to 10,000, which is safe for almost all clusters. Values higher than this can consume significant chunks of heap memory per search and per shard executing the search. It's safest to leave this value as it is, but the setting is dynamic, so it can be raised or lowered as needed.

Suppose my total result count is 500 000, I have set max_result_window to 100 000, and the page size is 1000.

If I request the 5th page:


  • Will Elasticsearch collect 100 000 values, i.e. the max result window, sort these 100 000, discard the first 4 000 and then return the next 1 000 results?



OR


  • Will it work according to the requested page, i.e. in this case collect 5 000 values, sort these 5 000, discard the first 4 000 and then return the next 1 000 results?


Answer

This happens because Elasticsearch's max result window size is 10000 by default. Although you are requesting only a hundred results (only 10 exist in your case), i.e. 10000-10010, under the hood Elasticsearch has to fetch all 10010 results, sort them, discard the first 10000 and then give you the 10 that are left; hence the max window size is exceeded. The simplest fix is to increase this default value of 10000 to a much higher value. You could use the following command to do that:

curl -XPUT http://1.2.3.4:9200/index/_settings -d '{ "index" : { "max_result_window" : 1000000}}'
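Applied to the updated question, the explanation above points to the second option: the work is driven by from + size, and max_result_window is only the ceiling that from + size is validated against. The arithmetic below is a rough sketch assuming the default query-then-fetch search and a hypothetical index with 5 primary shards: each shard collects and sorts its own top from + size hits, and the coordinating node merges those candidates before discarding the first from of them. The class and method names are illustrative only.

```java
public class DeepPagingCost {
    // Each shard must collect and sort its own top (from + size) hits.
    static long hitsPerShard(long from, long size) {
        return from + size;
    }

    // The coordinating node merges every shard's candidates, then drops `from` of them.
    static long hitsMergedOnCoordinator(long from, long size, int shards) {
        return hitsPerShard(from, size) * shards;
    }

    public static void main(String[] args) {
        int pageSize = 1_000;
        int page = 5;
        long from = (long) (page - 1) * pageSize; // 4000
        int shards = 5; // assumption: 5 primary shards, not stated in the question

        System.out.println("per shard: " + hitsPerShard(from, pageSize));                    // 5000
        System.out.println("merged: " + hitsMergedOnCoordinator(from, pageSize, shards));    // 25000
        // max_result_window (100 000 in the question) appears in neither number:
        // it only decides whether from + size is accepted at all.
    }
}
```

The deeper the page, the larger both numbers grow, which is why a very high max_result_window trades the error for heap usage rather than removing the cost.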

Coming to the scroll API: it does not return paginated results, so the concept of from does not exist and the size parameter is used differently. With the scan search type, the scroll API asks each shard for its top "size" results, so if size is 10 and you have 5 primary shards, Elasticsearch will return up to 50 results per batch. Every request to the scroll API returns a scroll id, which you need to pass to the next request to get the next "page" of results. Since you are not doing that, you keep getting the same results. You should read more about how the scroll API works in the Elasticsearch scroll documentation.
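The difference between re-running the initial scroll search and following the scroll id can be modelled with a small in-memory sketch. Everything here (ScrollModel, startScroll, continueScroll) is hypothetical and only mimics the behaviour described above, assuming a plain (non-scan) scroll where size is the batch size; with the Java client the follow-up requests are made with client.prepareSearchScroll(scrollId).setScroll(...) using the id from SearchResponse.getScrollId(). The copied list also models the point-in-time snapshot a scroll context works on.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

public class ScrollModel {
    // Hypothetical in-memory stand-in for an index; not the Elasticsearch API.
    private final List<String> docs = new ArrayList<>();
    private final Map<String, Cursor> contexts = new HashMap<>();

    // A scroll context: a frozen copy of the results, a batch size and a read position.
    private static final class Cursor {
        final List<String> snapshot;
        final int size;
        int position = 0;

        Cursor(List<String> snapshot, int size) {
            this.snapshot = snapshot;
            this.size = size;
        }
    }

    // One round trip: a batch of hits plus the id to send with the next request.
    public static final class Page {
        public final String scrollId;
        public final List<String> hits;

        Page(String scrollId, List<String> hits) {
            this.scrollId = scrollId;
            this.hits = hits;
        }
    }

    public void index(String doc) {
        docs.add(doc);
    }

    // Like the initial search with setScroll(...): opens a new context,
    // so it always starts again from the first hit.
    public Page startScroll(int size) {
        String id = UUID.randomUUID().toString();
        // Copying the list models the point-in-time view of a scroll context.
        contexts.put(id, new Cursor(new ArrayList<>(docs), size));
        return continueScroll(id);
    }

    // Like a follow-up scroll request carrying the previous scroll id. Here the id
    // never changes; with the real API, always send the id from the latest response.
    // Keep-alive expiry and clearing of contexts are not modelled.
    public Page continueScroll(String scrollId) {
        Cursor c = contexts.get(scrollId);
        int end = Math.min(c.position + c.size, c.snapshot.size());
        List<String> batch = new ArrayList<>(c.snapshot.subList(c.position, end));
        c.position = end;
        return new Page(scrollId, Collections.unmodifiableList(batch));
    }

    public static void main(String[] args) {
        ScrollModel index = new ScrollModel();
        for (int i = 1; i <= 250; i++) {
            index.index("doc-" + i);
        }

        // What the second snippet does: a fresh scroll search for every page.
        System.out.println(index.startScroll(100).hits.get(0)); // doc-1
        System.out.println(index.startScroll(100).hits.get(0)); // doc-1 again

        // Following the scroll id walks through the results in order.
        Page page = index.startScroll(100);
        while (!page.hits.isEmpty()) {
            List<String> hits = page.hits;
            System.out.println(hits.get(0) + " .. " + hits.get(hits.size() - 1));
            page = index.continueScroll(page.scrollId);
        }
    }
}
```

Running main prints doc-1 twice for the restarted searches, then the batches doc-1 .. doc-100, doc-101 .. doc-200 and doc-201 .. doc-250 when the id is followed.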

But then I also want to jump directly to some page, e.g. I am on page 1 and want to move to page 5

Also, since there is no pagination in the scroll API, you can't simply jump between non-consecutive pages.
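Concretely, reaching page 5 through a scroll means requesting and throwing away batches 1 to 4 first, and since a scroll only moves forward, going back to page 1 means opening a new scroll. The helper below is a hypothetical sketch over a plain Iterator of batches, standing in for successive scroll responses; it is not an Elasticsearch API.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

public class ScrollSkip {
    // Hypothetical helper: with a scroll, "page n" can only be reached by
    // reading and discarding batches 1..n-1 in order.
    static <T> List<T> pageViaScroll(Iterator<List<T>> batches, int page) {
        List<T> current = Collections.emptyList();
        for (int i = 1; i <= page; i++) {
            if (!batches.hasNext()) {
                return Collections.emptyList(); // scroll exhausted before reaching the page
            }
            current = batches.next(); // every earlier batch is fetched, then dropped
        }
        return current;
    }

    public static void main(String[] args) {
        Iterator<List<Integer>> batches = Arrays.asList(
                Arrays.asList(1, 2), Arrays.asList(3, 4), Arrays.asList(5, 6)).iterator();
        System.out.println(pageViaScroll(batches, 3)); // [5, 6]
        // The iterator is now used up: there is no way back to batch 1.
    }
}
```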

Also keep in mind that for a scroll, Elasticsearch takes a point-in-time snapshot of the index, so any changes you make to the index while the scroll context is open won't be reflected in the results.