mi6crazyheart - 10 days ago
PHP Question

Elasticsearch bulk upload error with PHP - Limit of total fields [1000] in index has been exceeded

We are planning to use Elasticsearch in one of our projects and are currently testing Elasticsearch 5.0.1 with our data. When we do a bulk upload from our MySQL tables to Elasticsearch, we get the following error:

java.lang.IllegalArgumentException: Limit of total fields [1000] in index [shopfront] has been exceeded
at org.elasticsearch.index.mapper.MapperService.checkTotalFieldsLimit(MapperService.java:482) ~[elasticsearch-5.0.1.jar:5.0.1]
at org.elasticsearch.index.mapper.MapperService.merge(MapperService.java:343) ~[elasticsearch-5.0.1.jar:5.0.1]
at org.elasticsearch.index.mapper.MapperService.merge(MapperService.java:277) ~[elasticsearch-5.0.1.jar:5.0.1]
at org.elasticsearch.cluster.metadata.MetaDataMappingService$PutMappingExecutor.applyRequest(MetaDataMappingService.java:323) ~[elasticsearch-5.0.1.jar:5.0.1]
at org.elasticsearch.cluster.metadata.MetaDataMappingService$PutMappingExecutor.execute(MetaDataMappingService.java:241) ~[elasticsearch-5.0.1.jar:5.0.1]
at org.elasticsearch.cluster.service.ClusterService.runTasksForExecutor(ClusterService.java:555) ~[elasticsearch-5.0.1.jar:5.0.1]
at org.elasticsearch.cluster.service.ClusterService$UpdateTask.run(ClusterService.java:896) ~[elasticsearch-5.0.1.jar:5.0.1]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:451) ~[elasticsearch-5.0.1.jar:5.0.1]
at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:238) ~[elasticsearch-5.0.1.jar:5.0.1]
at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:201) ~[elasticsearch-5.0.1.jar:5.0.1]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_111]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_111]
at java.lang.Thread.run(Thread.java:745) [?:1.8.0_111]


We are using the PHP Elasticsearch client to do the bulk upload from MySQL to Elasticsearch. After some googling I found this thread: https://discuss.elastic.co/t/es-2-3-5-x-metricbeat-index-field-limit/66821

I also read somewhere that setting "index.mapping.total_fields.limit" will fix this, but I can't work out how to use it in my PHP code. Here is my PHP code:

$params = ['body' => []];

$i = 1;
foreach ($productsList as $key => $value) {

    $params['body'][] = [
        'index' => [
            '_index' => 'shopfront',
            '_type' => 'products'
        ],
        'settings' => ['index.mapping.total_fields.limit' => 3000]
    ];

    $params['body'][] = [
        'product_displayname' => $value['product_displayname'],
        'product_price' => $value['product_price'],
        'popularity' => $value['popularity'],
        'lowestcomp_price' => $value['lowestcomp_price']
    ];

    // Every 1000 documents stop and send the bulk request
    if ($i % 1000 == 0) {
        $responses = $client->bulk($params);

        // erase the old bulk request
        $params = ['body' => []];

        // unset the bulk response when you are done to save memory
        unset($responses);
    }

    $i++;
}

// Send the last batch if it exists
if (!empty($params['body'])) {
    $responses = $client->bulk($params);
}


NOTE - I've used the same code with Elasticsearch 2.4.1 and it works fine there.

Val
Answer

In ES 5, the ES folks decided to limit the number of fields that an index's mappings can contain, to prevent a mapping explosion. As you've noticed, that limit defaults to 1000 fields per index, but you can raise it to suit your needs by specifying the index.mapping.total_fields.limit setting either at index creation time or by updating the index settings, like this:

curl -XPUT 'localhost:9200/shopfront/_settings' -d '
{
    "index.mapping.total_fields.limit": 3000
}'
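
Since the question asks how to do this from PHP, here is a rough sketch of the same settings update with the elasticsearch-php client. The helper names buildFieldLimitParams and buildBulkAction are made up for illustration, and $client is assumed to be an already-built Elasticsearch\Client as in the question. Index settings are not part of the bulk action metadata, so I would not expect the 'settings' key in the question's loop to raise the limit; the setting is sent once, separately, before the bulk upload starts.

```php
<?php
// Hypothetical helpers: only the index name, type name, setting name and
// value are taken from the thread, everything else is illustrative.

/**
 * Build the parameters for updating the field limit on an existing index.
 * Mirrors the body of the curl request above.
 */
function buildFieldLimitParams(string $index, int $limit): array
{
    return [
        'index' => $index,
        'body'  => ['index.mapping.total_fields.limit' => $limit],
    ];
}

/**
 * Build the bulk action line for one document.
 * It deliberately carries no 'settings' key.
 */
function buildBulkAction(string $index, string $type): array
{
    return ['index' => ['_index' => $index, '_type' => $type]];
}

// Usage, assuming $client is the Elasticsearch\Client from the question:
//
//   $client->indices()->putSettings(buildFieldLimitParams('shopfront', 3000));
//
//   // then, inside the foreach loop:
//   $params['body'][] = buildBulkAction('shopfront', 'products');
```

If the index does not exist yet, the same setting can instead go under 'settings' in the body passed to $client->indices()->create().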

Note that you should also ask yourself whether having that many fields is a good thing. Do you need them all? Can you combine some?
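
To get a feel for how many fields you actually have, something like the following sketch may help. It walks a mapping array and counts every entry under 'properties' and every multi-field under 'fields'. The function name countMappedFields and the sample mapping are made up for illustration, and the result is only an approximation, since Elasticsearch's own count may include entries this does not see.

```php
<?php
/**
 * Roughly count the fields declared in one mapping.
 * Each entry under 'properties' counts once, as does each multi-field
 * under 'fields'; object fields are walked recursively.
 */
function countMappedFields(array $mapping): int
{
    $count = 0;
    foreach (['properties', 'fields'] as $section) {
        foreach ($mapping[$section] ?? [] as $definition) {
            $count += 1 + countMappedFields($definition);
        }
    }
    return $count;
}

// Hypothetical sample mapping, not taken from the question's index.
$sample = [
    'properties' => [
        'product_displayname' => [
            'type'   => 'text',
            'fields' => ['keyword' => ['type' => 'keyword']],
        ],
        'product_price' => ['type' => 'float'],
        'seller' => [
            'properties' => ['name' => ['type' => 'text']],
        ],
    ],
];

echo countMappedFields($sample), "\n"; // prints 5
```

With the real index you would pass in the mapping of the products type as returned by $client->indices()->getMapping(['index' => 'shopfront']), assuming the response is nested by index name, then 'mappings', then type name.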