pierot - 3 months ago
Linux Question

Solr server memory and disk space

Context:

I have an AWS EC2 instance


  • 8 GB RAM

  • 8 GB of disk space



It runs Solr 5.1.0 with


  • a Java heap of 2048 MB (-Xms2048m -Xmx2048m)



Extra: (updated)


  • Logs are generated on the server

  • Imports happen in intervals of 10s (always delta)

  • Importing from a DB (JdbcDataSource)

  • I don't think I have any optimization strategy configured right now

  • GC profiling? I don't know.

  • How can I find out how large the fields are... and what counts as large?
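On the field-size question above: Solr ships a Luke request handler that reports per-field index statistics (document counts, term counts, types), which is one way to see which fields dominate the index. A sketch, with mycore as a placeholder core name:

```shell
# Hypothetical core name "mycore" -- adjust to your setup. The Luke
# handler lives under the core's admin path and returns per-field stats.
URL="http://localhost:8983/solr/mycore/admin/luke?numTerms=0&wt=json"
out=$(curl -sf "$URL" 2>/dev/null || echo "Solr not reachable at $URL")
# the "fields" section of the response lists docs and terms per field
echo "$out" | head -c 300
```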



Situation:

The Solr index holds 200,000 documents and is queried no more than once per second. However, within about 10 days, memory and disk usage on the server reach 90% - 95% of capacity.

When investigating the disk usage with sudo du -sh /, it only reports a total of 2.3G. That is nowhere near what df -k tells me (Use% -> 92%).
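A du/df mismatch like this is the classic symptom of files that were deleted while a process still holds them open: df counts the blocks, but du can no longer see the paths. A Solr JVM holding descriptors on rotated-away log or index files would also explain why a restart frees the space. A minimal sketch of the effect (lsof +L1 lists such deleted-but-open files, where lsof is installed):

```shell
# du walks the directory tree; df asks the filesystem. A file deleted
# while a process still holds a descriptor on it stays counted by df but
# is invisible to du -- exactly what restarting that process "fixes".
tmp=$(mktemp)
exec 3>"$tmp"                                   # keep a descriptor open
dd if=/dev/zero bs=1M count=10 >&3 2>/dev/null  # give the file some size
rm "$tmp"                                       # unlink it from the tree
[ ! -e "$tmp" ] && echo "path gone, space still held by open descriptor"
# list deleted-but-open files (the usual culprits on a long-running box):
command -v lsof >/dev/null && lsof +L1 2>/dev/null | head -5
exec 3>&-                                       # closing it frees the space
```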

I can, sort of, resolve the situation by restarting the Solr service.

What am I missing? Why does Solr consume all the memory and disk space, and how can I prevent it?

Extra info for @TMBT

Sorry for the delay, but I’ve been monitoring the Solr production server for the last few days. You can see a roundup here:
https://www.dropbox.com/s/x5diyanwszrpbav/screencapture-app-datadoghq-com-dash-162482-1468997479755.jpg?dl=0
The current state of Solr: https://www.dropbox.com/s/q16dc5t5ctl32od/Screenshot%202016-07-21%2010.29.13.png?dl=0
I restarted Solr at the beginning of the monitoring and now, 2 days later, I see the free disk space drop at a rate of 1.5 GB per day.
If you need more specifics, let me know.


  • There are not that many deleted docs per day: 50 - 250 max.

  • The current Solr logs directory (ls -lh /var/solr/logs) totals 72M

  • There is no master-slave setup

  • The importer runs every 10 seconds, but it imports no more than 10 - 20 docs each time. The large import of 3k-4k docs happens each night, when there is not much other activity in Solr.

  • There are no large fields; the largest can hold up to 255 chars.
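For reference, a 10-second delta like the one described above is typically driven through the DataImportHandler endpoint; the same endpoint also reports importer status, which helps when checking whether the nightly 3k-4k import correlates with the disk growth. A sketch, with mycore as a placeholder core name:

```shell
# Hypothetical core name "mycore" -- adjust to your setup.
DIH="http://localhost:8983/solr/mycore/dataimport"
# trigger a delta import; clean=false keeps the existing documents
res=$(curl -s "$DIH?command=delta-import&clean=false&commit=true" 2>/dev/null \
      || echo "Solr not reachable (expected off the production box)")
echo "$res"
# ask the importer for its current status and last-run counters
curl -s "$DIH?command=status" 2>/dev/null || true
```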



With the monitoring in place, I tested the most common queries. They do involve faceting (fields, queries), sorting, grouping, … but it doesn't really affect the various heap and GC-count metrics.

Answer

I finally managed to solve this problem. So I'm answering my own question.

I changed / added the following lines in the log4j.properties file, which is located in /var/solr/ (the Solr root location in my case).

# log4j.rootLogger=INFO, file, CONSOLE
# adding:
log4j.rootLogger=WARN, file, CONSOLE

Lowering the logging level.

# adding:
log4j.appender.file.Threshold=INFO

This sets the file appender's logging threshold.
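For context, these two changes land in a file that, in a stock Solr 5 install, already routes file logging through log4j's RollingFileAppender; capping the rolled files bounds log disk usage regardless of level. An illustrative fragment (the size and backup-count values below are examples, not taken from this setup):

```properties
log4j.rootLogger=WARN, file, CONSOLE

log4j.appender.file=org.apache.log4j.RollingFileAppender
log4j.appender.file.Threshold=INFO
# keep at most 10 files of 4 MB each -> ~40 MB of logs, worst case
log4j.appender.file.MaxFileSize=4MB
log4j.appender.file.MaxBackupIndex=9
```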

You can see in the graphs below that, as of September 2nd, the disk usage is steady, as it should be. The same is true for the server's memory consumption.

[solr graphs]
