carmelolg carmelolg - 6 months ago 88
Node.js Question

ArangoDB - Performance issue with AQL query

I'm using ArangoDB for a Web Application through Strongloop.
I've got some performance problem when I run this query:

FOR result IN Collection SORT result.field ASC RETURN result


I added some index to speed up the query like
skiplist
index on the field sorted.

My Collection has inside more than 1M of records.

The application is hosted on n1-highmem-2 on Google Cloud.
Below some specs:


  • 2 CPUs - Xeon E5 2.3Ghz

  • 13 GB of RAM

  • 10GB SSD



Unluckly, my query spend a lot of time to ending.
What can I do?

Best regards,
Carmelo

Answer

Summarizing the discussion above:

If there is a skiplist index present on the field attribute, it could be used for the sort. However, if its created sparse it can't. This can be revalidated by running

db.Collection.getIndexes();

in the ArangoShell. If the index is present and non-sparse, then the query should use the index for sorting and no additional sorting will be required - which can be revalidated using Explain. However, the query will still build a huge result in memory which will take time and consume RAM.

If a large result set is desired, LIMIT can be used to retrieve slices of the results in several chunks, which will cause less stress on the machine.

For example, first iteration:

FOR result IN Collection SORT result.field LIMIT 10000 RETURN result

Then process these first 10,000 documents offline, and note the result value of the last processed document. Now run the query again, but now with an additional FILTER:

FOR result IN Collection
  FILTER result.field > @lastValue LIMIT 10000 RETURN result

until there are no more documents. That should work fine if result.field is unique.

If result.field is not unique and there are no other unique keys in the collection covered by a skiplist, then the described method will be at least an approximation.

Note also that when splitting the query into chunks this won't provide snapshot isolation, but depending on the use case it may be good enough already.