Louis. B - 2 months ago
Java Question

(Neo4j unmanaged extension API) Why does query speed depend on the size of the dataset in Neo4j?

I'm trying to build a simple unmanaged extension for a Neo4j server (Community Edition).

I have several versions of the same dataset (a small one with 11k nodes and a larger one with 85k nodes). The small one is a subset of the large one. My nodes have an "id" property, which is not Neo4j's internal <id> but a separate property that happens to be called "id". I pick a node's id from the small dataset and run the following query against each dataset:


  1. Retrieve the node by its id
  2. Get all of the node's relationships

I repeat this several times to reduce noise in the timing measurement. The code is:

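import java.util.List;

import javax.ws.rs.GET;
import javax.ws.rs.Path;
import javax.ws.rs.Produces;
import javax.ws.rs.QueryParam;
import javax.ws.rs.core.Context;
import javax.ws.rs.core.MediaType;
import javax.ws.rs.core.Response;
import javax.ws.rs.core.Response.Status;

import org.neo4j.graphdb.GraphDatabaseService;
import org.neo4j.graphdb.Label;
import org.neo4j.graphdb.Node;
import org.neo4j.graphdb.Relationship;
import org.neo4j.graphdb.Transaction;

@Path("/test")
public class QueryTest {
    private final GraphDatabaseService graphdb;

    public QueryTest(@Context GraphDatabaseService graphdb) {
        this.graphdb = graphdb;
    }

    @GET
    @Produces(MediaType.APPLICATION_JSON)
    public Response test(final @QueryParam("any") List<Long> any, final @QueryParam("iter") int iter) {
        JsonGenerator result = new JsonGenerator();

        result.writeStartObject();
        result.writeKeyValue("iteration", iter);
        result.writeKey("time");
        result.writeStartArray();

        try (Transaction tx = graphdb.beginTx()) {
            for (long id : any) {
                // Reset the accumulator and the loop counter for every id,
                // so that each reported mean covers exactly `iter` lookups.
                long total = 0;
                for (int i = 0; i < iter; i++) {
                    long startTime = System.nanoTime();
                    // Step 1: find the node by its "id" property.
                    Node node = graphdb.findNode(Label.label("Movie"), "id", id);
                    // Step 2: get the node's relationships.
                    Iterable<Relationship> t = node.getRelationships();
                    long stopTime = System.nanoTime();
                    total += (stopTime - startTime);
                }
                // Mean time per lookup for this id, in nanoseconds.
                result.writeLong(total / iter);
            }
            tx.success();
        }
        result.writeEndArray();
        result.writeEndObject();
        return Response.status(Status.OK).entity(result.getJson()).build();
    }
}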


Here JsonGenerator is a helper class that builds the JSON response.

When I call this endpoint with a GET request, the lookup takes approximately 0.65 to 0.7 ms on the small dataset and around 10 ms on the larger dataset.

This seems weird to me: does it really take more than 10x longer to find a node or its relationships? I'm using this in a larger project where I don't want the size of the dataset to influence performance (which is why I picked a graph database). I've read this in the documentation about unmanaged extensions:


This is a sharp tool, allowing users to deploy arbitrary JAX-RS
classes to the server so be careful when using this. In particular
it’s easy to consume lots of heap space on the server and degrade
performance. If in doubt, please ask for help via one of the community
channels.


Could this be my problem? Could it be that, by not clearing anything within the transaction, I consume too much heap? Does anyone have an idea, or any comment on the quote above, in particular on why it is so easy to consume too much heap?

Thanks

Answer

If you don't create an index on the label/property combination (here the Movie label and the id property), Neo4j has to go through every node with that label and check its id property, so the lookup time grows with the size of the dataset. If you index it, Neo4j can do the inverse: knowing the value of the id property, it can look up the corresponding nodes directly, which makes it much faster and largely independent of database size.

See this.
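
To illustrate the difference between scanning and an indexed lookup, here is a minimal toy sketch in plain Java. It is not how Neo4j is implemented internally: MovieNode, scanCost, buildIndex and dataset are hypothetical names made up for this example, and a HashMap stands in for the index. It only shows why the amount of work for a scan follows the dataset size while an indexed lookup does not.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class IndexLookupSketch {

    // Hypothetical stand-in for a labelled node carrying a user-defined "id" property.
    static final class MovieNode {
        final long id;
        MovieNode(long id) { this.id = id; }
    }

    // Without an index: check every node's "id" until one matches.
    // Returns the number of nodes inspected, or -1 if no node matches.
    static int scanCost(List<MovieNode> nodes, long wantedId) {
        int inspected = 0;
        for (MovieNode n : nodes) {
            inspected++;
            if (n.id == wantedId) {
                return inspected;
            }
        }
        return -1;
    }

    // With an index: a map from property value to node, built once up front.
    static Map<Long, MovieNode> buildIndex(List<MovieNode> nodes) {
        Map<Long, MovieNode> index = new HashMap<>();
        for (MovieNode n : nodes) {
            index.put(n.id, n);
        }
        return index;
    }

    // Builds a toy dataset with ids 0 .. size-1.
    static List<MovieNode> dataset(int size) {
        List<MovieNode> nodes = new ArrayList<>(size);
        for (long id = 0; id < size; id++) {
            nodes.add(new MovieNode(id));
        }
        return nodes;
    }

    public static void main(String[] args) {
        List<MovieNode> small = dataset(11_000);
        List<MovieNode> large = dataset(85_000);

        // Look up the last node of each dataset, the worst case for a scan.
        System.out.println("scan, small: " + scanCost(small, 10_999) + " nodes inspected");
        System.out.println("scan, large: " + scanCost(large, 84_999) + " nodes inspected");

        // The indexed lookup is a single map access regardless of dataset size.
        Map<Long, MovieNode> index = buildIndex(large);
        System.out.println("index, large: found id " + index.get(84_999L).id);
    }
}
```

In the worst case the scan inspects 11,000 nodes in the small dataset and 85,000 in the large one, whereas the map lookup is a single access in both. On the Neo4j side, assuming a 3.x server (which the findNode and Label.label calls in the question suggest), the index can be created once with the Cypher statement CREATE INDEX ON :Movie(id). After that, findNode on that label and property should be able to use it.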
