Grimbo - 14 days ago
Java Question

Speed up Mongo queries by running them in parallel with a ThreadPool?

Our MongoDB architecture stores data weekly: every week has its own database with the same set of collections. Sometimes I have to check data spanning more than 12 weeks, which means I run the same query against 12 different databases (all on one Mongo server):

...
for (MongoOperationDto week : allWeeks) {
    results.addAll(repo.find(gid, week.db(), week.collection()));
}
...


In this case I call find() 12 times sequentially. I guess the internal connection pool handles them, or not? If not, would it be a benefit to create 12 Java threads, each running one find? Maybe like this:

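import java.util.List;
import java.util.concurrent.CompletableFuture;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.scheduling.annotation.Async;
import org.springframework.stereotype.Service;

@Service // must be a Spring bean (with @EnableAsync enabled) so the @Async proxy is applied
public class FindTask {

    @Autowired
    MyMongoRepo repo;

    // @Async methods must return void or a Future; for a plain List the caller would get null
    @Async
    public CompletableFuture<List<Result>> doFindTask(long gid, MongoOperationDto week) {
        return CompletableFuture.completedFuture(repo.find(gid, week.db(), week.collection()));
    }
}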


Which approach is actually faster, or is there no difference in speed when retrieving the data?

Answer

The connection pool only handles the connections, nothing more:

In software engineering, a connection pool is a cache of database connections maintained so that the connections can be reused when future requests to the database are required

For your first snippet, this means that once the first find() has finished, the driver does not need to establish a new connection to MongoDB for the next query: it can reuse an already open, currently idle connection from the pool.

So in the first case you have 12 serial queries, each borrowing a connection from the pool, with only one connection in use at any given time.

In the second case you have 12 parallel queries using 12 different connections at the same time (provided both your thread pool and the connection pool allow at least 12 concurrent operations).
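The difference between the two cases can be illustrated with a toy pool. The sketch below is a hypothetical, simplified pool (not the MongoDB driver's implementation; connections are just Integer IDs) that reuses an idle connection when one is available and "opens" a new one otherwise. It counts how many connections were created and how many were in use at the same moment. A CyclicBarrier forces the 12 parallel tasks to overlap, standing in for queries that are all still running.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

public class ToyPoolDemo {

    // Hypothetical toy pool: hands out an idle connection if there is one, otherwise opens a new one.
    // It only illustrates reuse; it is NOT how the MongoDB driver's pool is implemented.
    static final class ToyPool {
        private final BlockingQueue<Integer> idle = new ArrayBlockingQueue<>(64);
        private final AtomicInteger created = new AtomicInteger();
        private final AtomicInteger inUse = new AtomicInteger();
        private final AtomicInteger peakInUse = new AtomicInteger();

        Integer borrow() {
            Integer conn = idle.poll();
            if (conn == null) {
                conn = created.incrementAndGet(); // "open" a new connection, identified by a number
            }
            int now = inUse.incrementAndGet();
            peakInUse.accumulateAndGet(now, Math::max); // remember the highest concurrent usage
            return conn;
        }

        void release(Integer conn) {
            inUse.decrementAndGet();
            idle.offer(conn); // back into the pool, ready for the next query
        }

        int created() { return created.get(); }
        int peakInUse() { return peakInUse.get(); }
    }

    // Sequential case: each query borrows and returns its connection before the next one starts
    static ToyPool runSequential(int weeks) {
        ToyPool pool = new ToyPool();
        for (int i = 0; i < weeks; i++) {
            Integer conn = pool.borrow();
            pool.release(conn);
        }
        return pool;
    }

    // Parallel case: the barrier makes every task hold its connection at the same moment
    static ToyPool runParallel(int weeks) throws Exception {
        ToyPool pool = new ToyPool();
        CyclicBarrier allRunning = new CyclicBarrier(weeks);
        ExecutorService threads = Executors.newFixedThreadPool(weeks); // one thread per week
        try {
            List<Future<?>> tasks = new ArrayList<>();
            for (int i = 0; i < weeks; i++) {
                tasks.add(threads.submit(() -> {
                    Integer conn = pool.borrow();
                    try {
                        allRunning.await(); // simulate overlapping, still-running queries
                    } finally {
                        pool.release(conn);
                    }
                    return null;
                }));
            }
            for (Future<?> t : tasks) {
                t.get(); // wait for every task and surface any failure
            }
        } finally {
            threads.shutdown();
        }
        return pool;
    }

    public static void main(String[] args) throws Exception {
        ToyPool seq = runSequential(12);
        ToyPool par = runParallel(12);
        System.out.println("sequential: created=" + seq.created() + ", peak in use=" + seq.peakInUse());
        System.out.println("parallel:   created=" + par.created() + ", peak in use=" + par.peakInUse());
    }
}
```

Running it prints `created=1, peak in use=1` for the sequential case and `created=12, peak in use=12` for the parallel case: serial queries keep reusing one pooled connection, while overlapping queries each need their own.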

In terms of performance, if the queries take a long time, the second solution should be faster (shorter total time to complete), but it uses more resources (RAM, CPU time). Note that the time is also influenced by your MongoDB architecture: if your queries spend most of their time on long disk operations against the same disk, parallelizing them probably won't improve the total time much.
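To make the "time to complete" point concrete, here is a minimal self-contained sketch comparing 12 sequential calls with 12 calls fanned out over a fixed thread pool via CompletableFuture.supplyAsync and then joined (the same fan-out/join shape a caller of an @Async method would use). The slowFind method is a hypothetical stub that only sleeps, so it models queries that mostly wait on the server or network; it does not model the disk-contention case described above, where the gain would be smaller, nor the extra RAM/CPU cost of parallelism.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class LatencyComparison {

    // Hypothetical stand-in for one weekly find(): it just waits, like a query that
    // spends its time waiting on the server/network rather than using local CPU
    static int slowFind(int week, long latencyMs) {
        try {
            Thread.sleep(latencyMs);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return week;
    }

    // Total time is roughly the sum of all query latencies
    static long timeSequentialMs(int weeks, long latencyMs) {
        long start = System.nanoTime();
        List<Integer> results = new ArrayList<>();
        for (int w = 0; w < weeks; w++) {
            results.add(slowFind(w, latencyMs));
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    // Total time is roughly the slowest single query plus thread overhead
    static long timeParallelMs(int weeks, long latencyMs) {
        ExecutorService pool = Executors.newFixedThreadPool(weeks);
        try {
            long start = System.nanoTime();
            List<CompletableFuture<Integer>> futures = new ArrayList<>();
            for (int w = 0; w < weeks; w++) {
                final int week = w;
                futures.add(CompletableFuture.supplyAsync(() -> slowFind(week, latencyMs), pool));
            }
            List<Integer> results = new ArrayList<>();
            for (CompletableFuture<Integer> f : futures) {
                results.add(f.join()); // wait for each week's result, keeping week order
            }
            return (System.nanoTime() - start) / 1_000_000;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("sequential: ~" + timeSequentialMs(12, 100) + " ms");
        System.out.println("parallel:   ~" + timeParallelMs(12, 100) + " ms");
    }
}
```

With 12 weeks at 100 ms each, the sequential run takes at least about 1.2 s, while the parallel run should finish in roughly the time of one query plus some thread overhead; exact numbers depend on the machine.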