Marquis Blount Marquis Blount - 2 months ago 47
Java Question

N1QL Query times out when Using parameterized IN clause

Using Couchbase server 4.1.0 (and 4.5), Java SDK 2.2.8 (also tried with 2.2.7, 2.3.1, & 2.3.3), I have a query leveraging a secondary index which runs fine when I run my code locally and even via CBQ (CBQ takes about 3ms) on the AWS server. However, when running my app on AWS, I get a

TimeOutException
and it's only one query which is timing out, others are not. See details below.

May be worth noting my Couchbase setup has 3 buckets.

Example Doc:

"bucketName": {
"userName": "User_A",
"MessageContent": "This is a message",
"docType": "msg",
"ParentMsgId": "1234",
"MsgType": "test",
"expireTimestamp": 1454975772613,
"publishTimestamp": 1455322362028,
"id": "145826845",
"urls": [],
"subject": "this is a subject",
"type": 1,
"GroupId": "Group_1"
}


Secondary Index:

CREATE INDEX `indexName` ON `bucketName`(`ParentMsgId`,`docType`,`publishTimestamp`) USING GSI


Example Query extracted from
N1qlQuery#n1ql()


{"statement":
"select count(*) as msgCount from bucketName
where ParentMsgId is not missing and docType = 'msg'
and ParentMsgId IN $parentId
and publishTimestamp between $startTime and $endTime
","$endTime":1470726861816,
"$startTime":1470640461816,
"$parenIds":["fa11845b-9ea5-4778-95fe-e7206843c69b"]
}


Java Code

public static final String COUNT_STATEMENT = "select count(*) as count " +
"from bucketName " +
"where ParentMsgId is not missing " +
"and docType = 'msg' " +
"and ParentMsgId IN $parentIds " +
"and publishTimestamp between $startTime and $endTime";

public int getCountForDuration(Long startTime, Long endTime, Collection<String> parentIds){
List<String> idList = new ArrayList<>(parentIds);
JsonObject placeHolders = JsonObject.create()
.put("parentIds", JsonArray.from(idList))
.put("startTime", startTime)
.put("endTime", endTime);
N1qlQuery query = N1qlQuery.parameterized(COUNT_STATEMENT, placeHolders)
N1qlQueryResult result = bucket.query(query);
...
}


Query Explain result

cbq> explain select count(*) as msgCount from bucketName where ParentMsgId is not missing and docType = 'msg' and ParentMsgId IN ["01b88f7f-4de6-4daa-9562-a2c902e818ad"] and publishTimestamp between 1466445409000 and 1466531809000;
{
"requestID": "61afcf02-3b3d-4c8a-aec6-b76c4c1f7b17",
"signature": "json",
"results": [
{
"#operator": "Sequence",
"~children": [
{
"#operator": "IndexScan",
"index": "indexName",
"keyspace": "bucketName",
"namespace": "default",
"spans": [
{
"Range": {
"High": [
"successor(\"01b88f7f-4de6-4daa-9562-a2c902e818ad\")"
],
"Inclusion": 1,
"Low": [
"\"01b88f7f-4de6-4daa-9562-a2c902e818ad\""
]
}
}
],
"using": "gsi"
},
{
"#operator": "Parallel",
"~child": {
"#operator": "Sequence",
"~children": [
{
"#operator": "Fetch",
"keyspace": "bucketName",
"namespace": "default"
},
{
"#operator": "Filter",
"condition": "(((((`bucketName`.`ParentMsgId`) is not missing) and ((`bucketName`.`docType`) = \"msg\")) and ((`bucketName`.`ParentMsgId`) in [\"01b88f7f-4de6-4daa-9562-a2c902e818ad\"])) and ((`bucketName`.`publishTimestamp`) between 1466445409000 and 1466531809000))"
},
{
"#operator": "InitialGroup",
"aggregates": [
"count(*)"
],
"group_keys": []
}
]
}
},
{
"#operator": "IntermediateGroup",
"aggregates": [
"count(*)"
],
"group_keys": []
},
{
"#operator": "FinalGroup",
"aggregates": [
"count(*)"
],
"group_keys": []
},
{
"#operator": "Parallel",
"~child": {
"#operator": "Sequence",
"~children": [
{
"#operator": "InitialProject",
"result_terms": [
{
"as": "msgCount",
"expr": "count(*)"
}
]
},
{
"#operator": "FinalProject"
}
]
}
}
]
}
],
"status": "success",
"metrics": {
"elapsedTime": "2.748194ms",
"executionTime": "2.660232ms",
"resultCount": 1,
"resultSize": 3274
}
}


Logs

java.lang.Thread.run(Thread.java:745)
org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:533)
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:598)
org.eclipse.jetty.server.nio.BlockingChannelConnector$BlockingChannelEndPoint.run(BlockingChannelConnector.java:293)
org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:50)
org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:218)
org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:582)
org.eclipse.jetty.server.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:919)
org.eclipse.jetty.server.HttpConnection.handleRequest(HttpConnection.java:441)
org.eclipse.jetty.server.Server.handle(Server.java:349)
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:110)
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:149)
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:117)
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:875)
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:186)
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:409)
org.ops4j.pax.web.service.jetty.internal.HttpServiceContext.doHandle(HttpServiceContext.java:117)
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:941)
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:227)
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:483)
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:119)
org.ops4j.pax.web.service.jetty.internal.HttpServiceServletHandler.doHandle(HttpServiceServletHandler.java:70)
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:480)
org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:547)
org.apache.cxf.transport.servlet.AbstractHTTPServlet.service(AbstractHTTPServlet.java:201)
javax.servlet.http.HttpServlet.service(HttpServlet.java:693)
org.apache.cxf.transport.servlet.AbstractHTTPServlet.doGet(AbstractHTTPServlet.java:150)
org.apache.cxf.transport.servlet.AbstractHTTPServlet.handleRequest(AbstractHTTPServlet.java:225)
org.apache.cxf.transport.servlet.CXFNonSpringServlet.invoke(CXFNonSpringServlet.java:130)
org.apache.cxf.transport.servlet.ServletController.invoke(ServletController.java:194)
org.apache.cxf.transport.servlet.ServletController.invokeDestination(ServletController.java:214)
org.apache.cxf.transport.http.AbstractHTTPDestination.invoke(AbstractHTTPDestination.java:237)
org.apache.cxf.transport.ChainInitiationObserver.onMessage(ChainInitiationObserver.java:121)
org.apache.cxf.phase.PhaseInterceptorChain.doIntercept(PhaseInterceptorChain.java:262)
org.apache.cxf.interceptor.ServiceInvokerInterceptor.handleMessage(ServiceInvokerInterceptor.java:94)
org.apache.cxf.interceptor.ServiceInvokerInterceptor$1.run(ServiceInvokerInterceptor.java:58)
org.apache.cxf.jaxrs.JAXRSInvoker.invoke(JAXRSInvoker.java:89)
org.apache.cxf.jaxrs.JAXRSInvoker.invoke(JAXRSInvoker.java:168)
org.apache.cxf.service.invoker.AbstractInvoker.invoke(AbstractInvoker.java:96)
org.apache.cxf.service.invoker.AbstractInvoker.performInvocation(AbstractInvoker.java:180)
java.lang.reflect.Method.invoke(Method.java:498)
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
biz.te2.core.services.beacon.impl.BeaconResource.getVenuesBeaconData(BeaconResource.java:105)
xxx.xxx.xxx.getBeaconHealthForRangeAndVenue(BeaconHealthServiceImpl.java:40)
xxx.xxx.xxx..getAllMessagesCount(BeaconHealthServiceImpl.java:80)
com.sun.proxy.$Proxy146.getMessageCountForDuration(Unknown Source)
org.apache.aries.proxy.impl.ProxyHandler.invoke(ProxyHandler.java:78)
org.apache.aries.proxy.impl.DefaultWrapper.invoke(DefaultWrapper.java:31)
org.apache.aries.proxy.impl.ProxyHandler$1.invoke(ProxyHandler.java:50)
java.lang.reflect.Method.invoke(Method.java:498)
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
xxx.xxx.xxx.getMessageCountForDuration(MessageCouchbaseRepo.java:364)
xxx.xxx.xxx.getN1qlQueryRows(MessageCouchbaseRepo.java:372)
com.couchbase.client.java.CouchbaseBucket.query(CouchbaseBucket.java:582)
com.couchbase.client.java.CouchbaseBucket.query(CouchbaseBucket.java:656)
com.couchbase.client.java.util.Blocking.blockForSingle(Blocking.java:74)
java.util.concurrent.CountDownLatch.await(CountDownLatch.java:277)
java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1328)
java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1037)
java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)





This is an example of a query which works fine, as requested.

public static final String EXPERIENCE_ID_STATEMENT = "Select id " +
"from read as exp " +
"where id is not missing " +
"and docType = 'experience' " +
"and venueId = $venueId " +
"and exp.rule.poiIds is not missing " +
"and any poi in exp.rule.poiIds satisfies poi = $poiId end";


The only thing unique about this query versus the others, is it uses an
IN
clause and receives the fields via a parameterized
JsonArray
.

There are no network delays. I don't think this is an issue, as other queries are working and they are essentially chained called one after another (I also tested running this query alone and it still performs extremely slow).

App and CB are both on AWS. I have tested with both on same AWS server and on different servers as well in both cases issue is there. I have a client on AWS and not on AWS, both have the issue. By a client I mean a mechanism that invokes my app. It still gets a timeout when the query is called.

My collectinfo couchbase logs are here s3.amazonaws.com/cb-customers/TE2/

Answer

I found the issue it has to do with the parameterization of the values in the IN clause. When I removed the parameterization from the query I was able to run as fast as CBQ. My only option was to drop parameterizing.I Attempted to adjust the secondary index a bit moving the ParentMsgId to the end of the fields list this did not help in my case.

The underlying problem diagnosed from couchbase rep.

the underlying issue is that once you put the IN values in a parameter, the optimiser can't assume the IN clause only has one value. Since experienceId leads the index, we can only use an equality to travel down the index and start counting the keys that apply because we would have to skip and scan. I would suggest creating the index as (docType, publishTimestamp, ParentMsgId), that way we can still scan the index on the conditions set on docType, publishTimestamp, and apply the filter on experienceId later on in the pipeline.