lining - 20 days ago
Java Question

Cassandra Client Java APIs

I have recently started working with the Cassandra database, and I am now evaluating which Cassandra client we should go forward with.

I have seen various posts on Stack Overflow about which client to use for Cassandra, but none of them has a definitive answer.

My team has asked me to research this and come up with the pros and cons of each Cassandra client API in Java.

As I mentioned, I only recently got involved with Cassandra, so I don't have much idea why some people choose the Pelops client, why others go with Astyanax, and why others pick other clients.

I know a little about each of the Cassandra clients, in the sense that I can get each one working and start reading from and writing to the Cassandra database.

Below is the information I have so far.

CASSANDRA APIS


  • Hector (Production-Ready)

    The most stable of the Java APIs, ready for prime-time.

  • Astyanax (The Up and Comer)

    A clean Java API from Netflix. It isn't as widely used as Hector, but it is solid.

  • Kundera (The NoSQL ORM)

    JPA-compliant, this is handy when you want to interact with Cassandra via objects.

    This constrains you somewhat in that you won't be able to have a dynamic number of
    columns/names, etc. But it does allow you to port over ORMs, or centralize storage
    onto Cassandra for more traditional uses.

  • Pelops

    I've only used Pelops briefly. It was a straightforward API, but didn't seem to
    have the momentum behind it.

  • PlayORM (ORM without the constraints?)

    I just heard about this. It looks like it is trying to solve the impedance
    mismatch between traditional JPA-based ORMs and NoSQL by introducing JQL. It looks
    promising.

  • Thrift (Avoid Me!)

    This is the "low-level" API.



Below are our priorities in choosing a Cassandra client:


  • Our first priorities are low latency overhead, an asynchronous API, and reliability/stability in a production environment.

    (User-friendliness matters less to us, since a more user-friendly API can be provided by the DAL that wraps the client.)

  • Connection pooling and partition awareness are other good features to have.

  • The client should be able to detect any newly added nodes.

  • Good support as well (as pointed out by dean below).
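Connection pooling and new-node detection are normally handled inside the client itself, so this is not something you would write yourself. Purely to make those priorities concrete, here is a minimal JDK-only sketch of a round-robin host selector that picks up nodes as they join. The class `RoundRobinHostPool`, its `onNodeAdded`/`onNodeRemoved` callbacks, and the assumption that some discovery mechanism calls them are all hypothetical, and the sketch makes no attempt at partition (token) awareness.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical round-robin host selector; real clients ship their own pooling and discovery.
class RoundRobinHostPool {
    // Copy-on-write list: reads never block, and hosts can be added while requests are in flight.
    private final CopyOnWriteArrayList<String> hosts = new CopyOnWriteArrayList<>();
    private final AtomicInteger counter = new AtomicInteger();

    RoundRobinHostPool(List<String> seeds) {
        hosts.addAllAbsent(seeds);
    }

    // Hypothetical discovery callback for a node that joined the ring; duplicates are ignored.
    void onNodeAdded(String host) {
        hosts.addIfAbsent(host);
    }

    // Hypothetical callback for a node that was decommissioned or marked down.
    void onNodeRemoved(String host) {
        hosts.remove(host);
    }

    // Returns the next host in rotation, working from a consistent snapshot of the list.
    String nextHost() {
        String[] snapshot = hosts.toArray(new String[0]);
        if (snapshot.length == 0) {
            throw new IllegalStateException("no hosts available");
        }
        // floorMod keeps the index non-negative even after the counter overflows.
        int i = Math.floorMod(counter.getAndIncrement(), snapshot.length);
        return snapshot[i];
    }

    public static void main(String[] args) {
        RoundRobinHostPool pool = new RoundRobinHostPool(Arrays.asList("10.0.0.1", "10.0.0.2"));
        System.out.println(pool.nextHost()); // 10.0.0.1
        System.out.println(pool.nextHost()); // 10.0.0.2
        pool.onNodeAdded("10.0.0.3");        // a new node joins the cluster
        System.out.println(pool.nextHost()); // 10.0.0.3
    }
}
```

When comparing clients, the questions this sketch raises (how does the client learn about new nodes, and does it route requests to the replica that owns the data?) are worth asking of each one.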



Can anyone provide some thoughts on this? Any pros and cons for each Cassandra client, and which client can fulfill my requirements, would be of great help.

Based on my research so far, I believe I will mainly be choosing between the Astyanax client and the new DataStax client that uses the binary protocol. But I don't have solid information to back up my research and present it to my team.

Any comparison between the Astyanax client and the new DataStax client (which uses the new binary protocol) would be of great help.

It would help my research a lot, and I would learn a great deal from people who have used different clients in the past.

Answer

Thrift is becoming more of a legacy API:

First, you should be aware that the Thrift API is not going to be getting new features; it's there for backwards compatibility, and not recommended for new projects.
- the paul

So I'd avoid Thrift-based APIs (Thrift is only kept for backwards compatibility).

That said, if you do need to use a Thrift-based API, I'd go for Astyanax. Astyanax is very easy to use compared to other Thrift APIs, though in my personal experience DataStax's driver is even easier.

So you should have a look at DataStax's API (and its GitHub repo). I'm not sure whether there are any compiled versions of the API available for download, but you can easily build it with Maven. Also, if you look at the GitHub repo's commit log, you'll see it is updated very frequently.

The driver works exclusively with CQL3 and is asynchronous, but be warned that Cassandra 1.2 is the earliest supported version.

Performance
Astyanax is Thrift-based, whereas DataStax's driver uses the binary protocol. Here are the latest benchmarks I could find comparing Thrift and CQL (note that these are definitely out of date). In fairness, though, the small performance difference shown in these benchmarks will rarely matter.
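If you would rather check how much the protocol difference matters for your own workload than rely on old benchmarks, timing your real queries with each client is one option. This is a minimal JDK-only sketch; `LatencyProbe`, `measure`, `percentile`, and the placeholder `fakeQuery` are hypothetical names. A naive loop like this does not control for JVM effects such as GC pauses, so a dedicated harness such as JMH is the better tool for numbers you intend to present.

```java
import java.util.Arrays;

// Hypothetical micro-harness for timing any client call (Thrift-based or binary-protocol).
class LatencyProbe {
    // Runs op `warmup` times untimed, then `iterations` times timed; returns sorted latencies in ns.
    static long[] measure(Runnable op, int warmup, int iterations) {
        for (int i = 0; i < warmup; i++) {
            op.run(); // let the JIT and the client's connection pool settle first
        }
        long[] samples = new long[iterations];
        for (int i = 0; i < iterations; i++) {
            long start = System.nanoTime();
            op.run();
            samples[i] = System.nanoTime() - start;
        }
        Arrays.sort(samples);
        return samples;
    }

    // Nearest-rank percentile of an already-sorted, non-empty array (p in (0, 100]).
    static long percentile(long[] sorted, double p) {
        int rank = (int) Math.ceil(p * sorted.length / 100.0);
        int idx = Math.min(Math.max(rank, 1), sorted.length) - 1;
        return sorted[idx];
    }

    public static void main(String[] args) {
        // Placeholder work; replace with a real read issued through the client under test.
        Runnable fakeQuery = () -> Math.sqrt(System.nanoTime());
        long[] s = measure(fakeQuery, 1_000, 10_000);
        System.out.printf("p50=%dns p99=%dns max=%dns%n",
                percentile(s, 50), percentile(s, 99), s[s.length - 1]);
    }
}
```

Comparing p99 rather than the mean is usually more informative for the "low latency overhead" priority, since client overhead tends to show up in the tail.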

Async support
DataStax's async support is a definite advantage over Astyanax (Netflix tried implementing it but decided not to).
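To show why an asynchronous API matters, and how the DAL mentioned in the question can hide the client behind a friendlier interface, here is a minimal JDK-only sketch. `UserDao`, `InMemoryUserDao`, and `lookupAll` are hypothetical names, and the in-memory implementation only stands in for whichever client's asynchronous call you end up wrapping. Depending on the driver version, its futures may not be `CompletableFuture`, so a real adapter might need to bridge the two.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.stream.Collectors;

class AsyncDalDemo {
    // Issues every lookup before waiting on any of them, then collects results in request order.
    static List<String> lookupAll(UserDao dao, List<Integer> ids) {
        List<CompletableFuture<Optional<String>>> futures = ids.stream()
                .map(dao::findNameById)
                .collect(Collectors.toList());
        CompletableFuture.allOf(futures.toArray(new CompletableFuture[0])).join();
        return futures.stream()
                .map(f -> f.join().orElse("<missing>"))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            InMemoryUserDao dao = new InMemoryUserDao(pool);
            dao.put(1, "alice");
            dao.put(2, "bob");
            System.out.println(lookupAll(dao, Arrays.asList(1, 2, 3))); // [alice, bob, <missing>]
        } finally {
            pool.shutdown();
        }
    }
}

// Hypothetical DAL interface: callers never see which Cassandra client sits underneath.
interface UserDao {
    CompletableFuture<Optional<String>> findNameById(int id);
}

// In-memory stand-in; a real version would delegate to the chosen client's async call.
class InMemoryUserDao implements UserDao {
    private final Map<Integer, String> rows = new ConcurrentHashMap<>();
    private final ExecutorService pool;

    InMemoryUserDao(ExecutorService pool) {
        this.pool = pool;
    }

    void put(int id, String name) {
        rows.put(id, name);
    }

    @Override
    public CompletableFuture<Optional<String>> findNameById(int id) {
        // supplyAsync returns immediately; the lookup itself runs on the executor.
        return CompletableFuture.supplyAsync(() -> Optional.ofNullable(rows.get(id)), pool);
    }
}
```

With a synchronous client the three lookups would run one after another; here they are all in flight at once, which is the main benefit of a natively asynchronous driver.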

Documentation
I can't really argue against Netflix's wiki. The documentation is excellent and it's updated fairly frequently. Their wiki includes code examples, and you can find tests in the source code if you need to see the code at work. I struggled to find any documentation for the DataStax driver; however, tests are provided in the GitHub repository, so that is a starting point.

Also have a look at this answer (well, not my one anyway). It looks into some advantages and disadvantages of Thrift and CQL.
