Question

REST API for processing data stored in HBase

I have millions of records in an HBase store, like this:

key = user_id:service_id:usage_timestamp value = some_int

This means that a user used the service service_id for some_int at usage_timestamp.

Now I want to provide a REST API for aggregating that data, for example "find the sum of all values for the requested user" or "find the max of them", and so on. So I'm looking for the best practice. A simple Java application doesn't meet my performance expectations.
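To make the requested aggregations concrete, here is a minimal in-memory sketch of "sum" and "max" for one user. It only illustrates the semantics of the key layout above and is not meant as the performant solution; the class name `UsageAggregator`, its methods, and the sample rows are hypothetical, and reading the rows from HBase is left out.

```java
import java.util.IntSummaryStatistics;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: aggregates the values of one user from key/value rows. */
final class UsageAggregator {

    /** Parses "user_id:service_id:usage_timestamp" and returns the user_id part. */
    static String userIdOf(String rowKey) {
        String[] parts = rowKey.split(":", 3);
        if (parts.length != 3) {
            throw new IllegalArgumentException("Unexpected row key: " + rowKey);
        }
        return parts[0];
    }

    /** Collects count, sum, min and max over all rows whose key belongs to userId. */
    static IntSummaryStatistics statsForUser(Map<String, Integer> rows, String userId) {
        return rows.entrySet().stream()
                .filter(e -> userIdOf(e.getKey()).equals(userId))
                .mapToInt(e -> e.getValue())
                .summaryStatistics();
    }

    public static void main(String[] args) {
        // Sample data, invented for illustration only.
        Map<String, Integer> rows = new LinkedHashMap<>();
        rows.put("u1:mail:1000", 5);
        rows.put("u1:chat:1001", 7);
        rows.put("u2:mail:1002", 3);

        IntSummaryStatistics stats = statsForUser(rows, "u1");
        System.out.println("sum=" + stats.getSum() + " max=" + stats.getMax());
    }
}
```

A REST endpoint would return `stats.getSum()` or `stats.getMax()` for the requested user; the open question is where this computation should run so that it is fast enough over millions of rows.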

My current approach aggregates the data with an Apache Spark application. It looks good enough, but it is awkward to use behind a Java REST API because Spark doesn't support a request-response model. (I have also taken a look at spark-job-server, which seems raw and unstable.)


Any ideas?

Answer

I would suggest HBase + Solr (if you are using Cloudera, that is Cloudera Search).

Use the SolrJ API to aggregate the data (instead of Spark) and to serve your REST services.

Solr solution (on Cloudera, this is Cloudera Search):

  1. Create a collection (similar to an HBase table) in Solr.
  2. Indexing: use the NRT Lily indexer or a custom MapReduce Solr document creator to load the data as Solr documents.

    If you don't like the NRT Lily indexer, you can use a Spark or MapReduce job with SolrJ to do the indexing. For example, Spark Solr: tools for reading data from Solr as a Spark RDD and indexing objects from Spark into Solr using SolrJ.

  3. Data retrieval: use SolrJ to get the Solr docs from your web service call. In SolrJ:

    • There is FieldStatsInfo, through which sum, max, etc. can be obtained

    • There are facets and facet pivots to group data

    • Pagination is supported for REST API calls

    You can integrate the Solr results with Jersey or some other web service framework; we have already implemented it this way.

    /**
     * Returns the requested page of records from the Solr server; the result can be
     * integrated with any REST framework such as Jersey.
     */
    public SolrDocumentList getData(int start, int pageSize, SolrQuery query) throws SolrServerException {
        query.setStart(start);   // start of your page
        query.setRows(pageSize); // number of rows per page
        // [a statement is truncated in the source here; only ", true));" survives]
        // POST is important if you are querying a huge result set; GET will fail for huge results.
        final QueryResponse queryResponse = solrCore.query(query, METHOD.POST);
        final SolrDocumentList solrDocumentList = queryResponse.getResults();
        if (isResultEmpty(solrDocumentList)) { // check if the list is empty
            // The logging call is truncated in the source; LOG is an assumed logger name.
            LOG.info("hmm.. No records found for this query");
        }
        return solrDocumentList;
    }
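
For the sum/max aggregation mentioned in step 3, Solr's stats component can do the work server-side. Below is a minimal sketch that only builds the request URL with the standard `stats=true` and `stats.field` parameters, so it runs without SolrJ on the classpath. The class `SolrStatsUrl`, the base URL, the collection name `usage`, and the Solr field names `user_id` and `value` are assumptions for illustration, not part of the original setup. In SolrJ, the corresponding calls should be `SolrQuery.setGetFieldStatistics(...)` on the query and `QueryResponse.getFieldStatsInfo()` on the response.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper that builds a Solr stats request for one user. */
final class SolrStatsUrl {

    /**
     * Builds a /select URL that asks Solr for statistics (sum, max, min, ...)
     * over statsField for all documents of the given user, returning no rows.
     * Note: userId is URL-encoded but not escaped for Solr query syntax.
     */
    static String build(String baseUrl, String collection, String userId, String statsField) {
        String q = "user_id:" + userId; // assumed field name in the Solr schema
        return baseUrl + "/" + collection + "/select"
                + "?q=" + URLEncoder.encode(q, StandardCharsets.UTF_8)
                + "&rows=0"        // only the statistics are needed, not the documents
                + "&stats=true"    // enable the stats component
                + "&stats.field=" + URLEncoder.encode(statsField, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(build("http://localhost:8983/solr", "usage", "u1", "value"));
    }
}
```

With `rows=0` the response carries only the statistics, which keeps the REST call cheap even when the user has many usage records.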


Also look at

  1. my answer in "Create indexes in solr on top of HBase"


Note: I think the same can be achieved with Elasticsearch as well, but based on my experience, I'm confident in Solr + SolrJ.
