George George - 15 days ago 5
SQL Question

I need some advice about Druid and Metamarkets

I need a solution for storing logs (which more or less follow one of, say, 10 standard formats), preferably in real time, in a database that is fast to query and can easily answer various weird queries, e.g. queries looking for keywords in text bodies or queries involving multiple tables.

A solution that was recommended to me was Metamarkets, which seems to do real-time logging with a very good query system. However, I'm unsure about the cost and whether or not such a complex solution is needed.

From what I understand, the "selling point" of Metamarkets is the Druid database, which is open source and can be deployed outside of their stack. So what I came here to ask is:

Have any of you had experience deploying a real-time logging system with Druid? How hard was it? How long did it take? What were the challenges? What other technologies besides Druid did you use? Do you have any recommended reading?

Have any of you had experience with Metamarkets? If so, again, how hard was it? How long did it take? What were the challenges? How were the costs once it hit production? Do you have any recommended reading on the subject?

Also, a bonus question: are there actually any benchmarks of Druid done by "unbiased professionals"? The fact that a real-time-in, real-time-out database is written in Java seems a bit... ahem, hard to believe.

Answer

This is a quick answer. It is true that Druid is open source, but the missing link here is a good UI that plays nicely with Druid. There is one UI, formerly called Caravel and now called Superset, which I guess can do an OK job. Running a Druid cluster should not be that hard if you have enough resources (e.g. engineers) to put the whole pipeline in place, from packaging to deploying Druid on the machines or in the cloud. The last piece is monitoring and updating the cluster, which needs a good amount of work as well. And yes, it is written in Java, but plenty of other real-time software runs on the JVM too; take Kafka, for example. In fact, Druid does a lot of things off-heap and uses memory-mapped files for serving data. Reading the white paper will give you a good basic understanding of the system, so you can judge for yourself whether Druid is a good fit.
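To make the memory-mapped files point a bit more concrete, here is a minimal sketch of how a Java program can read a file through a memory mapping instead of copying it onto the heap. This is not Druid's code; the class name `MmapSketch`, the method `readMapped`, and the sample log line are made up for illustration. Only the standard `java.nio` calls are real.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical illustration class, not part of Druid.
final class MmapSketch {

    // Maps the whole file read-only and decodes its bytes as UTF-8.
    // The mapped pages are managed by the OS page cache and live outside
    // the Java heap; only the decoded String ends up on the heap.
    static String readMapped(Path file) throws IOException {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer buffer =
                    channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
            // The mapping stays valid even after the channel is closed.
            return StandardCharsets.UTF_8.decode(buffer).toString();
        }
    }

    public static void main(String[] args) throws IOException {
        // Write a tiny made-up "log segment" to a temp file, then read it back.
        Path file = Files.createTempFile("segment", ".bin");
        file.toFile().deleteOnExit();
        Files.write(file, "GET /index.html 200\n".getBytes(StandardCharsets.UTF_8));
        System.out.print(readMapped(file));
    }
}
```

A real system would keep the mapped buffer around and read columns out of it directly rather than decoding the whole file into a String; the sketch only shows that the JVM lets you serve data from memory it does not garbage-collect.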
