Superbrain_bug - 8 months ago
Apache Configuration Question

Making a storage plugin on Apache Drill for HDFS

I'm trying to create a storage plugin that connects Apache Drill to Hadoop (HDFS).
I'm confused about which port to use in the hdfs:// connection and what to set for location.
This is my plugin:

    {
      "type": "file",
      "enabled": true,
      "connection": "hdfs://localhost:54310",
      "workspaces": {
        "root": {
          "location": "/",
          "writable": false,
          "defaultInputFormat": null
        },
        "tmp": {
          "location": "/tmp",
          "writable": true,
          "defaultInputFormat": null
        }
      },
      "formats": {
        "psv": {
          "type": "text",
          "extensions": ["tbl"],
          "delimiter": "|"
        },
        "csv": {
          "type": "text",
          "extensions": ["csv"],
          "delimiter": ","
        },
        "tsv": {
          "type": "text",
          "extensions": ["tsv"],
          "delimiter": "\t"
        },
        "parquet": {
          "type": "parquet"
        },
        "json": {
          "type": "json"
        },
        "avro": {
          "type": "avro"
        }
      }
    }

So, is it correct to set localhost:54310? I got that from the command:

    hdfs getconf -nnRpcAddresses

Or should it be :8020?

Second question: what do I need to set for location? My Hadoop folder is in:


, and there you can find /etc, /bin, /lib, /log, ... So do I need to point location at my datanode, or at something else?

Third question: when I connect to Drill, I go through sqlline and then connect to my ZooKeeper like this:

!connect jdbc:drill:zk=localhost:2181

My question here is: after I create the storage plugin and connect to Drill through ZooKeeper, can I query files on HDFS?

I'm very sorry if this is a noob question, but I haven't found anything useful on the internet, or at least nothing that has helped me.
If you can explain any of this, I'll be very grateful.


As per the Drill docs, an HDFS storage plugin looks like this:

    {
      "type" : "file",
      "enabled" : true,
      "connection" : "hdfs://",
      "workspaces" : {
        "root" : {
          "location" : "/user/root/drill",
          "writable" : true,
          "defaultInputFormat" : null
        }
      },
      "formats" : {
        "json" : {
          "type" : "json"
        }
      }
    }

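In "connection", put the namenode server address, including the port.

If you are not sure about this address, check the fs.defaultFS property in core-site.xml. Its value has the form hdfs://host:port, so use whichever port appears there rather than assuming 8020.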
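If it helps, here is a minimal Python sketch of pulling that property out of core-site.xml. The sample XML and the function name find_default_fs are made up for illustration; your real file lives in your Hadoop configuration directory and its value may differ.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample content; read your own core-site.xml instead.
SAMPLE_CORE_SITE = """<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:54310</value>
  </property>
</configuration>
"""


def find_default_fs(core_site_xml):
    """Return the fs.defaultFS value from core-site.xml text, or None if absent."""
    root = ET.fromstring(core_site_xml)
    for prop in root.iter("property"):
        name = prop.findtext("name", default="").strip()
        if name == "fs.defaultFS":
            value = prop.findtext("value")
            return value.strip() if value else None
    return None


# The returned string is what would go into the plugin's "connection" field.
print(find_default_fs(SAMPLE_CORE_SITE))
```

With the sample above, the value found is hdfs://localhost:54310, which matches the connection string used in the question.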

Coming to "workspaces": this is where you define named workspaces. In the example above, there is a workspace named root with location /user/root/drill. This is a path inside HDFS, not the Hadoop installation folder on your local disk.

If you have files under the /user/root/drill HDFS directory, you can query them using this workspace name.

Example: abc.csv is under this directory.

    select * from dfs.root.`abc.csv`
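To make the mapping concrete, here is a rough Python sketch of how a workspace location and a file name combine into an HDFS path, and what the matching query text looks like. Both helper names are hypothetical and this only mimics the behaviour described here; it is not Drill's own code.

```python
import posixpath


def resolve_workspace_path(location, file_name):
    """Join a workspace location and a file name into one HDFS path."""
    return posixpath.join(location, file_name.lstrip("/"))


def workspace_query(plugin, workspace, file_name):
    """Build the query text for a file addressed through a workspace."""
    return f"select * from {plugin}.{workspace}.`{file_name}`"


# Values taken from the example: plugin dfs, workspace root, file abc.csv.
print(resolve_workspace_path("/user/root/drill", "abc.csv"))
print(workspace_query("dfs", "root", "abc.csv"))
```

So the workspace name in the query stands in for the /user/root/drill prefix, and the backticked part is the path relative to it.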

After successfully creating the plugin, you can start Drill, connect as you already do through sqlline, and start querying.

You can also query any HDFS path directly, regardless of which workspaces are defined.

Say you want to query employee.json in the /tmp/data HDFS directory.

The query is:

    select * from dfs.`/tmp/data/employee.json`
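If you would rather send the same query from a script than from sqlline, a sketch like the following may work against Drill's REST interface. It assumes the Drillbit web port is the default 8047 on localhost, which may not match your setup, and build_query_request is a hypothetical helper name. The request is only built here, not sent.

```python
import json
import urllib.request


def build_query_request(sql, base_url="http://localhost:8047"):
    """Build (but do not send) a POST request for Drill's /query.json endpoint."""
    payload = json.dumps({"queryType": "SQL", "query": sql}).encode("utf-8")
    return urllib.request.Request(
        base_url + "/query.json",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_query_request("select * from dfs.`/tmp/data/employee.json`")

# To run it against a live Drillbit (assumed to be reachable at base_url):
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```

The sqlline route from the question remains the simpler way to check that the plugin works; this is just an alternative once it does.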