I was able to use Sqoop to import a MySQL table "titles" into HDFS using a command like this:
sqoop import --connect jdbc:mysql://localhost/employees --username=root -P --table=titles --target-dir=titles --m=1
sqoop import --connect jdbc:mysql://localhost/employees --username=root -P --table titles --hive-import
Output directory hdfs://localhost:9000/user/root/titles already exists
hive> show tables;
As Amit has pointed out, your first command already created the HDFS directory
titles, so Sqoop refuses to overwrite a folder that already contains data.
In your second command, you are telling Sqoop to import (once again) the whole table, which was already imported by the first command, into Hive. Since you are not specifying
--target-dir with an HDFS destination, Sqoop defaults to a directory named after the table under your HDFS home, i.e.
/user/root/titles. Since this folder already exists, the error was raised.
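As an aside, if the goal were simply to overwrite the existing directory, Sqoop (1.4.x and later) has a flag for exactly that; a sketch, reusing your connection settings:

```shell
# --delete-target-dir removes the existing HDFS directory before the
# import starts, instead of failing with "already exists"
sqoop import --connect jdbc:mysql://localhost/employees --username=root -P \
  --table titles --target-dir titles --delete-target-dir -m 1
```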
When you tell Hive to show the tables,
titles doesn't appear because the second command (the one with
--hive-import) was not successful, so Hive doesn't know anything about the data. When you add the flag
--hive-import, what Sqoop does under the hood is update the Hive metastore, a database that holds the metadata of the Hive tables, their partitions and their HDFS locations.
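Once a hive-import has succeeded, one way to see what the metastore recorded for the table is Hive's DESCRIBE FORMATTED statement, which prints (among other things) the table's HDFS location:

```sql
-- Prints the table metadata stored in the metastore,
-- including the Location field pointing at HDFS
DESCRIBE FORMATTED titles;
```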
You could do the whole import with a single Sqoop command instead of two different ones. If you delete the
titles HDFS folder and run something like this:
sqoop import --connect jdbc:mysql://localhost/employees --username=root -P --table titles --target-dir titles --hive-import -m 1
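The delete step mentioned above can be done with the HDFS CLI; assuming the default /user/root home directory, something like:

```shell
# Remove the directory left behind by the first import
# (add -skipTrash to bypass the trash if it is enabled)
hdfs dfs -rm -r /user/root/titles
```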
This way, you are pulling the data from MySQL, creating the
/user/root/titles HDFS directory and updating the metastore, so that Hive knows where the table (and its data) lives.
But what if you don't want to delete the folder with the data you already imported? In that case, you could create a new Hive table
titles and point it at the existing data using something like this:
CREATE [TEMPORARY] [EXTERNAL] TABLE titles [(col_name data_type [COMMENT col_comment], ...)] (...) LOCATION '/user/root/titles'
This way, you wouldn't need to re-import the data at all, since it's already in HDFS.
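To make that concrete, here is a sketch of such an external table. The column list assumes the standard MySQL "employees" sample schema and Sqoop's default comma field delimiter, so adjust both to match your actual import:

```sql
-- EXTERNAL means dropping the table leaves the HDFS data in place
CREATE EXTERNAL TABLE titles (
  emp_no    INT,
  title     STRING,
  from_date STRING,
  to_date   STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/user/root/titles';
```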