I'm trying to write a non-trivial Hive job using the Hive Thrift and JDBC interfaces, and I'm having trouble setting up a decent JUnit test. By non-trivial, I mean that the job results in at least one MapReduce stage, as opposed to only dealing with the metastore.
The test should fire up a Hive server, load some data into a table, run some non-trivial query on that table, and check the results.
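Roughly, the shape of the test I'm after is something like this (just a sketch; the connection URL, table, data file, and expected row count are placeholders, and I'm assuming the standard HiveServer2 JDBC driver):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import org.junit.Test;
import static org.junit.Assert.assertEquals;

public class HiveQueryTest {
    @Test
    public void groupByQueryRunsThroughMapReduce() throws Exception {
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        Connection conn = DriverManager.getConnection("jdbc:hive2://localhost:10000/default", "", "");
        Statement stmt = conn.createStatement();
        stmt.execute("CREATE TABLE test_table (id INT, name STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\\t'");
        stmt.execute("LOAD DATA LOCAL INPATH '/tmp/test_data.tsv' INTO TABLE test_table");
        // A GROUP BY query forces at least one MapReduce stage.
        ResultSet rs = stmt.executeQuery("SELECT name, COUNT(*) FROM test_table GROUP BY name");
        int rows = 0;
        while (rs.next()) {
            rows++;
        }
        assertEquals(2, rows); // placeholder: number of distinct names expected in the test data
        conn.close();
    }
}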
I've wired up a Spring context according to the Spring reference. However, the job fails on the MapReduce phase, complaining that no Hadoop binary exists:
java.io.IOException: Cannot run program "/usr/bin/hadoop" (in directory "/Users/yoni/opower/workspace/intellij_project_root"): error=2, No such file or directory
Ideally one would be able to test Hive queries with LocalJobRunner rather than resorting to mini-cluster testing. However, due to HIVE-3816, running Hive with mapred.job.tracker=local results in a call to the Hive CLI executable installed on the system (as described in your question).
Until HIVE-3816 is resolved, mini-cluster testing is the only option. Below is a minimal mini-cluster setup for Hive tests that I have tested against CDH 4.4.
Configuration conf = new Configuration();

/* Build MiniDFSCluster */
MiniDFSCluster miniDFS = new MiniDFSCluster.Builder(conf).build();

/* Build MiniMR Cluster */
System.setProperty("hadoop.log.dir", "/path/to/hadoop/log/dir"); // MAPREDUCE-2785
int numTaskTrackers = 1;
int numTaskTrackerDirectories = 1;
String[] racks = null;
String[] hosts = null;
MiniMRCluster miniMR = new MiniMRCluster(numTaskTrackers,
        miniDFS.getFileSystem().getUri().toString(),
        numTaskTrackerDirectories, racks, hosts, new JobConf(conf));

/* Set JobTracker URI */
System.setProperty("mapred.job.tracker",
        miniMR.createJobConf(new JobConf(conf)).get("mapred.job.tracker"));
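If the clusters are kept as static fields, they can be torn down at the end of the test class along these lines (a sketch; the field and method names are assumptions):

@AfterClass
public static void shutdownMiniClusters() {
    if (miniMR != null) {
        miniMR.shutdown();
    }
    if (miniDFS != null) {
        miniDFS.shutdown();
    }
}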
There is no need to run a separate hiveserver or hiveserver2 process for testing. You can test with an embedded hiveserver2 process by setting your jdbc connection URL to jdbc:hive2:// (with no host or port).
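For example (a sketch, assuming the standard HiveServer2 JDBC driver class):

Class.forName("org.apache.hive.jdbc.HiveDriver");
// An empty authority ("jdbc:hive2://") runs HiveServer2 embedded in the test JVM.
Connection conn = DriverManager.getConnection("jdbc:hive2://", "", "");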