danny danny - 2 months ago 28

JDBC driver cannot be found when reading a DataSet from an SQL database in Apache Flink

After following the beginner Java tutorials in the Apache Flink documentation, I wanted to try some transformations on my own data. However, I'm having trouble reading input from my Microsoft SQL Server database, which runs on a server in the network.

The examples in the documentation section about possible sources for DataSets include one that looked like what I need, where a DataSet is built using env.createInput(...) with a JDBCInputFormat. So I added the Maven dependency for Flink JDBC and remodeled the given code to fit my own database like this:

// create and configure input format
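JDBCInputFormat inputFormat = JDBCInputFormat.buildJDBCInputFormat()
    .setDrivername("org.apache.derby.jdbc.EmbeddedDriver")
    // ... DB URL, user name, password and query for my own server ...
    .finish();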

// create and configure type information for DataSet
TupleTypeInfo typeInformation = new TupleTypeInfo(Tuple2.class, STRING_TYPE_INFO, INT_TYPE_INFO);

// Read data from a relational database using the JDBC input format
DataSet<Tuple2<String, Integer>> dbData = environment.createInput(inputFormat, typeInformation);

Server address, user name and password are the same ones that work in another Java program of mine that uses plain JDBC. The query is a simple SELECT on two columns, one containing string values, the other integers.

When running the program I get a ClassNotFoundException referring to the selected driver:
JDBC-Class not found. - org.apache.derby.jdbc.EmbeddedDriver at org.apache.flink.api.java.io.jdbc.JDBCInputFormat.open

I seem to be missing some dependency here, but I can't figure out which one (or where to get it), as I was expecting Flink JDBC to support this minimal example. The same driver name is also given in the JDBCInputFormat Javadoc. I tried adding JDBC 4.2 manually, which did not work.

What do I need to add or change so that the driver will be found? Additionally, is there some official material about Flink JDBC and its usage, apart from the Javadoc? I am even having difficulties finding tutorials about Flink and SQL sources in general.

  1. If you want to read data from a Microsoft SQL Server database, you should use the JDBC driver for SQL Server, not the one for Apache Derby. JDBC drivers are often included in the DBMS distribution or installation. Microsoft also offers its JDBC driver JAR as a separate download.

  2. The driver must be on your classpath. There are two options: 1) bundle it in your application JAR, i.e., include it in the fat JAR, or 2) add it to Apache Flink's ./lib folder (note that it must be added to every Flink installation in the cluster).
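
To check both points before submitting the job, something like the following could be run with the same classpath as the application. It is only a sketch: the class DriverCheck, its method names, and the host, port and database values are made up for illustration. The driver class name com.microsoft.sqlserver.jdbc.SQLServerDriver and the jdbc:sqlserver:// URL prefix are the ones used by Microsoft's JDBC driver.

```java
class DriverCheck {
    // Driver class contained in Microsoft's JDBC driver JAR for SQL Server.
    public static final String SQLSERVER_DRIVER =
            "com.microsoft.sqlserver.jdbc.SQLServerDriver";

    /** Returns true if the named class can be loaded from the current classpath. */
    public static boolean isOnClasspath(String className) {
        try {
            Class.forName(className);
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    /** Builds a SQL Server JDBC URL; all three arguments are placeholders. */
    public static String sqlServerUrl(String host, int port, String database) {
        return "jdbc:sqlserver://" + host + ":" + port + ";databaseName=" + database;
    }

    public static void main(String[] args) {
        // Hypothetical connection values, replace with your own server settings.
        System.out.println(SQLSERVER_DRIVER + " on classpath: "
                + isOnClasspath(SQLSERVER_DRIVER));
        System.out.println("URL: " + sqlServerUrl("dbhost", 1433, "mydb"));
    }
}
```

If the check reports false, the driver JAR is not visible to the program and needs to be added in one of the two ways described above. Note that this only tests the local classpath, not the ./lib folders of the cluster nodes. The same class name and URL would then be passed to setDrivername(...) and setDBUrl(...) on the JDBCInputFormat builder in place of the Derby values.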