After following the beginner Java tutorials in the Apache Flink documentation, I wanted to try some transformations on my own data. However, I'm having trouble reading input from my Microsoft SQL Server database, which runs on a server in my network.
The documentation section on possible sources for DataSets contains an example that looked like what I need: a DataSet is built using env.createInput(...) with a JDBCInputFormat. So I added the Maven dependency for Flink JDBC and tried the example code:
// create and configure input format
JDBCInputFormat inputFormat = JDBCInputFormat.buildJDBCInputFormat()
// create and configure type information for DataSet
TupleTypeInfo typeInformation = new TupleTypeInfo(Tuple2.class, STRING_TYPE_INFO, INT_TYPE_INFO);
// Read data from a relational database using the JDBC input format
DataSet<Tuple2<String, Integer>> dbData = environment.createInput(inputFormat, typeInformation);
Running this fails with: JDBC-Class not found. - org.apache.derby.jdbc.EmbeddedDriver at org.apache.flink.api.java.io.jdbc.JDBCInputFormat.open
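To read data from a Microsoft SQL Server database, you should use the JDBC driver for SQL Server, not the one for Apache Derby. The error names org.apache.derby.jdbc.EmbeddedDriver, which suggests the input format is still configured with the Derby driver class from the documentation example. JDBC drivers are often included in the DBMS distribution or installation. Microsoft may also offer the corresponding JAR file as a download on its website.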
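As a rough sketch of what "using the SQL Server driver" means in practice, the snippet below collects the two values that would replace the Derby settings: the driver class name and a JDBC URL. The class name SqlServerJdbcSettings, the helper url(...), and the host, port, and database values are hypothetical placeholders, not taken from the question. The driver class and URL scheme are those of Microsoft's JDBC driver; how you pass them to the input format builder depends on your Flink version.

```java
// Sketch only: the class, the helper, and the host/port/database values are
// illustrative placeholders, not part of Flink or of the original question.
final class SqlServerJdbcSettings {

    // Driver class contained in Microsoft's JDBC driver JAR for SQL Server.
    static final String DRIVER_CLASS = "com.microsoft.sqlserver.jdbc.SQLServerDriver";

    // Builds a SQL Server JDBC URL of the form
    // jdbc:sqlserver://<host>:<port>;databaseName=<database>
    static String url(String host, int port, String database) {
        return "jdbc:sqlserver://" + host + ":" + port + ";databaseName=" + database;
    }

    public static void main(String[] args) {
        // These two strings take the place of the Derby driver name and the
        // Derby URL when the JDBC input format is configured.
        System.out.println(DRIVER_CLASS);
        System.out.println(url("dbhost", 1433, "mydb"));
    }
}
```

The driver name goes wherever the example currently names org.apache.derby.jdbc.EmbeddedDriver, and the URL replaces the Derby connection string.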
The driver must be added to your classpath. There are two options: 1) bundle it in your application JAR, i.e., include it in the fat JAR, or 2) add it to Apache Flink's ./lib folder (note that it must be added to all Flink installations in the cluster).
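To check whether a driver JAR actually ended up on the classpath, a small probe like the one below can help. It is a minimal sketch using only the standard Class.forName call; the class name DriverCheck and the method isOnClasspath are hypothetical helpers introduced here for illustration.

```java
// Sketch only: DriverCheck and isOnClasspath are illustrative names.
final class DriverCheck {

    // Returns true if the named class can be loaded from the current classpath.
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className);
            return true;
        } catch (ClassNotFoundException e) {
            // The JAR that contains the class is not visible to this JVM.
            return false;
        }
    }

    public static void main(String[] args) {
        String driver = "com.microsoft.sqlserver.jdbc.SQLServerDriver";
        System.out.println(driver + " on classpath: " + isOnClasspath(driver));
    }
}
```

Note that this only reports on the JVM it runs in. With option 2, the driver has to be visible to the Flink processes on every machine, so a successful local check does not by itself prove the cluster is set up correctly.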