Stanley - 1 month ago
Scala Question

Spark Shell Imports Fine, but Throws Error When Referencing Classes

I am a beginner in Apache Spark, so please excuse me if this is quite trivial.

Basically, I was running the following imports in spark-shell:

import org.apache.spark.sql.{DataFrame, Row, SQLContext, DataFrameReader}
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql._
import org.apache.hadoop.hive.ql.io.orc.{OrcInputFormat,OrcStruct};
import org.apa‌​che.hadoop.io.NullWritable;
...

val rdd = sc.hadoopFile(path,
classOf[org.apache.hadoop.hive.ql.io.orc.OrcInputFor‌​mat],
classOf[NullWritable],
classOf[OrcStruct],
1)


The import statements up to and including OrcInputFormat work fine, except for the last one, which fails with:

error: object apa‌​che is not a member of package org
import org.apa‌​che.hadoop.io.NullWritable;


This does not make sense to me, since the import statement before it goes through without any issue.

In addition, when referencing OrcInputFormat, I get this error:

error: type OrcInputFor‌​mat is not a member of package org.apache.hadoop.hive.ql.io.orc


It seems strange that the import of OrcInputFormat works (I assume it does, since no error is thrown), but referencing it then produces the error above. Basically, I am trying to read ORC files from S3.

I would also like to understand what I have done wrong, and why this happens.

What I have tried:


  1. I have tried running spark-shell with the --jars option, adding hadoop-common-2.6.0.jar (my current version of Spark is 1.6.1, compiled with Hadoop 2.6).

  2. Following "Read ORC files directly from Spark shell", I ran val df = sqlContext.read.format("orc").load(PathToS3), trying the s3, s3n and s3a URL schemes, without any success.


Answer

You have 2 non-printing characters between org.apa and che in the last import, almost certainly from a copy-paste:

import org.apa‌​che.hadoop.io.NullWritable;

Just retype the last import statement and it will work. Also, you don't need the semicolons.
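If you can't tell where such characters are hiding, a few lines of plain Scala (no Spark needed) can list them and strip them out. This is a minimal sketch, assuming the two characters are U+200C ZERO WIDTH NON-JOINER followed by U+200B ZERO WIDTH SPACE; HiddenChars, isHidden, findHidden and strip are hypothetical helper names, not part of Scala or Spark:

```scala
object HiddenChars {
  // Zero-width characters that commonly sneak in when code is copied from a web page:
  // U+200B ZERO WIDTH SPACE, U+200C ZERO WIDTH NON-JOINER, U+200D ZERO WIDTH JOINER.
  private val zeroWidth: Set[Char] = Set('\u200B', '\u200C', '\u200D')

  // A character is treated as hidden if it is in the set above or belongs to the
  // Unicode "format" category (Cf, e.g. U+2060 or U+FEFF), which has no visible glyph.
  def isHidden(c: Char): Boolean =
    zeroWidth.contains(c) || Character.getType(c) == Character.FORMAT.toInt

  // Position and code point (formatted as "U+XXXX") of every hidden character in `s`.
  def findHidden(s: String): List[(Int, String)] =
    s.zipWithIndex.collect {
      case (c, i) if isHidden(c) => (i, f"U+${c.toInt}%04X")
    }.toList

  // `s` with every hidden character removed.
  def strip(s: String): String = s.filterNot(isHidden)

  def main(args: Array[String]): Unit = {
    // The failing import from the question, with the two invisible characters
    // written as escapes so they can be seen.
    val pasted = "import org.apa\u200C\u200Bche.hadoop.io.NullWritable"
    println(findHidden(pasted)) // List((14,U+200C), (15,U+200B))
    println(strip(pasted))      // import org.apache.hadoop.io.NullWritable
  }
}
```

The same calls work in spark-shell, e.g. HiddenChars.findHidden on a line you suspect, before pasting the cleaned string back in.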

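You have the same problem with OrcInputFormat, this time inside the classOf[...] reference (the import of OrcInputFormat itself is clean, which is why it succeeds):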

error: type OrcInputFor‌​mat is not a member of package org.apache.hadoop.hive.ql.io.orc
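A quick way to see why one OrcInputFormat line works and the other does not is to scan the pasted snippet line by line and report which lines contain hidden characters. A minimal sketch in plain Scala; ScanSnippet and dirtyLines are hypothetical names, and the snippet in main is a reconstruction of the relevant lines from the question with the invisible characters written as escapes (assumed to be U+200C followed by U+200B):

```scala
object ScanSnippet {
  // Hidden if it is a common zero-width character or in the Unicode "format"
  // category (Cf), whose members have no visible glyph.
  def isHidden(c: Char): Boolean =
    c == '\u200B' || c == '\u200C' || c == '\u200D' ||
      Character.getType(c) == Character.FORMAT.toInt

  // 1-based numbers of the lines in `snippet` that contain at least one hidden character.
  def dirtyLines(snippet: String): List[Int] =
    snippet.split("\n").toList.zipWithIndex.collect {
      case (line, i) if line.exists(isHidden) => i + 1
    }

  def main(args: Array[String]): Unit = {
    val snippet = Seq(
      "import org.apache.hadoop.hive.ql.io.orc.{OrcInputFormat,OrcStruct}",
      "import org.apa\u200C\u200Bche.hadoop.io.NullWritable",
      "val rdd = sc.hadoopFile(path,",
      "  classOf[org.apache.hadoop.hive.ql.io.orc.OrcInputFor\u200C\u200Bmat],"
    ).mkString("\n")
    println(dirtyLines(snippet)) // List(2, 4)
  }
}
```

Only the NullWritable import and the classOf[...] line are flagged; the OrcInputFormat import on the first line is clean, which matches what you saw in spark-shell.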

Interestingly, in the mobile version of Stack Overflow those non-printing characters are clearly visible.