Rahul Tanwani - 11 months ago
Scala Question

Scala - can 'this' be null in Scala for a live object?

I am seeing behavior that contradicts my understanding. I believed that 'this' can never be null for a live object, yet in the case shown below it appears to be.

Context - I am using the XGBoost4J-Spark package. You can look at the source code here. More specifically, I am referring to the XGBoostEstimator class. My copy of the class is shown below; the only change is one additional print statement.

package ml.dmlc.xgboost4j.scala.spark

import ml.dmlc.xgboost4j.scala.{EvalTrait, ObjectiveTrait}
import org.apache.spark.ml.{Predictor, Estimator}
import org.apache.spark.ml.param.ParamMap
import org.apache.spark.ml.util.Identifiable
import org.apache.spark.mllib.linalg.{VectorUDT, Vector}
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.sql.functions._
import org.apache.spark.sql.types.{NumericType, DoubleType, StructType}
import org.apache.spark.sql.{DataFrame, TypedColumn, Dataset, Row}

/**
 * the estimator wrapping XGBoost to produce a training model
 * @param inputCol the name of input column
 * @param labelCol the name of label column
 * @param xgboostParams the parameters configuring XGBoost
 * @param round the number of iterations to train
 * @param nWorkers the total number of workers of xgboost
 * @param obj the customized objective function, default to be null and using the default in model
 * @param eval the customized eval function, default to be null and using the default in model
 * @param useExternalMemory whether to use external memory when training
 * @param missing the value taken as missing
 */
class XGBoostEstimator(
inputCol: String, labelCol: String,
xgboostParams: Map[String, Any], round: Int, nWorkers: Int,
obj: Option[ObjectiveTrait] = None,
eval: Option[EvalTrait] = None, useExternalMemory: Boolean = false, missing: Float = Float.NaN)
extends Estimator[XGBoostModel] {

println(s"This is ${this}")
override val uid: String = Identifiable.randomUID("XGBoostEstimator")

/** produce a XGBoostModel by fitting the given dataset */
def fit(trainingSet: Dataset[_]): XGBoostModel = {
val instances = trainingSet.select(
col(inputCol), col(labelCol).cast(DoubleType)).rdd.map {
case Row(feature: Vector, label: Double) =>
LabeledPoint(label, feature)
transformSchema(trainingSet.schema, logging = true)
val trainedModel = XGBoost.trainWithRDD(instances, xgboostParams, round, nWorkers, obj.get,
eval.get, useExternalMemory, missing).setParent(this)

override def copy(extra: ParamMap): Estimator[XGBoostModel] = {

override def transformSchema(schema: StructType): StructType = {
// check input type, for now we only support vectorUDT as the input feature type
val inputType = schema(inputCol).dataType
require(inputType.equals(new VectorUDT), s"the type of input column $inputCol has to VectorUDT")
// check label Type,
val labelType = schema(labelCol).dataType
require(labelType.isInstanceOf[NumericType], s"the type of label column $labelCol has to" +
s" be NumericType")

When I instantiate this class through the Spark shell (or through the tests), this is the output I get:

scala> import ml.dmlc.xgboost4j.scala.spark.XGBoostEstimator
import ml.dmlc.xgboost4j.scala.spark.XGBoostEstimator

scala> val xgb = new XGBoostEstimator("features", "label", Map.empty,10, 2)
This is null
xgb: ml.dmlc.xgboost4j.scala.spark.XGBoostEstimator = XGBoostEstimator_6cd31d495c8f

scala> xgb.uid
res1: String = XGBoostEstimator_6cd31d495c8f

Any clarification on why and when this behavior is possible would be helpful.

Answer Source

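Your toString() implementation comes from Identifiable, which simply returns uid. Since you assign uid on the line after the println, it has not been initialized yet when the message is printed, so toString returns null and the interpolated string reads "This is null". 'this' itself is not null.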

Identifiable source:

trait Identifiable {

  /** An immutable unique ID for the object and its derivatives. */
  val uid: String

  override def toString: String = uid
}
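
To see the initialization order in isolation, here is a minimal sketch that does not depend on Spark. HasUid is a hypothetical stand-in for Identifiable, and the three estimator classes and their uid strings are made up for illustration. The first class reproduces the behavior from the question; the other two show possible ways to avoid it (assign uid first, or make it a lazy val).

```scala
// Hypothetical stand-in for Spark's Identifiable: toString delegates to uid.
trait HasUid {
  val uid: String
  override def toString: String = uid
}

// The class body runs top to bottom as the constructor, so the string below
// is built before uid is assigned. At that point the uid field still holds
// the JVM default for a reference (null), and toString returns that null.
class EagerEstimator extends HasUid {
  val seenDuringInit: String = s"This is ${this}"
  override val uid: String = "EagerEstimator_1234"
}

// Possible fix 1: assign uid before anything that calls toString.
class ReorderedEstimator extends HasUid {
  override val uid: String = "ReorderedEstimator_1234"
  val seenDuringInit: String = s"This is ${this}"
}

// Possible fix 2: a lazy val is computed on first access, so it is
// available even when toString is called early in the constructor.
class LazyEstimator extends HasUid {
  val seenDuringInit: String = s"This is ${this}"
  override lazy val uid: String = "LazyEstimator_1234"
}

object InitOrderDemo {
  def main(args: Array[String]): Unit = {
    println(new EagerEstimator().seenDuringInit)     // This is null
    println(new ReorderedEstimator().seenDuringInit) // This is ReorderedEstimator_1234
    println(new LazyEstimator().seenDuringInit)      // This is LazyEstimator_1234
  }
}
```

In every case 'this' is a valid, non-null reference; only the field read by toString is unset. Once construction finishes, new EagerEstimator().toString returns the uid as expected, which matches the xgb.uid output in the question.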