IgorekPotworek - 5 months ago
Java Question

How to print best model params in Apache Spark Pipeline?

I'm using the Pipeline API of Apache Spark to validate parameters.
I'm building a TrainValidationSplitModel like this:

Pipeline pipeline = ...
ParamMap[] paramGrid = ...

TrainValidationSplit trainValidationSplit = new TrainValidationSplit()
    .setEstimator(pipeline)
    .setEvaluator(new MulticlassClassificationEvaluator())
    .setEstimatorParamMaps(paramGrid)
    .setTrainRatio(0.8);
TrainValidationSplitModel model = trainValidationSplit.fit(training);

My question is: how can I extract and print the params of the best trained model?


I finally figured it out. Spark prints these metrics after training, but I had the log level for Spark set to ERROR, so I didn't see them:

2015-10-21 12:57:33,828 [INFO  org.apache.spark.ml.tuning.TrainValidationSplit]
Train validation split metrics: WrappedArray(0.7141940371838821, 0.7358721053749735)

2015-10-21 12:57:33,831 [INFO  org.apache.spark.ml.tuning.TrainValidationSplit]
Best set of parameters:
    hashingTF_79cf758f5ab1-numFeatures: 2000000,
    nb_67d55ce4e1fc-smoothing: 1.0

2015-10-21 12:57:33,831 [INFO  org.apache.spark.ml.tuning.TrainValidationSplit]
Best train validation split metric: 0.7358721053749735.
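
To make the log output above easier to read: the "best" metric is simply the largest entry in the metrics array. Here is a minimal plain-Java sketch of that selection. It is not Spark's own code; the class and method names (BestMetricIndex, bestIndex) are made up for illustration, and it assumes that a larger metric is better and that the metrics are listed in the same order as paramGrid.

```java
// Hypothetical helper, not part of Spark: picks the position of the best metric.
class BestMetricIndex {

    // Returns the index of the largest metric (assumes larger is better).
    static int bestIndex(double[] metrics) {
        if (metrics.length == 0) {
            throw new IllegalArgumentException("metrics must not be empty");
        }
        int best = 0;
        for (int i = 1; i < metrics.length; i++) {
            if (metrics[i] > metrics[best]) {
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Values copied from the "Train validation split metrics" log line.
        double[] metrics = {0.7141940371838821, 0.7358721053749735};
        int best = bestIndex(metrics);
        System.out.println("Best index: " + best + ", metric: " + metrics[best]);
    }
}
```

Under the ordering assumption, paramGrid[best] would be the ParamMap that produced the best metric. As far as I know the fitted model also exposes the same array through model.validationMetrics(), so this could be done without reading the logs, but I haven't verified that here.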

Now I've set the INFO level for the TrainValidationSplit class in my log4j.properties file: