IgorekPotworek - 2 years ago
Java Question

How to print best model params in Apache Spark Pipeline?

I'm using the Pipeline API of Apache Spark to validate parameters.
I'm building a TrainValidationSplitModel like this:

Pipeline pipeline = ...
ParamMap[] paramGrid = ...

TrainValidationSplit trainValidationSplit = new TrainValidationSplit()
    .setEstimator(pipeline)
    .setEvaluator(new MulticlassClassificationEvaluator())
    .setEstimatorParamMaps(paramGrid)
    .setTrainRatio(0.8);
TrainValidationSplitModel model = trainValidationSplit.fit(training);

My question is: how can I extract and print the params of the best trained model?

Answer Source

I finally figured it out. Spark prints these metrics after training. I had the log level set to ERROR for Spark, so I hadn't seen this:

2015-10-21 12:57:33,828 [INFO  org.apache.spark.ml.tuning.TrainValidationSplit]
Train validation split metrics: WrappedArray(0.7141940371838821, 0.7358721053749735)

2015-10-21 12:57:33,831 [INFO  org.apache.spark.ml.tuning.TrainValidationSplit]
Best set of parameters:
    hashingTF_79cf758f5ab1-numFeatures: 2000000,
    nb_67d55ce4e1fc-smoothing: 1.0

2015-10-21 12:57:33,831 [INFO  org.apache.spark.ml.tuning.TrainValidationSplit]
Best train validation split metric: 0.7358721053749735.
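
The log shows one metric per entry in paramGrid, in the same order, so the best parameter set can also be found in code by taking the index of the best metric. Below is a minimal sketch of that lookup in plain Java, without Spark on the classpath. The class name BestParamIndex and the method bestIndex are hypothetical, and the metric values are copied from the log above. With Spark available, I'd expect the array to come from model.validationMetrics() and the winning ParamMap from model.getEstimatorParamMaps()[best], but treat that wiring as an assumption to verify against your Spark version.

```java
public class BestParamIndex {

    // Returns the index of the best metric; on ties the first occurrence wins.
    // largerIsBetter should match the evaluator (true for accuracy-like metrics such as F1).
    public static int bestIndex(double[] metrics, boolean largerIsBetter) {
        if (metrics.length == 0) {
            throw new IllegalArgumentException("metrics must not be empty");
        }
        int best = 0;
        for (int i = 1; i < metrics.length; i++) {
            boolean better = largerIsBetter
                    ? metrics[i] > metrics[best]
                    : metrics[i] < metrics[best];
            if (better) {
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Values copied from the log output above; assumed to be what
        // model.validationMetrics() would return in a real Spark job.
        double[] metrics = {0.7141940371838821, 0.7358721053749735};

        int best = bestIndex(metrics, true);
        System.out.println("Best index: " + best + ", metric: " + metrics[best]);

        // Assumption: with Spark, the matching parameters would be printed with
        // System.out.println(model.getEstimatorParamMaps()[best]);
    }
}
```

For the two metrics above the second entry wins, which matches the "Best train validation split metric" line in the log.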

Now I've enabled the INFO level for the TrainValidationSplit class in my log4j.properties file:
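
Assuming the log4j 1.x properties syntax and the logger name shown in the log output above, such an entry would look roughly like this:

```
# Assumed entry: raise only the TrainValidationSplit logger to INFO,
# leaving the rest of Spark at the existing ERROR level
log4j.logger.org.apache.spark.ml.tuning.TrainValidationSplit=INFO
```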
