I am experiencing very strange behaviour from Spark's VectorAssembler:
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.mllib.linalg.Vectors

// inside parseLine: build one (label, orderNo, pageNo, features) tuple per row
val joinedCounts = countPerChannel ++ countPerSource // two arrays of Doubles joined
(label, orderNo, pageNo, Vectors.dense(joinedCounts))

val parsedData = rawData.filter(row => row != header).map(parseLine)
val data = sqlContext.createDataFrame(parsedData)
  .toDF("label", "orderNo", "pageNo", "joinedCounts")
val assembler = new VectorAssembler()
  .setInputCols(Array("orderNo", "pageNo", "joinedCounts"))
  .setOutputCol("features") // VectorAssembler requires an output column
val assemblerData = assembler.transform(data)
The transform itself runs, but the assembled column holds sparse vectors such as (18,[0,1,6,9,14,17],[17.0,15.0,3.0,1.0,4.0,2.0]) instead of the dense vectors I expected. Why?
There is nothing strange about the output. Your vector seems to have lots of zero elements, so Spark used a sparse representation of it.
To explain further:

Your vector is composed of 18 elements (its dimension). The indices [0, 1, 6, 9, 14, 17] of the vector contain the non-zero elements, which are, in order, [17.0, 15.0, 3.0, 1.0, 4.0, 2.0].
A sparse vector representation saves space, since only the non-zero entries and their indices are stored, which also makes many computations cheaper and faster. More on sparse representations here.
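To make the notation concrete, here is a minimal sketch (using the same Vectors factory as in parseLine above, with the values taken from your output) showing that the sparse form and its dense equivalent describe the same 18-element vector:

import org.apache.spark.mllib.linalg.Vectors

// Sparse form: (size, indices of the non-zeros, the non-zero values)
val sparse = Vectors.sparse(18, Array(0, 1, 6, 9, 14, 17),
  Array(17.0, 15.0, 3.0, 1.0, 4.0, 2.0))

// Dense form: all 18 entries stored explicitly, zeros included
val dense = Vectors.dense(17.0, 15.0, 0.0, 0.0, 0.0, 0.0, 3.0, 0.0, 0.0,
  1.0, 0.0, 0.0, 0.0, 0.0, 4.0, 0.0, 0.0, 2.0)

sparse == dense // true: both represent the same vector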
Now of course you can convert that sparse representation into a dense one, but it comes at a cost in memory.
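For example (a sketch, not the only way to do it; it assumes the assembler's output column is called "features", as in the setOutputCol call above), you can map the column through a UDF that calls toDense. Every zero is then stored explicitly, so memory use grows with the full dimension:

import org.apache.spark.mllib.linalg.Vector
import org.apache.spark.sql.functions.udf

// Convert each sparse vector in the "features" column to its dense equivalent
val asDense = udf((v: Vector) => v.toDense)
val denseData = assemblerData.withColumn("features", asDense(assemblerData("features")))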