rocket_raccoon - 1 month ago 15
Scala Question

# Concatenate Sparse Vectors in Spark?

Say you have two Sparse Vectors. As an example:

``````import org.apache.spark.mllib.linalg.Vectors

val vec1 = Vectors.sparse(2, Array(0), Array(1.0)) // [1, 0]
val vec2 = Vectors.sparse(2, Array(1), Array(1.0)) // [0, 1]
``````

I want to concatenate these two vectors so that the result is equivalent to:

``````val vec3 = Vectors.sparse(4, Array(0, 3), Array(1.0, 1.0)) // [1, 0, 0, 1]
``````

Does Spark have any such convenience method to do this?

I think you have a slight misunderstanding of `SparseVector`s, so a short explanation first. The first argument is the number of features (columns/dimensions) of the data; each entry of the `List` in the second argument is the position of a feature; and each value in the third `List` is the value at that position. `SparseVector`s are therefore position-sensitive, and from my point of view your approach is incorrect.
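To make the representation concrete, here is a plain-Scala sketch with no Spark dependency (`toDense` is a hypothetical helper, not a Spark API) that rebuilds the dense form from the three arguments:

```scala
// size    = number of dimensions
// indices = positions of the non-zero entries
// values  = the entries at those positions; everything else is 0.0
def toDense(size: Int, indices: Array[Int], values: Array[Double]): Array[Double] = {
  val dense = Array.fill(size)(0.0)
  for ((i, v) <- indices.zip(values)) dense(i) = v
  dense
}

// (2, [0], [1.0]) means: 2 dimensions, value 1.0 at position 0
val dense1 = toDense(2, Array(0), Array(1.0)) // [1.0, 0.0]
```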

If you look more closely, you are summing or combining two vectors that have the same dimensionality, so the real result would be different: the first argument tells us the vector has only 2 dimensions, so `[1, 0] + [0, 1] => [1, 1]`, and the correct representation would be `Vectors.sparse(2, Array(0, 1), Array(1.0, 1.0))`, not four dimensions.
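That element-wise sum can be sketched on plain arrays as follows (again a hypothetical helper with no Spark dependency, just to show that the size stays at 2):

```scala
// Element-wise sum of two sparse vectors of the same dimensionality.
// Returns the (indices, values) of the non-zero entries of the result.
def add(i1: Array[Int], v1: Array[Double],
        i2: Array[Int], v2: Array[Double]): (Array[Int], Array[Double]) = {
  val acc = scala.collection.mutable.TreeMap.empty[Int, Double]
  for ((i, v) <- i1.zip(v1) ++ i2.zip(v2))
    acc(i) = acc.getOrElse(i, 0.0) + v
  val nonZero = acc.toArray.filter(_._2 != 0.0)
  (nonZero.map(_._1), nonZero.map(_._2))
}

// [1, 0] + [0, 1] => [1, 1]: non-zeros at positions 0 and 1, size still 2
val (idx, vals) = add(Array(0), Array(1.0), Array(1), Array(1.0))
```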

On the other hand, if each vector occupies two different dimensions and you are trying to combine them in a higher-dimensional space, say four dimensions, then your operation may be valid. However, this functionality isn't provided by the `SparseVector` class, so you would have to program a function to do it, something like this (a bit imperative, but I accept suggestions):

``````import org.apache.spark.mllib.linalg.SparseVector

def combine(v1: SparseVector, v2: SparseVector): SparseVector = {
  // The result lives in a space with v1.size + v2.size dimensions.
  val size = v1.size + v2.size
  // Shift every index of v2 past the end of v1.
  val offset = v1.size
  val indices = v1.indices ++ v2.indices.map(e => e + offset)
  val values = v1.values ++ v2.values
  new SparseVector(size, indices, values)
}
``````
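For reference, the same concatenation logic expressed on plain arrays, so it can be checked without a Spark dependency (a sketch mirroring the function above, not a Spark API):

```scala
// Concatenate two sparse vectors given as (size, indices, values) triples:
// the result has size1 + size2 dimensions, and every index of the second
// vector is shifted by size1.
def concat(size1: Int, i1: Array[Int], v1: Array[Double],
           size2: Int, i2: Array[Int], v2: Array[Double]): (Int, Array[Int], Array[Double]) =
  (size1 + size2, i1 ++ i2.map(_ + size1), v1 ++ v2)

// vec1 = [1, 0], vec2 = [0, 1]  ->  [1, 0, 0, 1]
// i.e. size 4, non-zeros at positions 0 and 3
val (size, indices, values) = concat(2, Array(0), Array(1.0), 2, Array(1), Array(1.0))
```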
Source: Stack Overflow