Bobo Bobo - 9 months ago
Python Question

Does randomSplit return a copy or a reference to the original rdd?

Suppose I have something like the code below

    for idx in xrange(0, 10):
        train_test_split = training.randomSplit(weights=[0.75, 0.25])
        train_cv = train_test_split[0]
        test_cv = train_test_split[1]
        # scale train_cv and test_cv

By scaling, will the original data be affected?

Answer

RDDs are immutable.

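Therefore, it's not actually possible to 'change' an RDD; you can only transform it, and every transformation returns a new RDD. randomSplit is no exception: it returns a list of new RDDs derived from the original. So, no, the original data will not be affected by scaling the splits.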