Bobo Bobo - 1 month ago 15
Python Question

Does randomSplit return a copy or a reference to the original RDD?

Suppose I have something like the code below:

for idx in xrange(0, 10):
    train_test_split = training.randomSplit(weights=[0.75, 0.25])
    train_cv = train_test_split[0]
    test_cv = train_test_split[1]
    # scale train_cv and test_cv


By scaling train_cv and test_cv, will the original data be affected?

Answer

RDDs are immutable.

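Therefore, it is not actually possible to change an RDD in place; you can only transform it, and every transformation returns a new RDD. randomSplit likewise returns new RDDs. So, no, scaling train_cv and test_cv will not affect the original data.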
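
To make the point concrete, here is a minimal sketch that runs without a Spark installation. LocalRDD is a hypothetical in-memory stand-in written only for this illustration (it is not part of Spark and does not reproduce Spark's sampling); it mimics just map, randomSplit(weights, seed) and collect. scale_splits, the factor of 10 and the seed are likewise assumptions chosen for the example.

```python
import random


class LocalRDD(object):
    """Hypothetical in-memory stand-in for an RDD (not part of Spark).

    It mimics only map, randomSplit and collect. Like a real RDD, it never
    modifies its own data: each method returns a new object.
    """

    def __init__(self, items):
        self._items = tuple(items)  # a tuple cannot be modified in place

    def map(self, f):
        # Build a new LocalRDD; self._items is only read.
        return LocalRDD(f(x) for x in self._items)

    def randomSplit(self, weights, seed=None):
        rng = random.Random(seed)
        total = float(sum(weights))
        # Cumulative upper bounds, e.g. [0.75, 1.0] for weights [0.75, 0.25].
        bounds, acc = [], 0.0
        for w in weights:
            acc += w / total
            bounds.append(acc)
        buckets = [[] for _ in weights]
        for x in self._items:
            r = rng.random()
            for i, b in enumerate(bounds):
                # The last bucket also catches any floating-point shortfall.
                if r < b or i == len(bounds) - 1:
                    buckets[i].append(x)
                    break
        return [LocalRDD(b) for b in buckets]

    def collect(self):
        return list(self._items)


def scale_splits(rdd, factor, weights=(0.75, 0.25), seed=42):
    """Split rdd, then scale both parts; rdd itself is only read."""
    train_cv, test_cv = rdd.randomSplit(list(weights), seed)
    return (train_cv.map(lambda x: x * factor),
            test_cv.map(lambda x: x * factor))


training = LocalRDD(range(1, 21))
before = training.collect()
scaled_train, scaled_test = scale_splits(training, factor=10)
after = training.collect()
print(before == after)  # prints True: the source data is unchanged
```

scale_splits only calls randomSplit and map, both of which exist on PySpark RDDs, so passing it a real RDD (for example one created with parallelize on an existing SparkContext) should behave the same way: the scaled splits are new RDDs and training is left as it was.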