Bobo Bobo - 1 year ago 109
Python Question

Does randomSplit return a copy or a reference to the original rdd?

Suppose I have something like the code below

    for idx in xrange(0, 10):
        train_test_split = training.randomSplit(weights=[0.75, 0.25])
        train_cv = train_test_split[0]
        test_cv = train_test_split[1]
        # scale train_cv and test_cv

If I scale the two splits, will the original data be affected?

Answer

RDDs are immutable.

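Therefore, it is not possible to change an RDD in place; you can only transform it into a new RDD. randomSplit returns a list of new RDDs, and scaling them produces further new RDDs. So, no, the original data will not be affected.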
