lines = sc.textFile(fileName)
lines is an RDD (collection) of Strings, so you need to call something (substring) on each element. To get the result of a function call on each member of the RDD, map is your friend.
Python (courtesy of @zero323):
lines.map(lambda line: line[10:21])
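Scala (substring's end index is exclusive, like a Python slice, so 21 here matches line[10:21] above):
lines.map(line => line.substring(10, 21))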
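To see exactly which characters the slice picks out without starting Spark, here is a small sketch that uses a plain Python list as a stand-in for the RDD and Python's built-in map in place of RDD.map. The sample lines are made up for illustration. Note the second line: a Python slice past the end of a short string just returns '', whereas Scala/Java substring throws StringIndexOutOfBoundsException in that case, so you may want to guard short lines on the Scala side.

```python
# Plain-Python stand-in for the RDD: a list of strings (hypothetical sample data).
sample_lines = ["0123456789ABCDEFGHIJKLMNOP", "short"]

# Same lambda as the PySpark example; indices 10..20 inclusive (end 21 is exclusive).
substrings = list(map(lambda line: line[10:21], sample_lines))

print(substrings)  # ['ABCDEFGHIJK', '']
```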
This returns another RDD. Transformations like map are lazy, so nothing actually runs until you call an action, e.g. collect() to return the result to the driver or saveAsTextFile() to write it out to files.
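If the lazy evaluation part is confusing, Python 3's own built-in map behaves in a similar way: it returns an iterator and does no work until something consumes it. The sketch below is only an analogy in plain Python (not Spark code); the extract helper and the sample data are hypothetical. It records each call so you can see that nothing happens at the "transformation" step and everything happens at the "action" step.

```python
calls = []

def extract(line):
    # Record each call so we can observe when the work actually happens.
    calls.append(line)
    return line[10:21]

sample_lines = ["0123456789ABCDEFGHIJKLMNOP", "abcdefghijklmnopqrstuvwxyz"]

mapped = map(extract, sample_lines)  # like a transformation: nothing runs yet
assert calls == []

result = list(mapped)                # like an action: forces the computation
print(result)  # ['ABCDEFGHIJK', 'klmnopqrstu']
```

In Spark the same idea applies across the cluster: lines.map(...) only records the step, and collect() or saveAsTextFile() is what triggers the job.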