Rich_F - 5 days ago
Ruby Question

Ruby String Split on "\t" loses "\n"

Trying to split this tab-delimited data set:

171 1000 21
269 1000 25
389 1000 40
1020 1-03 30 1
1058 1-03 30 1
1074 1-03 30 1
200 300 500


(For clarity, here it is with the tabs and newlines written out:)

171\t1000\t21\t\n
269\t1000\t25\t\n
389\t1000\t40\t\n
1020\t1-03\t30\t1\n
1058\t1-03\t30\t1\n
1074\t1-03\t30\t1\n
200\t300\t\t500\n

a = text.split(/\n/)
a.each do |i|
  u = i.split(/\t/)
  puts u.size
end

==>
3
3
3
4
4
4
4


The \t\n combination seems to shave off the last \t, which I need for a later import step. How can I get around this? Cheers

Edit: This is what I was expecting:

4
4
4
4
4
4
4

Answer

If this is for production, you should be using the CSV class, as @DmitryZ pointed out in the comments. CSV processing has a surprising number of caveats and you should not do it by hand.
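A minimal sketch of the CSV route, assuming the whole file is already in a string named text (as in the question) and that no field contains a double quote, since CSV treats `"` as a quoting character. The col_sep: option tells Ruby's CSV library to split on tabs instead of commas. Note that CSV returns nil rather than "" for an empty, unquoted field.

```ruby
require "csv"

# Sample data from the question, one string per line; join adds the newlines.
text = [
  "171\t1000\t21\t",
  "269\t1000\t25\t",
  "389\t1000\t40\t",
  "1020\t1-03\t30\t1",
  "1058\t1-03\t30\t1",
  "1074\t1-03\t30\t1",
  "200\t300\t\t500"
].join("\n") + "\n"

# col_sep: "\t" makes CSV use the tab as the field separator.
rows = CSV.parse(text, col_sep: "\t")

# Each row keeps its empty fields (as nil), so every row has four elements.
rows.each { |row| puts row.size }
```

Every row comes back with four elements, including the ones that end in an empty field.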

But let's go through it as an exercise...


The problem is that split does not keep the delimiter, and by default it does not keep trailing empty (null) fields either. You've hit both issues.

When you run a = text.split(/\n/), the elements of a no longer contain the newlines:

a = [
  "171\t1000\t21\t",
  "269\t1000\t25\t",
  "389\t1000\t40\t",
  "1020\t1-03\t30\t1",
  "1058\t1-03\t30\t1",
  "1074\t1-03\t30\t1",
  "200\t300\t\t500"
]
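A quick check, rebuilding the question's sample data as a string literal, shows that this first split only removes the newlines. The trailing tab on each of the first three lines is still there, so nothing has been lost at this stage.

```ruby
# Sample data from the question, written with explicit escapes.
text = "171\t1000\t21\t\n269\t1000\t25\t\n389\t1000\t40\t\n" \
       "1020\t1-03\t30\t1\n1058\t1-03\t30\t1\n1074\t1-03\t30\t1\n" \
       "200\t300\t\t500\n"

a = text.split(/\n/)

# Seven lines, none of which contains "\n" any more...
p a.size   # => 7
# ...but the trailing tab on the first line survives.
p a.first  # => "171\t1000\t21\t"
```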

Then, as documented in String#split, "if the limit parameter is omitted, trailing null fields are suppressed," so u = i.split(/\t/) drops that last empty field unless you give it a limit.
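The suppression only affects empty fields at the end of the string. An empty field in the middle, such as the third column of the last sample line, is kept, which is why that line already reported 4 in the question's output.

```ruby
# Interior empty fields survive: the third column here is empty.
middle = "200\t300\t\t500".split(/\t/)
p middle    # => ["200", "300", "", "500"]

# Trailing empty fields are suppressed when no limit is given.
trailing = "171\t1000\t21\t".split(/\t/)
p trailing  # => ["171", "1000", "21"]
```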

If you know there will always be 4 fields, you can pass 4 as the limit:

u = i.split(/\t/, 4)
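A small sketch of the fixed limit. The second string, wide, is a hypothetical line with five fields, included only to show the trade-off: a positive limit caps the number of fields, so anything past the third tab stays together in the last element instead of being split.

```ruby
line = "171\t1000\t21\t"
fixed = line.split(/\t/, 4)
p fixed    # => ["171", "1000", "21", ""]

# Hypothetical wider line: the fourth and fifth fields are not separated.
wide = "a\tb\tc\td\te"
capped = wide.split(/\t/, 4)
p capped   # => ["a", "b", "c", "d\te"]
```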

But it's probably more flexible to use -1, because "if [the limit is] negative, there is no limit to the number of fields returned, and trailing null fields are not suppressed." That keeps the empty fields without hard-coding the number of columns in the file.

u = i.split(/\t/, -1)
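Putting it together, here is the question's loop with only the limit changed, run against the sample data from the question (rebuilt as a string literal). Each line now reports 4 fields, matching the expected output.

```ruby
# Sample data from the question, one string per line; join adds the newlines.
text = [
  "171\t1000\t21\t",
  "269\t1000\t25\t",
  "389\t1000\t40\t",
  "1020\t1-03\t30\t1",
  "1058\t1-03\t30\t1",
  "1074\t1-03\t30\t1",
  "200\t300\t\t500"
].join("\n") + "\n"

a = text.split(/\n/)     # default limit: no empty element after the final "\n"
sizes = a.map do |i|
  u = i.split(/\t/, -1)  # -1: keep trailing empty fields
  u.size
end

puts sizes               # prints 4 on each of seven lines
```

The limit belongs only on the inner tab split. If you also passed -1 to the outer text.split(/\n/), the final newline would produce an extra empty string at the end of a.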