Tam Borine - 17 days ago
Ruby Question

Strange behaviour corrupting some part of a string but only particular chars, Atom, RSpec

I am struggling with an odd situation. I have a function that iterates over an array of strings and splits each string on "is", which I am testing in RSpec like so:

test body



info_combo = ["pish pish Iron is 3910 Credits","glob prok Gold is 57800 Credits"]
expect(interpreter.solveForUnknownInfo(info_combo)).to eq some_final_expectable_object


function



def getSubjectsAndObjects(info_combo)
  subjects = []
  objects = []
  info_combo.each do |info_str|
    print info_str
    subjectsAndObjects = info_str.split("is")
    print subjectsAndObjects
    subjects << subjectsAndObjects[0]
    objects << subjectsAndObjects[1]
  end
  return subjects, objects
end


printed output while debugging



"pish pish Iron is 3910 Credits" => first iteration input
["p", "h p", "h Iron ", " 3910 Credits"] => crazy unexpected
"glob prok Gold is 57800 Credits" => second iteration input
["glob prok Gold ", " 57800 Credits"] => expectable output


## after replacing the first substr of the 1st input string, 'pish', with 'another_random_word' ...

"another_random_word pish Iron is 3910 Credits" => first iteration input
["another_random_word p", "h Iron ", " 3910 Credits"] => some hopeful change
"glob prok Gold is 57800 Credits" => second iteration input
["glob prok Gold ", " 57800 Credits"] => expectable output


## after replacing the final 'pish' with 'another_random_word'

"another_random_word another_random_word Iron is 3910 Credits" => first iteration input
["another_random_word another_random_word Iron ", " 3910 Credits"] => now totally expectable/desired output from function
"glob prok Gold is 57800 Credits" => second iteration input
["glob prok Gold ", " 57800 Credits"] => expectable output


This is really confusing to me. I have no idea how to debug this or what might be going wrong. I thought it was a glitch in my text editor (Atom), but restarting it changed nothing.

Something I've missed? Any ideas? Also ideas on improving the question/title are very welcome.

Answer

You've missed something fairly straightforward: the middle two characters of "pish" are "is". So of course, if you split on "is", that gets split into "p" and "h".
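
To see this concretely, here is a small sketch using the two input strings from the question (the variable names are just for illustration). With a plain string delimiter, split cuts at every occurrence of "is", including the ones inside "pish":

```ruby
# String#split with a plain string delimiter cuts at every occurrence,
# including occurrences that sit inside other words.
first_input = "pish pish Iron is 3910 Credits"
naive_parts = first_input.split("is")
p naive_parts
# => ["p", "h p", "h Iron ", " 3910 Credits"]

# The second string only contains "is" as a standalone word,
# so the same call happens to give the expected result.
second_input = "glob prok Gold is 57800 Credits"
second_parts = second_input.split("is")
p second_parts
# => ["glob prok Gold ", " 57800 Credits"]
```

That is why only the first string looked "corrupted": it is the only one with "is" hidden inside another word.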

There are a couple of ways around this. The simplest, in your case, is probably to split on " is " (that is, "is" with a space on each side). Depending on your exact needs, you might instead split on a regular expression such as /\sis\s/ ("is" with a whitespace character on either side, which could be a space, a tab, a newline, etc.) or /\bis\b/ ("is" with a word boundary on either side). In the second case the "is" can't be in the middle of a word, but the surrounding whitespace isn't part of the match, so it isn't removed from the string.

 "his is hers".split(/\sis\s/) # => ["his", "hers"]
 "his is hers".split(/\bis\b/) # => ["his ", " hers"]

Note that in the first case, the spaces are part of the delimiter and are removed along with it, but in the second case, they are not part of the delimiter, and are not removed.
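
Applied to the function from the question, the simplest fix might look like the sketch below. This is only one possible rewrite: the snake_case name subjects_and_objects and the limit argument of 2 are my own assumptions, not something taken from the original code. The limit just guards against a second " is " showing up later in the string.

```ruby
# Hypothetical rewrite of getSubjectsAndObjects from the question.
def subjects_and_objects(info_combo)
  subjects = []
  objects = []
  info_combo.each do |info_str|
    # Split on the standalone word "is" (with its surrounding spaces).
    # The limit of 2 (an assumption) splits only at the first " is ".
    subject, object = info_str.split(" is ", 2)
    subjects << subject
    objects << object
  end
  [subjects, objects]
end

info_combo = ["pish pish Iron is 3910 Credits", "glob prok Gold is 57800 Credits"]
subjects, objects = subjects_and_objects(info_combo)
p subjects # => ["pish pish Iron", "glob prok Gold"]
p objects  # => ["3910 Credits", "57800 Credits"]
```

Because the spaces are part of the delimiter here, the results come back without leading or trailing whitespace, so no extra strip is needed.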
