Andrew Andrew - 2 months ago 13
Python Question

How do I ignore the group in a regex split in Python?

I know this is probably a really easy question, but I'm struggling to split a string in Python. My regex wraps the separator in a group, like this:

myRegex = r"(\W+)"


And I want to parse this string into words:

import re

testString = "This is my test string, hopefully I can get the word i need"
testAgain = re.split(r"(\W+)", testString)


Here are the results:

['This', ' ', 'is', ' ', 'my', ' ', 'test', ' ', 'string', ', ', 'hopefully', ' ', 'I', ' ', 'can', ' ', 'get', ' ', 'the', ' ', 'word', ' ', 'i', ' ', 'need']


Which isn't what I expected. I am expecting the list to contain:

['This', 'is', 'my', 'test', ...]


I know it has something to do with the group in my regex, and I can fix the issue by removing the brackets. But how can I keep the brackets and still get the result above?

Sorry about this question. I have read the official Python documentation on regex splitting with groups, but I still don't understand why the spaces are in my list.

Answer

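When the pattern contains capturing parentheses, re.split also returns the text matched by each group, which is why the separators (the spaces and the ", ") appear between the words in your list. As described in the answer to "How to split but ignore separators in quoted strings, in python?", you can simply slice the list once it's split. That's easy here because you want every other item, starting with the first one (indices 0, 2, 4, 6, ...).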
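To see where the separators come from, here is a minimal sketch comparing the capturing group from the question with a non-capturing group (?:...). The non-capturing form keeps the brackets but captures nothing, so re.split returns only the words; it's an alternative to slicing, not what the linked answer uses. The variable names with_group and without_capture are just for illustration.

```python
import re

testString = "This is my test string, hopefully I can get the word i need"

# Capturing group: re.split also returns the text the group matched,
# so each separator is interleaved between the words.
with_group = re.split(r"(\W+)", testString)

# Non-capturing group: the brackets are still there, but nothing is
# captured, so only the pieces between separators are returned.
without_capture = re.split(r"(?:\W+)", testString)

print(with_group[:4])       # ['This', ' ', 'is', ' ']
print(without_capture[:4])  # ['This', 'is', 'my', 'test']
```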

You can use the [start:end:step] slice notation, like this:

import re

testString = "This is my test string, hopefully I can get the word i need"
testAgain = re.split(r"(\W+)", testString)
testAgain = testAgain[0::2]  # every other item from index 0: the words only
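One caveat with slicing: if the string starts or ends with a separator, re.split puts an empty string at that end of the list. The words still sit at the even indices, but the slice can then contain ''. A minimal sketch that filters those out; split_words is a hypothetical helper name, not part of the re module.

```python
import re

def split_words(text):
    """Hypothetical helper: split on runs of non-word characters, keep the words.

    [0::2] drops the captured separators; the filter then removes the empty
    strings re.split produces when text begins or ends with a separator.
    """
    parts = re.split(r"(\W+)", text)
    return [word for word in parts[0::2] if word]

print(re.split(r"(\W+)", "Hello, world!"))  # ['Hello', ', ', 'world', '!', '']
print(split_words("Hello, world!"))         # ['Hello', 'world']
```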

Also, I must point out that \W matches any non-word character (anything other than a letter, digit or underscore), including punctuation. If you want to keep your punctuation, you'll need to change your regex.
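For example, assuming "keeping punctuation" means leaving it attached to the neighbouring word, one option is to split on whitespace only with \s+ and slice the same way. This is a sketch of one possible regex change, not the only one:

```python
import re

testString = "This is my test string, hopefully I can get the word i need"

# \s+ matches whitespace only, so the comma is no longer a separator
# and stays attached to "string".
words = re.split(r"(\s+)", testString)[0::2]
print(words[3:5])  # ['test', 'string,']
```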