I am not very good with regex, and it confuses me every time it comes up, so instead of writing a possibly incorrect regex, I want to split the string a different way.
Let's say I have the string "hello, my name is Joseph! Haha, hello!" and I want to split it whenever I encounter a non-alphanumeric character. In this case, I would obtain: ['hello', 'my', 'name', 'is', 'Joseph', 'Haha', 'hello']
Is there a way to do this without a regex string? As in: split whenever character != alphanumeric?
(Yes, I do realize it is probably not smart to keep avoiding my regex deficiency instead of fixing it!)
Personally, I think it is appropriate to use simple and straightforward regexes for such simple tasks.
Compare an `itertools` solution with an `re` solution:
```python
import itertools
import re

s = "hello, my name is Joseph! Haha, hello!"
print(["".join(x) for _, x in itertools.groupby(s, key=str.isalnum)][0::2])
print(re.findall(r"\w+", s))
```
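If you want to avoid both `re` and `itertools`, a plain loop also works. This is a minimal sketch; `split_non_alnum` is a hypothetical helper name, and it treats "alphanumeric" exactly as `str.isalnum` does:

```python
def split_non_alnum(text):
    """Split text on runs of non-alphanumeric characters (no regex, no imports)."""
    words = []
    current = []
    for ch in text:
        if ch.isalnum():
            current.append(ch)              # still inside a word
        elif current:
            words.append("".join(current))  # a non-alphanumeric char ends the word
            current = []
    if current:                             # flush a word that runs to the end
        words.append("".join(current))
    return words

s = "hello, my name is Joseph! Haha, hello!"
print(split_non_alnum(s))  # ['hello', 'my', 'name', 'is', 'Joseph', 'Haha', 'hello']
```

Because empty runs are never appended, leading, trailing, or repeated punctuation does not produce empty strings.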
As for me, I'd vote for the regex here. The `\w+` pattern matches one or more word characters (letters, digits, underscores), and `re.findall` returns all non-overlapping matches.
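One detail worth knowing: `\w` also matches the underscore, while `str.isalnum` does not, so the two solutions can disagree on input such as `snake_case`. A small illustration with a made-up sample string; `[^\W_]` is a common idiom for "a word character other than the underscore":

```python
import re

s = "snake_case, hello!"
# \w includes "_", so "snake_case" stays a single token
print(re.findall(r"\w+", s))      # ['snake_case', 'hello']
# [^\W_] excludes "_", which is closer to what str.isalnum accepts
print(re.findall(r"[^\W_]+", s))  # ['snake', 'case', 'hello']
```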
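`groupby` splits the string into runs of consecutive characters that share the same `key` value, here `str.isalnum`, so word runs and non-word runs alternate. The `[0::2]` slice keeps every other run (indices 0, 2, 4, …), which in this concrete case are the word runs, and drops the non-word runs in between. If the string starts with a non-alphanumeric character, the alternation is shifted and the slice keeps the punctuation instead, so this won't work; the regex solution is safer and easier.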
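If you still prefer `groupby`, one way to remove that limitation is to filter on the group key instead of slicing by position. A minimal sketch; `alnum_chunks` is a hypothetical name:

```python
from itertools import groupby

def alnum_chunks(text):
    # Keep only the runs whose key (str.isalnum) is True, so it no longer
    # matters whether the text starts with a letter or with punctuation.
    return ["".join(run) for is_word, run in groupby(text, key=str.isalnum) if is_word]

print(alnum_chunks("...hello, my name is Joseph! Haha, hello!"))
# ['hello', 'my', 'name', 'is', 'Joseph', 'Haha', 'hello']
```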