I am a Java developer, and new to Python. I would like to define a regex accepting all the alphabetic characters except for some of them. I want to exclude just the vowels and the character 'y', be it in upper- or lowercase.
The regex in Java for it would be as follows:
I don't think Python's current regular expression module (re) has exactly what you're looking for. The third-party regex module, intended as its eventual replacement, does have what you need, and you can install it should you wish.
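As a rough sketch of the regex-module route, assuming the package is installed (pip install regex): its "version 1" behaviour, enabled here with the inline (?V1) flag, supports set difference inside a character class via --. The helper name consonant_runs is made up for illustration, and the import is guarded because the package is not part of the standard library.

```python
try:
    import regex  # third-party package, not in the standard library
except ImportError:
    regex = None


def consonant_runs(text):
    """Hypothetical helper: runs of ASCII letters minus vowels and 'y'."""
    if regex is None:
        raise RuntimeError("the 'regex' package is not installed")
    # (?V1) turns on set operations; '--' subtracts the second set
    # from the first, much like Java's && with a negated class.
    return regex.findall(r"(?V1)[[a-zA-Z]--[aeiouyAEIOUY]]+", text)


if regex is not None:
    print(consonant_runs("xyz_ d10 word"))
```

This stays close to the Java mindset of "all letters, minus these", at the cost of an extra dependency.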
Other than that, a negation might be the way to go. Basically, define all the characters you don't want and then invert that. Sounds laborious, but the "not-word" shorthand (\W) can help us out.
\w means a-zA-Z0-9_ (for ASCII matches), and \W means the opposite ([^a-zA-Z0-9_]). So [aeiouyAEIOUY\W\d_] means every character you are not looking for (the vowels, y, non-word characters, digits and the underscore), and [^aeiouyAEIOUY\W\d_] means every character you are looking for, e.g.:
    >>> import re
    >>> s = "xyz_ d10 word"
    >>> pattern = r"[^aeiouyAEIOUY\W\d_]+"
    >>> re.findall(pattern, s)
    ['x', 'z', 'd', 'w', 'rd']
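To double-check the reasoning behind the negated class, here is a small sanity check (a sketch, not part of the original answer). It tests every ASCII letter individually and confirms that the class accepts exactly the letters outside the excluded set, while rejecting digits, the underscore and a few punctuation characters. The variable names are illustrative only.

```python
import re
import string

# Single-character version of the negated class from the answer.
pattern = re.compile(r"[^aeiouyAEIOUY\W\d_]")

excluded = set("aeiouyAEIOUY")

# Letters the class actually accepts, tested one at a time.
accepted = {ch for ch in string.ascii_letters if pattern.fullmatch(ch)}

# Letters we want: every ASCII letter minus the excluded ones.
expected = set(string.ascii_letters) - excluded

print(accepted == expected)

# Digits, underscore, space and hyphen should all be rejected.
rejected_ok = not any(pattern.fullmatch(ch) for ch in string.digits + "_ -")
print(rejected_ok)
```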
If you are strictly after only ASCII characters, then you can use the re.ASCII flag, e.g.:
    >>> s = "Español"
    >>> re.findall(pattern, s)
    ['sp', 'ñ', 'l']
    >>> re.findall(pattern, s, re.ASCII)
    ['sp', 'l']
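Another way to get the ASCII-only behaviour with the standard re module, offered as an alternative sketch rather than as what the answer proposes, is to emulate a class intersection with a negative lookahead: first assert that the next character is not one of the excluded letters, then consume one character from a-zA-Z. The name consonant is just for illustration.

```python
import re

# (?![aeiouyAEIOUY]) rejects the excluded letters without consuming input;
# [a-zA-Z] then consumes one ASCII letter. The group repeats per character.
consonant = re.compile(r"(?:(?![aeiouyAEIOUY])[a-zA-Z])+")

print(consonant.findall("xyz_ d10 word"))
print(consonant.findall("Español"))
```

Because the base class is spelled out as a-zA-Z, no flag is needed to keep characters such as ñ out of the result.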