cemaivaz - 2 months ago
Python Question

Python version of Java regular expression?

I am a Java developer, and new to Python. I would like to define a regex accepting all the alphabetic characters except for some of them. I want to exclude just the vowels and the character 'y', be it in upper- or lowercase.

The regex in Java for it would be as follows:

"[a-zA-Z&&[^aeiouyAEIOUY]]"


How can I define the equivalent in Python? The above doesn't work in Python, obviously. I would also NOT like the following pattern to be suggested:

"[bcdfghjklmnpqrstvwxzBCDFGHJKLMNPQRSTVWXZ]"

Answer

I don't think Python's current regular expression module (re) has exactly what you're looking for, since it does not support character class intersection (&&). The regex module, its eventual replacement, does have what you need, and you can install it should you wish.
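As a rough sketch of that option: assuming the third-party regex package is installed (pip install regex), its "version 1" behaviour accepts nested sets and the && intersection operator, so the Java pattern carries over almost unchanged. The helper name consonant_runs is only illustrative, and the guarded import is an assumption made so the snippet fails with a clear message when the package is missing.

```python
try:
    import regex  # third-party package, not part of the standard library
except ImportError:
    regex = None

# Same idea as the Java pattern: ASCII letters intersected with "not a vowel or y".
# Set operations such as && are only enabled under the V1 (version 1) flag.
JAVA_STYLE_PATTERN = r"[[a-zA-Z]&&[^aeiouyAEIOUY]]+"


def consonant_runs(text):
    """Return runs of ASCII consonants (excluding y/Y) found in text."""
    if regex is None:
        raise RuntimeError("the 'regex' package is not installed")
    return regex.findall(JAVA_STYLE_PATTERN, text, flags=regex.V1)


if regex is not None:
    print(consonant_runs("xyz_ d10 word"))
```

Without the V1 flag the regex module defaults to re-compatible behaviour, where && has no special meaning inside a set.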

Other than that, a negation might be the way to go. Basically, define all the characters you don't want and then invert that. It sounds laborious, but the "not-word" shorthand (\W) can help us out. \w means [a-zA-Z0-9_] (for ASCII matches), and \W means the opposite ([^\w]). Thus, [aeiouyAEIOUY\W\d_] matches every character you are not looking for, and so [^aeiouyAEIOUY\W\d_] matches every character you are looking for, e.g.

>>> import re
>>> s = "xyz_ d10 word"
>>> pattern = r"[^aeiouyAEIOUY\W\d_]+"  # raw string, so \W and \d reach the regex engine untouched
>>> re.findall(pattern, s)
['x', 'z', 'd', 'w', 'rd']
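To check that the negated class really selects the same characters as the explicit consonant list from the question, one option is a brute-force comparison over the ASCII range. This is only a sanity check under the assumption that ASCII matching is what is wanted. The names single_char, expected and matched are illustrative.

```python
import re
import string

# Single-character version of the pattern, restricted to ASCII semantics for \W and \d.
single_char = re.compile(r"[^aeiouyAEIOUY\W\d_]", re.ASCII)

# What the Java pattern [a-zA-Z&&[^aeiouyAEIOUY]] is meant to accept.
expected = set(string.ascii_letters) - set("aeiouyAEIOUY")

# Every ASCII character that the negated class accepts.
matched = {chr(code) for code in range(128) if single_char.fullmatch(chr(code))}

print(matched == expected)
print("".join(sorted(matched)))
```

The second line of output should be the same set of letters as the hand-written consonant class in the question, which suggests the two patterns are interchangeable for ASCII input.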

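In Python 3, \w and \W are Unicode-aware by default for str patterns, so the pattern above also accepts non-ASCII letters. If you are strictly after ASCII characters only, pass the re.ASCII flag, e.g.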

>>> s = "Español"
>>> re.findall(pattern, s)
['sp', 'ñ', 'l']
>>> re.findall(pattern, s, re.ASCII)
['sp', 'l']
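Another way to emulate the intersection with the standard re module, offered here as an alternative sketch and not as part of the approach above, is a negative lookahead in front of a plain [a-zA-Z] class. The lookahead rejects the unwanted letters and the class restricts matches to ASCII letters, so no flag is needed. The name CONSONANT_RUN is illustrative.

```python
import re

# (?![aeiouyAEIOUY]) rejects a position if the next character is a vowel or y/Y;
# [a-zA-Z] then consumes one ASCII letter. The group repeats to collect runs.
CONSONANT_RUN = re.compile(r"(?:(?![aeiouyAEIOUY])[a-zA-Z])+")

print(CONSONANT_RUN.findall("xyz_ d10 word"))
print(CONSONANT_RUN.findall("Español"))
```

Because [a-zA-Z] is an explicit ASCII range, 'ñ' is never matched here, which mirrors the re.ASCII result shown above.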