technazi technazi - 9 months ago 39
Python Question

Python regex iteration for all combinations

I am new to regex. I am using Python 2.7 and BeautifulSoup4. I want to iterate over every string that matches a particular regular expression.

Required output:


length : 5 , expression : [a-zA-Z0-9!&#%@]



It should try all possible combinations, e.g.:

['aaaaa','aaaab','aaaac',...,'aaaaz','aaaaA',...,'aaaaZ','aaaa0','aaaa9','aaaa!','AAA!!']



Moreover, this should be possible too. If the expression is orange\d{1}, the output should be:


['orange0','orange1',...,'orange9']


I tried this:


import itertools

passLength = 5
regexInput = "a-z0-9"
#regexInput = "a-zA-Z0-9!@#$%^&"
comb = itertools.permutations(regexInput, passLength)
for x in comb:
    ''.join(x)


I realized that this is a totally wrong approach, as these are just permutations. Please help. Sorry for the bad explanation; I am very frustrated.

Answer Source

The itertools functions for permutations and combinations take a sequence of elements as their first parameter. They cannot expand a range for you (from a-z to abc...xyz). Fortunately, the string module offers constants such as ascii_letters, which contains a-zA-Z.
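As a minimal sketch of that point, the character class from the question can be spelled out by hand from the string constants. The name alphabet is only an illustrative choice, and nothing here parses the regex:

```python
import string

# The string "a-z0-9" is just six literal characters, not a range.
print(list("a-z0-9"))  # ['a', '-', 'z', '0', '-', '9']

# Hand-written equivalent of the character class [a-zA-Z0-9!&#%@];
# `alphabet` is an illustrative name, not something itertools requires.
alphabet = string.ascii_letters + string.digits + "!&#%@"

print(len(alphabet))  # 67
```

Any itertools function can then be given alphabet as its first argument.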

If your goal is to interpret the regex and generate every string it matches, that is pretty hard, and you should explain why you need it before we go further.
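For a pattern as simple as orange\d{1}, one way to sidestep regex parsing is to write down the allowed characters for each position by hand and take their Cartesian product. This is only a sketch under that assumption; the positions list is a hypothetical hand-made description, not something derived from the regex automatically:

```python
import string
from itertools import product

# Hypothetical hand-written description of orange\d{1}:
# one string of allowed characters per position.
positions = list("orange") + [string.digits]

# product(*positions) picks one character from each position in turn.
matches = [''.join(chars) for chars in product(*positions)]

print(matches[:3])   # ['orange0', 'orange1', 'orange2']
print(len(matches))  # 10
```

This does not scale to arbitrary regular expressions (alternation, unbounded repetition, and so on), which is why the general problem is hard.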

If you just want combinations of alphabetical letters (repetition allowed, order ignored, so 'aaaab' appears but 'aaaba' does not):

import string
from itertools import combinations_with_replacement

result = combinations_with_replacement(string.ascii_letters, 5)

# comb = [''.join(n) for n in result]  # warning: builds the whole list, heavy processing

# print only the first 10 results
print [''.join(next(result)) for _ in range(10)]
# > ['aaaaa', 'aaaab', 'aaaac', 'aaaad', 'aaaae', 'aaaaf', 'aaaag', 'aaaah', 'aaaai', 'aaaaj']

You can replace string.ascii_letters with any sequence of characters.
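Note that combinations_with_replacement ignores order, so it never produces a string such as 'aaaba'. If every ordered string of the given length is wanted, as the question seems to ask, itertools.product with repeat is one option. A sketch, assuming the alphabet and length given in the question (the variable names are illustrative):

```python
import string
from itertools import islice, product

# Alphabet assumed from the question: [a-zA-Z0-9!&#%@]
alphabet = string.ascii_letters + string.digits + "!&#%@"
length = 5

# product(..., repeat=length) yields every ordered tuple, so both
# 'aaaab' and 'aaaba' appear; it is lazy, so nothing is stored up front.
candidates = (''.join(chars) for chars in product(alphabet, repeat=length))

first_three = list(islice(candidates, 3))
print(first_three)  # ['aaaaa', 'aaaab', 'aaaac']

# Total number of strings: 67 ** 5 = 1350125107, too many to keep in a list.
total = len(alphabet) ** length
print(total)
```

Iterating over the generator one item at a time keeps memory use flat, which matters at this size.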