Catherine - 6 months ago
Python Question

Python: Using Regex to create an array of non-duplicate entries

I am writing a function to find the names of processes occurring on a system. I take in an array like this:

['\\\\TEST-PC\\Process(python)\\Operations/sec',
'\\\\TEST-PC\\Process(process#2)\\Operations/sec',
'\\\\TEST-PC\\Process(process#1)\\Operations/sec',
'\\\\TEST-PC\\Process(process)\\Operations/sec',
'\\\\TEST-PC\\Process(python)\\Thread Count',
'\\\\TEST-PC\\Process(process#2)\\Thread Count',
'\\\\TEST-PC\\Process(process#1)\\Thread Count',
'\\\\TEST-PC\\Process(process)\\Thread Count'....etc....]


and I want to output the names of each process in an array like this:

['python','process#2','process#1','process']


(Note that if a process comes up more than once in the original array, I do not want duplicates in the output array.)

Here is what I have so far:

def count_no_of_processes(row_to_check):
    # Ignore first entry
    to_search = row_to_check[1:]
    processes = []
    for number in range(0, len(to_search)):
        search = re.search(r"\(([^)]+)\)", to_search[number])
        processes.append(search)
    print processes


But this doesn't give me a list of process names; it just shows "<_sre.SRE_Match object at 0x10c1fw321>" entries within the "processes" list.

What am I doing wrong?

I have yet to get to the stage of checking for duplicates in the processes list, but if anyone has any advice it would be appreciated, as I am new to using regex.

Jan
Answer

You could come up with:

import re

processes = ['\\\\TEST-PC\\Process(python)\\Operations/sec',
'\\\\TEST-PC\\Process(process#2)\\Operations/sec', 
'\\\\TEST-PC\\Process(process#1)\\Operations/sec', 
'\\\\TEST-PC\\Process(process)\\Operations/sec', 
'\\\\TEST-PC\\Process(python)\\Thread Count', 
'\\\\TEST-PC\\Process(process#2)\\Thread Count',
'\\\\TEST-PC\\Process(process#1)\\Thread Count',
'\\\\TEST-PC\\Process(process)\\Thread Count']

rx = re.compile(r'Process\(([^)]+)\)')

processes_filtered = []
for process in processes:
    match = rx.search(process)
    if match is not None:
        if match.group(1) not in processes_filtered:
            processes_filtered.append(match.group(1))

print processes_filtered
# ['python', 'process#2', 'process#1', 'process']

See a demo on ideone.com.
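
As for what went wrong in the original attempt: re.search() returns a match object (or None when nothing matches), not the matched text, so appending its result directly fills the list with match objects. The captured name has to be pulled out with .group(1). Here is a minimal sketch of the difference, written for Python 3 (print is a function there, unlike in the Python 2 snippets above). The variable names entry, match and name are only illustrative, and the sample string is one entry from the question.

```python
import re

# One entry from the question's input list
entry = '\\\\TEST-PC\\Process(python)\\Thread Count'

# re.search returns a match object (or None), not a string
match = re.search(r"\(([^)]+)\)", entry)

# group(0) is the whole match, including the parentheses;
# group(1) is only the text captured by the inner (...) group
whole = match.group(0) if match else None
name = match.group(1) if match else None

print(whole)  # (python)
print(name)   # python
```

So in the question's loop, appending search.group(1) (after checking that search is not None) instead of search would store the names themselves.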

Or - even shorter - with a list comprehension wrapped in set() (note that a set removes duplicates but does not keep the original order):

rx = re.compile(r'Process\(([^)]+)\)')
processes_filtered = set([m.group(1)
    for process in processes
    for m in [rx.search(process)] if m])
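
If the order from the question's expected output matters, the set-based version will not guarantee it. One possible alternative, sketched here for Python 3, tracks already-seen names in a set while appending to a list. The function name unique_process_names and the constant PROCESS_RX are hypothetical, chosen for this example only.

```python
import re

# Same pattern as in the answer: capture the text inside Process(...)
PROCESS_RX = re.compile(r'Process\(([^)]+)\)')


def unique_process_names(counter_paths):
    """Return process names in first-seen order, without duplicates."""
    seen = set()   # fast membership test
    names = []     # keeps the original order
    for path in counter_paths:
        m = PROCESS_RX.search(path)
        if m and m.group(1) not in seen:
            seen.add(m.group(1))
            names.append(m.group(1))
    return names


sample = ['\\\\TEST-PC\\Process(python)\\Operations/sec',
          '\\\\TEST-PC\\Process(process#2)\\Operations/sec',
          '\\\\TEST-PC\\Process(process#1)\\Operations/sec',
          '\\\\TEST-PC\\Process(process)\\Operations/sec',
          '\\\\TEST-PC\\Process(python)\\Thread Count',
          '\\\\TEST-PC\\Process(process#2)\\Thread Count',
          '\\\\TEST-PC\\Process(process#1)\\Thread Count',
          '\\\\TEST-PC\\Process(process)\\Thread Count']

print(unique_process_names(sample))
# ['python', 'process#2', 'process#1', 'process']
```

The separate seen set avoids scanning the result list on every iteration, which only starts to matter when the input is large.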