user1848018 - 4 years ago
Python Question

How to use re to find consecutive space-delimited single characters in each item of a large list

Let's say I have the following list:

['Y M C A','cambridge m a','d m v office','t mobile']


and I want to convert it to:

['YMCA','cambridge ma','dmv office','t mobile']


That is, I want to detect every run of two or more consecutive single characters separated by single spaces and join the characters together. For example, in the item 'd m v office' we should detect **'d m v'** and convert it to **'dmv'**, but an item such as 't mobile store' should be left intact (it contains only one single character).

I know I could loop through the list, split each item on spaces and look for single-character tokens, but that does not sound very efficient. Is it possible to do this with a regex and the re module? Once again, the runs of consecutive single characters can be of any length greater than 1.
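For comparison, the split-based approach described above might look like the following sketch. The helper name join_single_chars is made up for illustration, and the sketch assumes tokens are separated by single spaces. Unlike a regex built on \w, it treats any one-character token (including punctuation such as '&') as a single character.

```python
from itertools import groupby

def join_single_chars(text):
    """Join runs of two or more single-character tokens; leave the rest alone."""
    out = []
    # Group adjacent tokens by whether they are exactly one character long.
    for is_single, group in groupby(text.split(' '), key=lambda tok: len(tok) == 1):
        tokens = list(group)
        if is_single and len(tokens) > 1:
            out.append(''.join(tokens))  # e.g. ['d', 'm', 'v'] -> 'dmv'
        else:
            out.extend(tokens)           # ordinary words, or a lone single character
    return ' '.join(out)

items = ['Y M C A', 'cambridge m a', 'd m v office', 't mobile']
print([join_single_chars(s) for s in items])
```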

Answer Source

The following should work:

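import re

def trim_match_spaces(matchObj):
    # Remove all whitespace from the matched run, e.g. 'd m v' -> 'dmv'
    return ''.join(matchObj.group(0).split())

templist = ['Y M C A', 'cambridge m a', 'd m v office', 't mobile', 'cambridge m a is far from the sun']

# The pattern matches two or more single word characters, each separated by one whitespace character
for index, word in enumerate(templist):
    templist[index] = re.sub(r'(\b(\w\s)+\w\b)', trim_match_spaces, word)

print(templist)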

This prints:

['YMCA', 'cambridge ma', 'dmv office', 't mobile', 'cambridge ma is far from the sun']
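
As a variation on the same idea, here is a minimal sketch that compiles the pattern once and builds a new list with a comprehension instead of assigning by index. The names PATTERN and collapse are illustrative, not part of the re module. Since re caches compiled patterns internally, this is mostly a readability choice and should not be expected to change performance much.

```python
import re

# Same pattern as above, written with a non-capturing group.
PATTERN = re.compile(r'\b(?:\w\s)+\w\b')

def collapse(match):
    # Drop all whitespace inside the matched run, e.g. 'd m v' -> 'dmv'.
    return ''.join(match.group(0).split())

items = ['Y M C A', 'cambridge m a', 'd m v office', 't mobile']
result = [PATTERN.sub(collapse, item) for item in items]
print(result)
```

Note that \w also matches digits and the underscore, so an item like 'a 1 b' would be collapsed to 'a1b'. If only letters should be joined, [^\W\d_] could be substituted for \w.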