aysha aysha - 3 months ago 6
Python Question

Removing white space from txt with python

I have a .txt file (scraped as pre-formatted text from a website) where the data looks like this:

B, NICKOLAS CT144531X D1026 JUDGE ANNIE WHITE JOHNSON
ANDREWS VS BALL JA-15-0050 D0015 JUDGE EDWARD A ROBERTS


I'd like to remove all extra spaces (they're actually different number of spaces, not tabs) in between the columns. I'd also then like to replace it with some delimiter (tab or pipe since there's commas within the data), like so:

ANDREWS VS BALL|JA-15-0050|D0015|JUDGE EDWARD A ROBERTS


Looked around and found that the best options are using regex or shlex to split. Two similar scenarios:


Answer
s = """B, NICKOLAS                       CT144531X       D1026    JUDGE ANNIE WHITE JOHNSON  
ANDREWS VS BALL                   JA-15-0050      D0015    JUDGE EDWARD A ROBERTS
"""

# Update
re.sub(r"(\S)\ {2,}(\S)(\n?)", r"\1|\2\3", s)
In [71]: print re.sub(r"(\S)\ {2,}(\S)(\n?)", r"\1|\2\3", s)
B, NICKOLAS|CT144531X|D1026|JUDGE ANNIE WHITE JOHNSON  
ANDREWS VS BALL|JA-15-0050|D0015|JUDGE EDWARD A ROBERTS
Comments