Adam Hughes - 3 months ago
Python Question

Python regex similar expressions

I have a file with two different types of data I'd like to parse with a regex; however, the data is similar enough that I can't find the correct way to distinguish it.

Some lines in my file are of the form:

AED=FRI
AFN=FRI:SAT
AMD=SUN:SAT


Other lines are of the form:

AED=20180823
AMD=20150914
AMD=20150921


The remaining lines are headers, which I'd like to discard. For example:

[HEADER: BUSINESS DATE=20160831]


My attempt so far matches the first three capital letters and an equals sign,

r'\b[A-Z]{3}=\b'


but after that I'm not sure how to distinguish between dates (e.g. 20180823) and days (e.g. FRI:SAT:SUN).

The results I'd expect from these parsing functions:

import re

weekday_rx = re.compile(<EXPRESSION FOR TYPES LIKE AED=FRI>)
date_rx = re.compile(<EXPRESSION FOR TYPES LIKE AED=20160816>)

# Read the lines once; iterating over infile.read() would yield single characters
lines = infile.read().splitlines()
weekdays = [weekday_rx.match(line) for line in lines]
dates = [date_rx.match(line) for line in lines]

Answer
r'\S*\d$' 

Matches a run of non-whitespace characters that ends in a digit at the end of the line.

Will match AED=20180823

r'\S*[a-zA-Z]$'

Matches a run of non-whitespace characters that ends in a letter at the end of the line.

Will match AED=FRI, AFN=FRI:SAT and AMD=SUN:SAT

Neither will match

[HEADER: BUSINESS DATE=20160831]
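
As a quick check, the two patterns above can be run against the sample lines from the question. This is only a sketch: the helper name classify and the samples list are made up for illustration, and it assumes re.match is used, so each pattern has to match from the start of the line.

```python
import re

date_rx = re.compile(r'\S*\d$')
weekday_rx = re.compile(r'\S*[a-zA-Z]$')

def classify(line):
    """Return 'date', 'weekday', or None for one line (hypothetical helper)."""
    line = line.strip()
    # re.match anchors at the start, and $ anchors at the end,
    # so the whole line must be free of whitespace.
    if date_rx.match(line):
        return 'date'
    if weekday_rx.match(line):
        return 'weekday'
    return None

# Sample lines taken from the question
samples = [
    'AED=FRI',
    'AFN=FRI:SAT',
    'AED=20180823',
    '[HEADER: BUSINESS DATE=20160831]',
]

for s in samples:
    print(s, '->', classify(s))
```

The header falls through to None because it ends in ], which is neither a digit nor a letter.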

This will match both

r'(\S*[a-zA-Z]$|\S*\d$)'

Replacing the * with the number of occurrences you expect will be safer. The (a|b) form means "match a or match b".
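
One possible way to spell out the expected counts is sketched below. It assumes, based only on the samples in the question, that the key is always three capital letters, dates are always eight digits, and days are three-letter names separated by colons. The names DATE_RX, WEEKDAY_RX and parse_lines are illustrative, not part of any library.

```python
import re

# Assumed shapes: three capitals, '=', then either eight digits
# or one or more colon-separated three-letter day names.
DATE_RX = re.compile(r'([A-Z]{3})=(\d{8})')
WEEKDAY_RX = re.compile(r'([A-Z]{3})=([A-Z]{3}(?::[A-Z]{3})*)')

def parse_lines(lines):
    """Split lines into (key, date) and (key, [days]) pairs; skip everything else."""
    dates, weekdays = [], []
    for raw in lines:
        line = raw.strip()
        # fullmatch requires the entire line to fit the pattern
        m = DATE_RX.fullmatch(line)
        if m:
            dates.append((m.group(1), m.group(2)))
            continue
        m = WEEKDAY_RX.fullmatch(line)
        if m:
            weekdays.append((m.group(1), m.group(2).split(':')))
    return dates, weekdays

dates, weekdays = parse_lines([
    'AED=FRI',
    'AFN=FRI:SAT',
    'AED=20180823',
    '[HEADER: BUSINESS DATE=20160831]',
])
print(dates)
print(weekdays)
```

With an open file this could be called as parse_lines(infile), since iterating over a file object yields one line at a time. Header lines match neither pattern and are dropped.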