Emily - 1 month ago 5
Python Question

Find all items in a list that match a specific format

I am trying to find everything in a list that has a format like "######-##".

I thought I had the right idea in the following code, but it isn't printing anything. Some values in my list have that format, so I would expect them to be printed. Could you tell me what's wrong?

``````
for line in list_nums:
    if (line[-1:].isdigit()):
        if (line[-2:-1].isdigit()):
            if (line[-6:-5].isdigit()):
                if ("-" in line[-3:-2]):
                    print(list_nums)
``````

The values in my list come in formats like 123456-56 and 123456-98-98, which is why I wrote the checks above.

If you need to only match the pattern `######-##` (where `#` is a digit):

``````
>>> from re import compile
>>> regexp = compile(r'^\d{6}-\d{2}$')
>>> print([line for line in list_nums if regexp.match(line)])
['132456-78']
``````

Explanations

You `compile` the pattern into a regexp object once, so it can be reused efficiently for every item. The regexp is `^\d{6}-\d{2}$` where:

``````
^            # start of the string
\d{6}-\d{2}  # 6 digits, one hyphen, then 2 digits (we could replace \d by [0-9])
$            # end of the string
``````
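To see the anchors at work, here is a minimal sketch run against the two kinds of values mentioned in the question. The list `sample` and the name `unanchored` are made up for illustration. `fullmatch` is shown as an alternative that needs no `^` or `$`.

```python
import re

# Hypothetical sample data, modeled on the formats in the question.
sample = ["123456-56", "123456-98-98", "12345-67", "abc123456-56"]

# Anchored pattern: exactly six digits, a hyphen, then two digits.
regexp = re.compile(r"^\d{6}-\d{2}$")
anchored = [s for s in sample if regexp.match(s)]

# Same idea without anchors: fullmatch requires the whole string to match.
unanchored = re.compile(r"\d{6}-\d{2}")
full = [s for s in sample if unanchored.fullmatch(s)]

print(anchored)  # ['123456-56']
print(full)      # ['123456-56']
```

One difference worth knowing: `$` also matches just before a trailing newline, so `'123456-56\n'` passes the anchored pattern but not `fullmatch`. If the list was read from a file, stripping each line first avoids surprises.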

Full code

An example based on your comment:

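``````
import xlrd
from re import compile

# Note: xlrd 2.0+ only reads .xls files; older versions also read .xlsx
file_location = 'file.xlsx'
workbook = xlrd.open_workbook(file_location)
sheet = workbook.sheet_by_index(0)  # first sheet
regexp = compile(r'^\d{6}-\d{2}$')

# Print every cell in the first column that matches ######-##
for row in range(sheet.nrows):
    cell = sheet.cell_value(row, 0)
    if regexp.match(cell):
        print(cell)
``````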
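One thing to watch for with spreadsheets: `cell_value` can return a float for numeric cells, and passing a non-string to `regexp.match` raises a `TypeError`. Here is a minimal sketch of a guard, assuming a hypothetical helper `matching_cells` that takes any iterable of cell values (so it can be tried without a workbook):

```python
import re

REGEXP = re.compile(r"^\d{6}-\d{2}$")

def matching_cells(values):
    """Return the string values that look like ######-## (hypothetical helper)."""
    # Skip non-string cells (numbers etc.) instead of crashing on them.
    return [v for v in values if isinstance(v, str) and REGEXP.match(v)]

# Made-up values standing in for sheet.cell_value(row, 0) results.
cells = ["123456-56", 123456.0, "", "123456-98-98"]
print(matching_cells(cells))  # ['123456-56']
```

With xlrd, this could probably be called as `matching_cells(sheet.col_values(0))` to check the whole first column at once.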