Emily - 3 months ago
Python Question

Find all items in a list that match a specific format

I am trying to find every item in a list that has a format like "######-##".

I thought I had the right idea with the following code, but it isn't printing anything. Some values in my list have that format, so I would expect them to be printed. Could you tell me what's wrong?

for line in list_nums:
    if (line[-1:].isdigit()):
        if (line[-2:-1].isdigit()):
            if (line[-6:-5].isdigit()):
                if ("-" in line[-3:-2]):
                    print(list_nums)


The values in my list have formats like 123456-56 and 123456-98-98, which is why I checked the characters counting from the end of each string.
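The nested checks in that loop only look at a few characters near the end of each string and never check its length, and the innermost line prints `list_nums` (the whole list) rather than `line`. A minimal slice-based sketch without regular expressions, assuming every item is a string; `has_format` and the sample list are hypothetical:

```python
def has_format(value):
    """Return True if value is exactly ######-## (6 digits, a hyphen, 2 digits)."""
    return (
        len(value) == 9            # exact length rules out 123456-98-98
        and value[:6].isdigit()    # first six characters are digits
        and value[6] == "-"        # seventh character is the hyphen
        and value[7:].isdigit()    # last two characters are digits
    )


list_nums = ["123456-56", "123456-98-98", "12345-678"]  # sample values (assumed)
for line in list_nums:
    if has_format(line):
        print(line)  # print the matching item, not the whole list
```

Checking the length first means the slices can never pick up characters from a longer value such as 123456-98-98.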

Answer

If you need to only match the pattern ######-## (where # is a digit):

>>> from re import compile
>>> regexp = compile(r'^\d{6}-\d{2}$')
>>> print([line for line in list_nums if regexp.match(line)])
['132456-78']
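The session above relies on a `list_nums` defined elsewhere. A self-contained version, using a made-up sample list (the values are assumptions chosen to mirror the two formats in the question) and `import re` instead of importing `compile` directly, which avoids shadowing the built-in `compile`:

```python
import re

# Compile once and reuse for every item: exactly 6 digits, a hyphen, 2 digits.
pattern = re.compile(r"^\d{6}-\d{2}$")

# Sample values in the two formats mentioned in the question (assumed data).
list_nums = ["123456-56", "123456-98-98", "654321-12"]

matches = [line for line in list_nums if pattern.match(line)]
print(matches)  # ['123456-56', '654321-12']
```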

Explanations

You compile the pattern once into a regexp object so it can be reused for every line you match. The regexp is ^\d{6}-\d{2}$, where:

^  # start of the line
\d{6}-\d{2}  # 6 digits, one hyphen, then 2 digits (we could replace \d by [0-9])
$  # end of the line
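The `$` anchor is what keeps values in the longer 123456-98-98 format out of the results, because `match` only anchors at the start of the string. A small demonstration, with `re.fullmatch` shown as an alternative to writing `^...$` by hand:

```python
import re

value = "123456-98-98"

# re.match only anchors at the start, so without $ the longer value still matches
print(bool(re.match(r"\d{6}-\d{2}", value)))      # True
# With the end anchor, the extra "-98" makes the match fail
print(bool(re.match(r"^\d{6}-\d{2}$", value)))    # False
# re.fullmatch requires the whole string to match, so no anchors are needed
print(bool(re.fullmatch(r"\d{6}-\d{2}", value)))  # False
print(bool(re.fullmatch(r"\d{6}-\d{2}", "123456-56")))  # True
```

One subtle difference: `$` also matches just before a trailing newline, so "123456-56\n" passes the `^...$` pattern but not `fullmatch`.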

Full code

An example based on your comment, reading the values from the first column of an Excel sheet with xlrd:

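import xlrd
from re import compile

# Note: xlrd 2.0+ only reads legacy .xls files; opening .xlsx needs xlrd < 2.0
file_location = 'file.xlsx'
workbook = xlrd.open_workbook(file_location)
sheet = workbook.sheet_by_index(0)  # first worksheet
regexp = compile(r'^\d{6}-\d{2}$')

for row in range(sheet.nrows):
    cell = sheet.cell_value(row, 0)  # value in column A
    # Numeric cells come back as floats, so convert to str before matching
    if regexp.match(str(cell)):
        print(cell)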
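If the cells can hold numbers or text padded with spaces (an assumption; the question doesn't say), it can help to keep the matching in a small function that works on any iterable of values and can be tested without a workbook. `matching_cells` and the sample column are hypothetical; with xlrd, the column could come from `sheet.col_values(0)`:

```python
import re

PATTERN = re.compile(r"\d{6}-\d{2}")


def matching_cells(values):
    """Yield values whose text form is exactly ######-## (hypothetical helper)."""
    for value in values:
        # Assumed: cells may hold floats or padded text, so normalise to a stripped str
        text = str(value).strip()
        if PATTERN.fullmatch(text):
            yield text


# Stand-in for sheet.col_values(0); the mixed types here are assumed for illustration.
column = ["123456-56 ", 123456.0, "123456-98-98", "654321-12"]
print(list(matching_cells(column)))  # ['123456-56', '654321-12']
```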