Vik G - 5 months ago
Python Question

re.sub python to gather height

I am writing a python program to parse some user data from a txt file.
One of the rows in the text file will contain the user's height.
I have specified an order that the user is expected to follow:

The first line of the file should contain the name, the next line the date of birth,
the 3rd line the height, etc.

I have also given a sample file to the user which looks like this

Name: First Name Last Name

DOB: 16.04.2000

Age: 16

Height: 5 feet 9 inch

When I read the file, I looked at each line and split it using ':' as a separator.

The first field is my column name like name, dob, age, height.
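As a rough illustration of this splitting step, the sketch below uses str.partition so that only the first ':' separates the field name from the value. The function name split_field and the choice to return None when no ':' is present are assumptions made for the example, not part of my actual program.

```python
def split_field(line):
    """Split 'Name: value' into (field, value); field is None if no ':' is present."""
    key, sep, value = line.partition(':')
    if not sep:
        # No separator found: the caller has to guess what kind of data this is.
        return None, line.strip()
    return key.strip().lower(), value.strip()


print(split_field('Height: 5 feet 9 inch'))  # ('height', '5 feet 9 inch')
print(split_field('5ft 9 in'))               # (None, '5ft 9 in')
```

Lower-casing the field name makes 'Height', 'HEIGHT' and 'height' map to the same column.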

In some cases, users forget the ':' after Name or DOB, or they will simply send data like:


  • Height 5 feet 9 inch

  • 5 feet 9 inch

  • 5ft 9 in

  • 5feet 9inches



The logic I have decided to use is:


  1. Look for ':' on each line; if one is found, then I have my field.

  2. Otherwise, try to find out what data it could be.



The logic for height is like this:

if any(heightword in file_line.upper() for heightword in ['FT', 'HEIGHT', 'FEET', 'INCH', 'CM']):


This if condition will look for words associated with height.
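Wrapped in a function, the check might look like the sketch below. The names HEIGHT_WORDS and looks_like_height are hypothetical, chosen only for this example; the keyword list is the one from the condition above.

```python
HEIGHT_WORDS = ['FT', 'HEIGHT', 'FEET', 'INCH', 'CM']


def looks_like_height(file_line):
    """Return True if the line contains any height-related keyword (substring match)."""
    upper = file_line.upper()
    return any(heightword in upper for heightword in HEIGHT_WORDS)


for sample in ['Height 5 feet 9 inch', '5ft 9 in', 'DOB 16.04.2000']:
    print(sample, '->', looks_like_height(sample))
```

Because this is a plain substring test, a line such as "Name: Crofton" would also match (it contains "FT"), so the check probably only makes sense for lines that were not already identified by their ':' field name.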

Once I have determined that the line from the file contains the height, I want to be able to convert that information to inches before I write it to the database.

Please can someone help me work out how to convert the following data to inches.


  • Height 5 feet 9 inch

  • 5 feet 9 inch

  • 5ft 9 in

  • 5feet 9inches



I know this list is not exhaustive, since I am trying to cater to a variety of user inputs. I am using these as examples to understand the approach, and I will keep adding code as and when I find new patterns.

Answer

The simplest way is likely to be using a reference dict of unit keywords to their respective conversion factors, and a regex to extract the keyword and amount. The following short program takes some input and prints the total converted to inches.

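import re

# Conversion factor from each unit keyword to inches.
UNITS = {'in': 1, 'inch': 1, 'inches': 1, 'ft': 12, 'foot': 12, 'feet': 12,
         'cm': 0.3937, 'centimeter': 0.3937, 'centimeters': 0.3937}  # etc.
r = re.compile(r'(\d+)\s*([a-zA-Z]+)')

def to_inches(text):
    h = 0
    def incr(m):
        nonlocal h
        # Unknown keywords fall back to a factor of 1, i.e. are treated as inches.
        h += int(m.group(1)) * UNITS.get(m.group(2).lower(), 1)
        return ''
    r.sub(incr, text)  # the callback accumulates the total as a side effect
    return h

print(to_inches('Height 5 feet 9 inch'))  # 69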

You may also want to restrict which keywords are accepted, to keep the dict from getting too big.
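One possible way to restrict the keywords, shown here only as a sketch, is to build the regex alternation from the dict keys so that nothing outside the dict can match. The names UNITS, PATTERN and height_in_inches are made up for this example, and accepting decimal amounts such as 175.5 is an added assumption.

```python
import re

# Conversion factor from each accepted unit keyword to inches (illustrative subset).
UNITS = {'in': 1, 'inch': 1, 'inches': 1, 'ft': 12, 'foot': 12, 'feet': 12,
         'cm': 0.3937}

# Longest keywords first so that 'inches' is tried before 'inch' and 'in'.
_alternatives = '|'.join(sorted(UNITS, key=len, reverse=True))
PATTERN = re.compile(r'(\d+(?:\.\d+)?)\s*(' + _alternatives + r')\b', re.IGNORECASE)


def height_in_inches(text):
    """Sum all '<number><unit>' pairs with a known unit; return None if there are none."""
    matches = PATTERN.findall(text)
    if not matches:
        return None
    return sum(float(amount) * UNITS[unit.lower()] for amount, unit in matches)


for sample in ['Height 5 feet 9 inch', '5 feet 9 inch', '5ft 9 in', '5feet 9inches']:
    print(sample, '->', height_in_inches(sample))  # each prints 69.0
```

Returning None when nothing matches lets the caller tell "no height found" apart from a height of zero, and a number followed by an unknown word is simply ignored rather than counted as inches.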