Jeff Jeff - 3 months ago 7
Python Question

Using Pandas str.contains to compare row-by-row

I have set up the following very simple DataFrame to illustrate what I'm trying to do:

import pandas as pd

teams = pd.DataFrame({"spreads":['New England Patriots -7.0','Atlanta Falcons 2.5','New Orleans Saints -4.5']})
teams['home'] = ['New England Patriots','Carolina Panthers','New Orleans Saints']
teams['away'] = ['Miami Dolphins','Atlanta Falcons','Tampa Bay Buccaneers']


I'm trying to extract the spread value from each row. My first attempt used str.contains to find the team name and so separate out the number, but it seems str.contains can't be used to compare values row by row.

Does anyone have tips for extracting the numeric value? (I don't think I can use a regex, because in some rows no '-' sign appears.) Failing that, what method should I use to determine whether the team listed in each row is the home or the away team? I would greatly appreciate any help.

Answer

Use .str.extract. Making the minus sign optional in the regex (-?) covers the rows where no '-' sign appears.

teams.spreads.str.extract(r'(-?\d+\.?\d*)', expand=False)

0    -7.0
1     2.5
2    -4.5
Name: spreads, dtype: object
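
Note that the dtype above is object, so the extracted values are still strings. As a minimal sketch, assuming you want numbers you can do arithmetic on, the result can be cast with astype(float). The column name spread_val is just an illustrative choice here, and the 'San Francisco 49ers -3.5' row is a made-up example (not from the question's data) showing why anchoring the pattern to the end of the string may be safer when a team name itself contains digits:

```python
import pandas as pd

teams = pd.DataFrame({"spreads": ['New England Patriots -7.0',
                                  'Atlanta Falcons 2.5',
                                  'New Orleans Saints -4.5']})

# Extract the (optionally negative) number, then convert the strings to floats
teams['spread_val'] = (
    teams.spreads.str.extract(r'(-?\d+\.?\d*)', expand=False).astype(float)
)

# Hypothetical row: the unanchored pattern would grab "49" from the team name,
# so this variant only accepts a number at the very end of the string
tricky = pd.Series(['San Francisco 49ers -3.5'])
anchored = tricky.str.extract(r'(-?\d+(?:\.\d+)?)\s*$', expand=False).astype(float)
```

After this, teams['spread_val'] has dtype float64, so comparisons such as teams['spread_val'] < 0 work directly.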

Fancier: with a named group and expand=True, the result is a DataFrame whose column is named spread_val

teams.spreads.str.extract(r'(?P<spread_val>-?\d+\.?\d*)', expand=True)

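
For the second part of the question (is the listed team home or away?), one possible approach is a plain row-wise check with apply(axis=1), since str.contains applies one pattern to the whole column rather than a different value per row. This is only a sketch under the assumption that the team name appears verbatim in the spreads string; the helper listed_side and the column name side are hypothetical names introduced for illustration:

```python
import pandas as pd

teams = pd.DataFrame({"spreads": ['New England Patriots -7.0',
                                  'Atlanta Falcons 2.5',
                                  'New Orleans Saints -4.5']})
teams['home'] = ['New England Patriots', 'Carolina Panthers', 'New Orleans Saints']
teams['away'] = ['Miami Dolphins', 'Atlanta Falcons', 'Tampa Bay Buccaneers']


def listed_side(row):
    """Return 'home' or 'away' depending on which team name is in the spread text."""
    if row['home'] in row['spreads']:
        return 'home'
    if row['away'] in row['spreads']:
        return 'away'
    return None  # neither name matched, e.g. because of a spelling difference


# Compare each row's spread text against that same row's home and away values
teams['side'] = teams.apply(listed_side, axis=1)
```

For the sample data this marks the rows as home, away, home. On a large frame a list comprehension over zip(teams['spreads'], teams['home']) would likely be faster than apply, at the cost of being slightly less readable.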