CaseyK - 2 months ago

Using str.startswith to access a dataframe slice

I have a dataframe with temperature values over the years. I want to put all the rows from the year 2015 into a new dataframe. Currently, the Date column has dtype object, holding strings in the format YYYY-MM-DD.
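Since every Date value follows the fixed YYYY-MM-DD layout, one alternative to string matching is to parse the column with pd.to_datetime and filter on the integer year. This is a minimal sketch, not what the question attempts: the rows are made up and the Temp column name is a hypothetical stand-in for whatever weather.csv contains.

```python
import pandas as pd

# Hypothetical stand-in for weather.csv: made-up rows, "Temp" is an assumed column name
df = pd.DataFrame({
    "Date": ["2014-12-31", "2015-01-01", "2015-06-15", "2016-01-01"],
    "Temp": [30.1, 28.4, 75.2, 31.0],
})

# Parse the YYYY-MM-DD strings into datetimes, then compare the integer year
years = pd.to_datetime(df["Date"], format="%Y-%m-%d").dt.year
weather_2015 = df[years == 2015]
print(weather_2015["Date"].tolist())  # ['2015-01-01', '2015-06-15']
```

Parsing costs a pass over the column, but it also makes range filters (for example, a single month) straightforward later.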

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

df = pd.read_csv(r"C:\whatever\weather.csv")

weather_2015 = df.loc[df.Date == df.Date.str.startswith("2015"), :]

This is what the data looks like in the main dataframe:

NOTE: if I do something like

weather_2015 = df.loc[df.Date == "2015-02-03", :]

I get what I'd expect: only the rows whose date is 2015-02-03.
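The difference between the two attempts can be seen on a tiny hypothetical frame (the rows below are made up). Comparing the string column to a string yields a boolean mask, but str.startswith already returns a boolean mask, so comparing the strings to that mask asks whether "2015-..." equals True or False, which it never does.

```python
import pandas as pd

# Hypothetical toy frame with the same YYYY-MM-DD string format
df = pd.DataFrame({"Date": ["2014-12-31", "2015-02-03", "2015-06-15"]})

# String compared to a string: one True per exact match
exact = df.Date == "2015-02-03"
print(exact.tolist())  # [False, True, False]

# str.startswith already produces a boolean mask
prefix = df.Date.str.startswith("2015")
print(prefix.tolist())  # [False, True, True]

# Strings compared to booleans are never equal, so .loc selects nothing
broken = df.Date == prefix
print(broken.tolist())  # [False, False, False]
print(len(df.loc[broken, :]))  # 0
```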

Answer

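pd.Series.str.startswith returns a boolean mask, so you don't need to compare it to df.Date again. Comparing the date strings to that mask gives False for every row (a string never equals True or False), which is why your .loc selection comes back empty. Index with the mask directly: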

weather_2015 = df[df.Date.str.startswith("2015")]
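A minimal sketch of the fix on a hypothetical frame (made-up rows; the Temp column name is an assumption). It also shows the na parameter of str.startswith: with an object-dtype column, a missing date makes startswith return NaN for that row, and a mask containing NaN raises an error, so na=False treats missing dates as non-matches.

```python
import pandas as pd

# Hypothetical toy frame; the "Temp" column name is an assumption
df = pd.DataFrame({
    "Date": ["2014-12-31", "2015-01-01", "2015-06-15", "2016-01-01"],
    "Temp": [30.1, 28.4, 75.2, 31.0],
})

# The mask is already the row selector: True keeps a row, False drops it
mask = df.Date.str.startswith("2015")
weather_2015 = df[mask]
print(weather_2015["Date"].tolist())  # ['2015-01-01', '2015-06-15']

# With a missing date, na=False makes that row a non-match instead of NaN
gappy = pd.DataFrame({"Date": ["2015-03-01", None], "Temp": [40.0, 41.0]})
gappy_2015 = gappy[gappy.Date.str.startswith("2015", na=False)]
print(len(gappy_2015))  # 1
```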

You don't even need .loc here: a boolean Series inside plain square brackets selects rows.
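To illustrate, here is a small sketch on a hypothetical frame (made-up rows, assumed Temp column) showing that df[mask] and df.loc[mask, :] return the same rows, and that .loc becomes useful when you also want to pick columns in the same step.

```python
import pandas as pd

# Hypothetical toy frame; "Temp" is an assumed column name
df = pd.DataFrame({
    "Date": ["2014-12-31", "2015-01-01", "2015-06-15"],
    "Temp": [30.1, 28.4, 75.2],
})
mask = df.Date.str.startswith("2015")

# A boolean Series in [] filters rows, exactly like .loc[mask, :]
plain = df[mask]
with_loc = df.loc[mask, :]
print(plain.equals(with_loc))  # True

# .loc earns its keep when selecting rows and columns together
temps_2015 = df.loc[mask, "Temp"]
print(temps_2015.tolist())  # [28.4, 75.2]
```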

Note that if you want to modify this slice, you should work on an explicit copy; otherwise pandas may emit a SettingWithCopyWarning when you assign to it. Call .copy() on the result:

weather_2015 = df[df.Date.str.startswith("2015")].copy()
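A short sketch of why the copy matters, again on a hypothetical frame (made-up rows, assumed Temp column). Assigning to the copied slice leaves the original frame untouched; the warnings filter turns any warning into an error to show that the assignment is clean.

```python
import warnings

import pandas as pd

# Hypothetical toy frame; "Temp" is an assumed column name
df = pd.DataFrame({
    "Date": ["2014-12-31", "2015-01-01", "2015-06-15"],
    "Temp": [30.1, 28.4, 75.2],
})

weather_2015 = df[df.Date.str.startswith("2015")].copy()

# Escalate warnings to errors: assigning to an explicit copy raises nothing
with warnings.catch_warnings():
    warnings.simplefilter("error")
    weather_2015["Temp"] = weather_2015["Temp"].round()

print(weather_2015["Temp"].tolist())  # [28.0, 75.0]
print(df["Temp"].tolist())  # [30.1, 28.4, 75.2]
```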