If this question is unclear, I am very open to constructive criticism.
I have an excel table with about 50 rows of data, with the first column in each row being a date. I need to access all the data for only one date, and that date appears only about 1-5 times. It is the most recent date so I've already organized the table by date with the most recent being at the top.
So my goal is to store that date in a variable and then have Python look only for that variable (that date) and take only the columns corresponding to that variable. I need to use this code on 100's of other excel files as well, so it would need to arbitrarily take the most recent date (always at the top though).
My current code below simply takes the first 5 rows because I know that's how many times this date occurs.
from numpy import genfromtxt
import pandas as pd
path = 'Z:\\folderwithcsvfile'
for filename in os.listdir(path):
file_path = os.path.join(path, filename)
broken_df = pd.read_csv(file_path)
df3 = broken_df['DATE']
df4 = broken_df['TRADE ID']
df5 = broken_df['AVAILABLE STOCK']
df6 = broken_df['AMOUNT']
df7 = broken_df['SALE PRICE']
This is a relatively simple filtering operation. You state that you want to "take only the columns" that are the latest date, so I assume that an acceptable result will be a filter
DataFrame with just the correct columns.
Here's a simple CSV that is similar to your structure:
DATE,TRADE ID,AVAILABLE STOCK 10/11/2016,123,123 10/11/2016,123,123 10/10/2016,123,123 10/9/2016,123,123 10/11/2016,123,123
Note that I mixed up the dates a little bit, because it's hacky and error-prone to just assume that the latest dates will be on the top. The following script will filter it appropriately:
import pandas as pd import numpy as np df = pd.read_csv('data.csv') # convert the DATE column to datetimes df['DATE'] = pd.to_datetime(df['DATE']) # find the latest datetime latest_date = df['DATE'].max() # use index filtering to only choose the columns that equal the latest date latest_rows = df[df['DATE'] == latest_date] print (latest_rows) # now you can perform your operations on latest_rows
In my example, this will print:
DATE TRADE ID AVAILABLE STOCK 0 2016-10-11 123 123 1 2016-10-11 123 123 4 2016-10-11 123 123