Argenis Rojas Argenis Rojas - 1 month ago 12
Python Question

Selectin Dataframe columns name from a csv file

I have a .csv to read into a DataFrame and the names of the columns are in the same .csv file in the previos rows. Usually I drop all the 'unnecesary' rows to create the DataFrame and then hardcode the names of each dataframe

Trigger time,2017-07-31,10:45:38
CH,Signal name,Input,Range,Filter,Span
CH1, "Tin_MIX_Air",TEMP,PT,Off,2000.000000,-200.000000,degC
CH2, "Tout_Fan2b",TEMP,PT,Off,2000.000000,-200.000000,degC
CH3, "Tout_Fan2a",TEMP,PT,Off,2000.000000,-200.000000,degC
CH4, "Tout_Fan1a",TEMP,PT,Off,2000.000000,-200.000000,degC


Here you can see the rows where the columns names are in double quotes "TinMix","Tout..",etc there are exactly 16 rows with names

Logic/Pulse,Off
Data
Number,Date&Time,ms,CH1,CH2,CH3,CH4,CH5,CH7,CH8,CH9,CH10,CH11,CH12,CH13,CH14,CH15,CH16,CH20,Alarm1-10,Alarm11-20,AlarmOut
NO.,Time,ms,degC,degC,degC,degC,degC,degC,%RH,%RH,degC,degC,degC,degC,degC,Pa,Pa,A,A1234567890,A1234567890,A1234
1,2017-07-31 10:45:38,000,+25.6,+26.2,+26.1,+26.0,+26.3,+25.7,+43.70,+37.22,+25.6,+25.3,+25.1,+25.3,+25.3,+0.25,+0.15,+0.00,LLLLLLLLLL,LLLLLLLLLL,LLLL


And here the values of each variables start.

What I need to do is create a Dataframe from this .csv and place these names in the columns names. I'm new to Python and I'm not very sure how to do it

import pandas as pd
path = r'path-to-file.csv'
data=pd.DataFrame()
with open(path, 'r') as f:
for line in f:
data = pd.concat([data, pd.DataFrame([tuple(line.strip().split(','))])], ignore_index=True)
data.drop(data.index[range(0,29)],inplace=True)
x=len(data.iloc[0])
data.drop(data.columns[[0,1,2,x-1,x-2,x-3]],axis=1,inplace=True)
data.reset_index(drop=True,inplace=True)
data = data.T.reset_index(drop=True).T
data = data.apply(pd.to_numeric)


This is what I've done so far to get my dataframe with the usefull data, I'm dropping all the other columns that arent useful to me and keeping only the values. Last three lines are to reset row/column indexes and to transform the whole df to floats. What I would like is to name the columns with each of the names I showed in the first piece of coding as a I said before I'm doing this manually as:

data.columns = ['a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p']


But I would like to get them from the .csv file since theres a possibility on changing the CH# - "Name" combination

Thank you very much for the help!

Answer Source

Comment: possible for it to work within the other "OPEN " loop that I have?

Assume Column Names from Row 2 up to 6, Data from Row 7 up to EOF.
For instance (untested code)

data = None
columns = []

with open (path) as fh:
   for row, line in enumerate (fh, 1):
       if row > 2 and row <= 6:
            ch, name = line.split(',')[:2]
            columns.append(name)
        else:
            row_data = [tuple(line.strip().split(','))]
            if not data:
                data = pd.DataFrame(row_data, columns=columns, ignore_index=True)
            else:
                data.append(row_data)

Question: ... I would like to get them from the .csv file

Start with:

with open (path) as fh:
   for row, line in enumerate (fh, 1):
       if row > 2:
            ch, name = line.split(',')[:2]