Sanchit Aluna Sanchit Aluna - 7 months ago 8
Python Question

Need to create upper column for many lower columns and filling the existing data in it

I want to do multiple indexing on my existing columns. below are examples.

I want to give a column name to existing 5 columns and another name to another existing columns(multindexing)

I have almost done this but when i try to fill the data in blank file then it reads the columns name from row 1 but actual columns exists in row 2 as first row is for columns name defined for actual columns in row 2. You can find the attached image to understand the kind of output i want.
below is my code.

comp= upper column name for [com1','Com2','Com3','Com4']

code= code1
InputData= my exising data where all above(lower column) columns exists

InputDatafeatures = [['code','com','com','com','com','somp','somp'], ['code1','Com1','Com2','Com3','Com4','nam1','nam2']]
tuples = zip(*InputDatafeatures)
index = pd.MultiIndex.from_tuples(tuples)
InputData= DataFrame(InputData, columns=index)


I am able to create the upper and lower columns but when i fill the existing data which already has same lower columns does not get filled in it and if i try to fill then code reads the columns which exists in row 1(upper) and creates new columns beside these. any help is deeply appreciated. Please let me know if you need any other info or unable to understand my explanation. reference link second code
below images shows how my current data is and how i want my data to be .

raw data



code com1 com2 com3 com4 nam1 nam2
1300079-DE 783000 1200 103000 235000 H 2D
1300079-DE 1610000 151000 88000 201000 H 2D
1300079-DE 780000 88100 51400 117000 H 2D
1300185-DE 57900 6480 6390 7910 H 2D
1300560-DE 60400 15700 17800 17400 H 2D
1301011-DE 23400 10800 3940 14500 H 2D
1301644-DE 11700 5420 1670 7230 H 2D
1301907-DE 192000 294 57800 57700 H 2D
1301907-DE 2140000 163000 192000 217000 H 2D


wrong outputcorrect output

Answer

I think you can simply change columns to MultiIndex by df.columns = columns:

print df
         code     com1    com2    com3    com4 nam1 nam2
0  1300079-DE   783000    1200  103000  235000    H   2D
1  1300079-DE  1610000  151000   88000  201000    H   2D
2  1300079-DE   780000   88100   51400  117000    H   2D
3  1300185-DE    57900    6480    6390    7910    H   2D
4  1300560-DE    60400   15700   17800   17400    H   2D
5  1301011-DE    23400   10800    3940   14500    H   2D
6  1301644-DE    11700    5420    1670    7230    H   2D
7  1301907-DE   192000     294   57800   57700    H   2D
8  1301907-DE  2140000  163000  192000  217000    H   2D


InputDatafeatures = [['code','com','com','com','com','somp','somp'],
                     ['code1','Com1','Com2','Com3','Com4','nam1','nam2']]
tuples = zip(*InputDatafeatures)
cols = pd.MultiIndex.from_tuples(tuples)
print cols
MultiIndex(levels=[[u'code', u'com', u'somp'], 
                   [u'Com1', u'Com2', u'Com3', u'Com4', u'code1', u'nam1', u'nam2']],
           labels=[[0, 1, 1, 1, 1, 2, 2], [4, 0, 1, 2, 3, 5, 6]])

df.columns = cols
print df
         code      com                         somp     
        code1     Com1    Com2    Com3    Com4 nam1 nam2
0  1300079-DE   783000    1200  103000  235000    H   2D
1  1300079-DE  1610000  151000   88000  201000    H   2D
2  1300079-DE   780000   88100   51400  117000    H   2D
3  1300185-DE    57900    6480    6390    7910    H   2D
4  1300560-DE    60400   15700   17800   17400    H   2D
5  1301011-DE    23400   10800    3940   14500    H   2D
6  1301644-DE    11700    5420    1670    7230    H   2D
7  1301907-DE   192000     294   57800   57700    H   2D
8  1301907-DE  2140000  163000  192000  217000    H   2D

EDIT:

You can use values df.columns for creating Multiindex:

InputDatafeatures = [['code','com','com','com','com','somp','somp'], df.columns]
tuples = zip(*InputDatafeatures)
cols = pd.MultiIndex.from_tuples(tuples)
print cols
MultiIndex(levels=[[u'code', u'com', u'somp'], 
                   [u'code', u'com1', u'com2', u'com3', u'com4', u'nam1', u'nam2']],
           labels=[[0, 1, 1, 1, 1, 2, 2], [0, 1, 2, 3, 4, 5, 6]])

df.columns = cols
print df
         code      com                         somp     
         code     com1    com2    com3    com4 nam1 nam2
0  1300079-DE   783000    1200  103000  235000    H   2D
1  1300079-DE  1610000  151000   88000  201000    H   2D
2  1300079-DE   780000   88100   51400  117000    H   2D
3  1300185-DE    57900    6480    6390    7910    H   2D
4  1300560-DE    60400   15700   17800   17400    H   2D
5  1301011-DE    23400   10800    3940   14500    H   2D
6  1301644-DE    11700    5420    1670    7230    H   2D
7  1301907-DE   192000     294   57800   57700    H   2D
8  1301907-DE  2140000  163000  192000  217000    H   2D

Thanks for the answer and it almost resolves my problem but upper column is getting repeated with every lower column. instead i want to look upper column something like 'merge and center' in python. my data looks like this after exporting to csv. please have a look at attached image.output