cardycakes cardycakes - 3 months ago 9
Python Question

Simple index on DATE and TIME columns

I have a CSV containing data that looks like this:

<DATE> <TIME> <OPEN> <LOW> <HIGH> <CLOSE>
2001-01-03 00:00:00 0.9507 0.9505 0.9509 0.9506
....
2015-05-13 02:00:00 0.9496 0.9495 0.9509 0.9505


I want to create an index on and but retain the two columns as normal columns so I can reference them.

As the data is stored in a CSV I am not sure how I could parse the 2 columns (DATE and TIME) into one before creating the dataframe.

I've had a look at many answers but they seems convoluted for what I am trying to do, and I have became convinced I am missing the simple solution

Context around what lead me to this:


The proper way I've identified setting a new value (for when I am computing rolling mean values) is:

df.set_value('index', 'column', value)


Because my index is currently just on date, referencing the index for a particular row (say, the 1st row) means many values are set instead of one

Answer

UPDATE:

In [170]: df = pd.read_csv('/path/to/file.csv', parse_dates={'TIMESTAMP': ['DATE','TIME']}).set_index('TIMESTAMP')

In [171]: df
Out[171]:
                       OPEN     LOW    HIGH   CLOSE
TIMESTAMP
2001-01-03 00:00:00  0.9507  0.9505  0.9509  0.9506
2001-01-03 01:00:00  0.9507  0.9505  0.9509  0.9506
2001-01-03 02:00:00  0.9507  0.9505  0.9509  0.9506
2001-01-03 03:00:00  0.9507  0.9505  0.9509  0.9506
2001-01-03 04:00:00  0.9507  0.9505  0.9509  0.9506
2001-01-04 00:00:00  0.9507  0.9505  0.9509  0.9506
2001-01-04 01:00:00  0.9507  0.9505  0.9509  0.9506
2001-01-04 02:00:00  0.9507  0.9505  0.9509  0.9506

In [172]: df.index.dtype
Out[172]: dtype('<M8[ns]')

OLD answer:

you can do it this way:

In [155]: df
Out[155]:
   a  b  c
0  0  0  3
1  1  2  0
2  2  2  3
3  1  0  0
4  1  3  2
5  4  0  1
6  2  0  3
7  2  1  2
8  3  3  4
9  0  0  3

In [156]: df.join(df.ix[:, :2], rsuffix='_idx').set_index((df.ix[:, :2].columns + '_idx').tolist())
Out[156]:
             a  b  c
a_idx b_idx
0     0      0  0  3
1     2      1  2  0
2     2      2  2  3
1     0      1  0  0
      3      1  3  2
4     0      4  0  1
2     0      2  0  3
      1      2  1  2
3     3      3  3  4
0     0      0  0  3

BUT, you don't really need it, because it's redundant - you still have your data in the index and can use it...