chandu chandu - 4 months ago 15
Python Question

pandas dataframe data retrieval

Here is my sample pandas data frame

               icd_code   from_date  paid_amount
claim_id
CKEY-7724339     719.43  2015-09-26       300.09
CKEY-5008998      722.2  2015-04-21        11.65
CKEY-7896598        722  2015-02-23        17.19
CKEY-7758556      850.9  2014-03-13       414.02
CKEY-7749118      847.0  2012-07-18         4.42
CKEY-10383160    854.00  2015-06-16       751.68
CKEY-10678452    607.84  2015-07-07        11.13
CKEY-10734364     882.2  2015-07-22      5625.00
CKEY-3500566     307.89  2011-08-09       500.00
CKEY-10766667     344.1  2013-12-03       139.41


When I use .loc to retrieve a single row, the output is as follows:

$ indexed_data.loc['CKEY-10766667']
icd_code 344.1
from_date 2013-12-03
paid_amount 139.41
Name: CKEY-10766667, dtype: object

~~~~~~~~expected output ~~~~~~~~~~
CKEY-10766667 344.1 2013-12-03 139.41


Can someone point out what's wrong with the above code?

Note: I called data.set_index('claim_id') on the original data set to create an index on 'claim_id'.
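A minimal reproduction of this setup, built from three of the sample rows, may help show what is going on. Keeping icd_code as strings and parsing from_date as datetimes are assumptions, since the original dtypes are not shown.

```python
import pandas as pd

# Three of the sample rows. icd_code is kept as strings (assumption) because
# values such as "722" and "722.2" are codes rather than quantities.
data = pd.DataFrame({
    "claim_id": ["CKEY-7724339", "CKEY-5008998", "CKEY-10766667"],
    "icd_code": ["719.43", "722.2", "344.1"],
    "from_date": pd.to_datetime(["2015-09-26", "2015-04-21", "2013-12-03"]),
    "paid_amount": [300.09, 11.65, 139.41],
})

# Make claim_id the row index, as in the question
indexed_data = data.set_index("claim_id")

# A unique label selects exactly one row, which pandas returns as a Series
row = indexed_data.loc["CKEY-10766667"]
print(type(row).__name__)  # Series
print(row)
```

Because the columns have mixed types, the returned Series has dtype object, which matches the "dtype: object" line in the output above.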

Answer

Using the code below gave me the expected output:

$>>> indexed_data.loc[['CKEY-8369057']] 

Passing a single label to .loc returns a DataFrame when several rows share that label, and a Series when only one row matches. Passing a list of labels to .loc always returns a DataFrame, even when it holds just one row.
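A small check of the three cases. The labels and values here are made up, with one claim_id deliberately repeated, because the sample data has no duplicate labels.

```python
import pandas as pd

# Hypothetical data: "CKEY-2" appears twice in the index
indexed_data = pd.DataFrame(
    {"icd_code": ["719.43", "722.2", "722.2"], "paid_amount": [300.09, 11.65, 17.19]},
    index=pd.Index(["CKEY-1", "CKEY-2", "CKEY-2"], name="claim_id"),
)

single = indexed_data.loc["CKEY-1"]     # label matches one row   -> Series
repeated = indexed_data.loc["CKEY-2"]   # label matches two rows  -> DataFrame
as_list = indexed_data.loc[["CKEY-1"]]  # list of labels          -> DataFrame

for name, result in [("single", single), ("repeated", repeated), ("as_list", as_list)]:
    print(name, type(result).__name__, result.shape)
```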

Taking execution time into account, passing a list is slower than passing a single label, and the difference adds up when the statement runs inside a loop. Here is what I did to get better execution time:

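import pandas as pd

df = indexed_data.loc[x]
# A label that matches one row gives a Series; turn it back into a one-row DataFrame
if isinstance(df, pd.Series):
    df = df.to_frame().T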

This guarantees that df is a DataFrame after these lines run, whether x matched one row or several.
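The same idea can be wrapped in a function for reuse inside a loop. This is only a sketch: the name loc_as_frame is hypothetical, and its two extra steps go beyond what the answer does. It restores the index name and calls infer_objects() so numeric columns regain a numeric dtype, since the round trip through an object Series leaves every column as object.

```python
import pandas as pd


def loc_as_frame(frame: pd.DataFrame, label) -> pd.DataFrame:
    """Hypothetical helper: select one label with .loc and always return a DataFrame."""
    result = frame.loc[label]
    if isinstance(result, pd.Series):
        # One matching row: rebuild a one-row frame, recover numeric dtypes
        # lost in the object Series, and put back the index name (e.g. claim_id).
        result = result.to_frame().T.infer_objects().rename_axis(frame.index.name)
    return result


# Hypothetical data with one repeated claim_id
indexed_data = pd.DataFrame(
    {"icd_code": ["344.1", "722.2", "722.2"], "paid_amount": [139.41, 11.65, 17.19]},
    index=pd.Index(["CKEY-10766667", "CKEY-5008998", "CKEY-5008998"], name="claim_id"),
)

one = loc_as_frame(indexed_data, "CKEY-10766667")
many = loc_as_frame(indexed_data, "CKEY-5008998")
print(one)
print(many.shape)
```

Both calls return a DataFrame with the same columns, so code that follows in the loop does not need to branch on the result type.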