neo33 neo33 -4 years ago 115
JSON Question

How can I overcome the following issue when parsing a JSON file?

Hello, I have the following JSON:

j = """[
[
{
"created": "2017-02-02T11:57:41+0000",
"from": "Bank",
"message": "Hi Alex, if you have not perform the modification to the data, please verify your DNI, celphone and the operator to verify it. Thanks."
},
{
"created": "2017-02-01T22:19:58+0000" ,
"from": "Alex ",
"message": "Could someone please help me?, I am callig to CC and they don't answer"
},
{
"created": "2017-02-01T22:19:42+0000",
"from": "Alex ",
"message": "the sms with the corresponding key and token has not arrived"
},
{
"created": "2017-02-01T22:19:28+0000",
"from": "Alex ",
"message": "I have issues to make payments from the app"
},
{
"created": "2017-02-01T22:19:18+0000",
"from": "Alex ",
"message": "Good afternoon"
}
],
[
{
"created": "2017-02-01T22:19:12+0000",
"from": "Bank",
"message": " Hello Alexander, the money is available to be withdrawn, you could go to any store the number is 70307002459"
},
{
"created": "2017-02-01T16:22:30+0000",
"from": "Alex",
"message": "hello they have deposited the money into my account, I don't have account from this bank, Could I know if I can withdraw the money? DNI 427 thanks a lot"
}

]


]"""


Since I need a specific structure, I tried to parse it as follows:

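import json
import numpy as np
import pandas as pd

js = json.loads(j)
# one DataFrame per conversation, keyed by its position in the outer list
df = pd.concat({i: pd.DataFrame(conv) for i, conv in enumerate(js)})

df.created = pd.to_datetime(df.created)

df.assign(qna=np.where(df['from'] == 'Bank', 'Answer', 'Question')).set_index(['created', 'qna']).message.unstack(fill_value='')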


Everything works up to this point, but if I add another entry with a repeated date, I get the following error:

---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-5-5652e92adbdc> in <module>()
69 df['from'] = df['from'].str.strip()
70 df = df.drop_duplicates()
---> 71 df.assign(qna=np.where(df['from'] == 'Bank', 'Answer', 'Question')) .set_index(['created', 'qna']) .unstack()
72
73

/usr/local/lib/python3.5/dist-packages/pandas/core/frame.py in unstack(self, level, fill_value)
4034 """
4035 from pandas.core.reshape import unstack
-> 4036 return unstack(self, level, fill_value)
4037
4038 # ----------------------------------------------------------------------

/usr/local/lib/python3.5/dist-packages/pandas/core/reshape.py in unstack(obj, level, fill_value)
406 if isinstance(obj, DataFrame):
407 if isinstance(obj.index, MultiIndex):
--> 408 return _unstack_frame(obj, level, fill_value=fill_value)
409 else:
410 return obj.T.stack(dropna=False)

/usr/local/lib/python3.5/dist-packages/pandas/core/reshape.py in _unstack_frame(obj, level, fill_value)
449 unstacker = _Unstacker(obj.values, obj.index, level=level,
450 value_columns=obj.columns,
--> 451 fill_value=fill_value)
452 return unstacker.get_result()
453

/usr/local/lib/python3.5/dist-packages/pandas/core/reshape.py in __init__(self, values, index, level, value_columns, fill_value)
101
102 self._make_sorted_values_labels()
--> 103 self._make_selectors()
104
105 def _make_sorted_values_labels(self):

/usr/local/lib/python3.5/dist-packages/pandas/core/reshape.py in _make_selectors(self)
139
140 if mask.sum() < len(self.index):
--> 141 raise ValueError('Index contains duplicate entries, '
142 'cannot reshape')
143

ValueError: Index contains duplicate entries, cannot reshape
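
For context, the same error can be reproduced with a tiny made-up frame. The two rows below are hypothetical and are not taken from the JSON in this post. When two rows share both created and qna, that pair cannot serve as a unique index, so unstack refuses to reshape. This is a minimal sketch, assuming a reasonably recent pandas:

```python
import pandas as pd

# Hypothetical rows (not from the post): same timestamp and same qna label.
df = pd.DataFrame({
    "created": pd.to_datetime(["2017-02-01T22:19:12+0000"] * 2),
    "qna": ["Answer", "Answer"],
    "message": ["first reply", "second reply"],
})

# Rows whose (created, qna) pair occurs more than once.
dupes = df[df.duplicated(subset=["created", "qna"], keep=False)]
print(len(dupes))

indexed = df.set_index(["created", "qna"])

# unstack needs a unique index; capture the error text instead of crashing.
try:
    indexed["message"].unstack()
    error_text = ""
except ValueError as exc:
    error_text = str(exc)

print(error_text)
```

Running the duplicated check on the real frame before calling unstack should show which rows collide.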


I am trying this with the new JSON below, but it fails because of the date, so I would appreciate help getting past this.

This is the JSON that fails:

j = """[
[
{
"created": "2017-02-02T11:57:41+0000",
"from": "Bank",
"message": "Hi Alex, if you have not perform the modification to the data, please verify your DNI, celphone and the operator to verify it. Thanks."
},
{
"created": "2017-02-01T22:19:58+0000" ,
"from": "Alex ",
"message": "Could someone please help me?, I am callig to CC and they don't answer"
},
{
"created": "2017-02-01T22:19:42+0000",
"from": "Alex ",
"message": "the sms with the corresponding key and token has not arrived"
},
{
"created": "2017-02-01T22:19:28+0000",
"from": "Alex ",
"message": "I have issues to make payments from the app"
},
{
"created": "2017-02-01T22:19:18+0000",
"from": "Alex ",
"message": "Good afternoon"
}
],
[
{
"created": "2017-02-01T22:19:12+0000",
"from": "Bank",
"message": " Hello Alexander, the money is available to be withdrawn, you could go to any store the number is 70307002459"
},
{
"created": "2017-02-01T16:22:30+0000",
"from": "Alex",
"message": "hello they have deposited the money into my account, I don't have account from this bank, Could I know if I can withdraw the money? DNI 427 thanks a lot"
}

],
[
{
"created": "2017-02-01T22:19:13+0000",
"from": "Bank",
"message": " Hello Adolfo, the money is available."
},
{
"created": "2017-02-01T16:22:33+0000",
"from": "Omar",
"message": "hello they have deposited the money into my account."
}

]



]"""

Answer Source

It looks like you need to separate out the assign statement into its own step. There is no need for append=True in set_index.

import json
import numpy as np
import pandas as pd

js = json.loads(j)
# flatten all conversations into one frame with a fresh 0..n-1 index
df = pd.concat([pd.DataFrame(conv) for conv in js], ignore_index=True)
df['from'] = df['from'].str.strip()  # "Alex " -> "Alex"
df['created'] = pd.to_datetime(df.created)
df['qna'] = np.where(df['from'] == 'Bank', 'Answer', 'Question')
df1 = df.set_index(['created', 'qna']).unstack(fill_value='')

with pd.option_context('display.max_colwidth', 30, 'display.expand_frame_repr', False):
    print(df1)

Output

                       from                                 message                               
qna                 Answer Question                         Answer                       Question
created                                                                                          
2017-02-01 16:22:30            Alex                                 hello they have deposited ...
2017-02-01 22:19:12   Bank            Hello Alexander, the mone...                               
2017-02-01 22:19:18            Alex                                                Good afternoon
2017-02-01 22:19:28            Alex                                 I have issues to make paym...
2017-02-01 22:19:42            Alex                                 the sms with the correspon...
2017-02-01 22:19:58            Alex                                 Could someone please help ...
2017-02-02 11:57:41   Bank           Hi Alex, if you have not p...   
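
If the repeated timestamps come from different conversations (the outer lists in the JSON), one possible way around the error is to keep the conversation number as an extra index level. This is only a sketch: the conversation column name and the shortened messages are made up for illustration, and it will still fail if the same sender repeats a timestamp inside a single conversation.

```python
import json

import numpy as np
import pandas as pd

# Hypothetical data: two conversations whose Bank replies share a timestamp.
j = """[
  [{"created": "2017-02-01T22:19:12+0000", "from": "Bank", "message": "reply A"},
   {"created": "2017-02-01T16:22:30+0000", "from": "Alex ", "message": "question A"}],
  [{"created": "2017-02-01T22:19:12+0000", "from": "Bank", "message": "reply B"}]
]"""

js = json.loads(j)

# Tag every row with the position of its conversation in the outer list.
frames = [pd.DataFrame(conv).assign(conversation=i) for i, conv in enumerate(js)]
df = pd.concat(frames, ignore_index=True)

df["from"] = df["from"].str.strip()
df["created"] = pd.to_datetime(df["created"])
df["qna"] = np.where(df["from"] == "Bank", "Answer", "Question")

# (conversation, created, qna) is unique here, so unstack can reshape.
wide = (
    df.set_index(["conversation", "created", "qna"])["message"]
      .unstack(fill_value="")
)
print(wide)
```

With only created and qna in the index, the two Bank replies above would collide; the extra level keeps them on separate rows.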