Dan ran Dan ran - 5 months ago 27
JSON Question

creating a empty JSON file and uptaing it by pandas dataframe's row on python

So, I have a pandas dataframe with large no. of rows. whose one row might look like this and more.

data_store.iloc[0]
Out[5]:
mac_address 00:03:7f:05:c0:06
Visit 1/4/2016
storeid Ritika - Bhubaneswar
Mall or not High Street
Address 794, Sahid Nagar, Janpath, Bhubaneswar-751007
Unnamed: 4 OR
Locality JanPath
Affluence Index Locality 4
Lifestyle Index 5
Tourist Attraction In the locality? 0
City Bhubaneswar
Pop Density City 2131
Population Density of City NaN
City Affluence Index Medium
Mall / Shopping Complex High Street
Mall Premiumness Index NaN
Multiplex NaN
Offices Nearby NaN
Food Court NaN
Average Footfall NaN
Average Rental NaN
Size of the mall NaN
Area NaN
Upscale Street NaN
Place of Worship in vicinity NaN
High Street / Mall High Street
Brand Premiumness Index 4
Restaurant Nearby? 0
Store Size Large
Area.1 2600


There may be some more value in place of Nan just take it as a example.Now the unique key here is mac_address so I want to start with a empty JSON document. for that I will start with a dictionary in python which later I can dump into JSON . I dont know how to start with empty dict(you can help with that too) so, I have started with one value. Now for each new row of data frame if the mac_address which is also is the index of dict if mac_id is there update the corresponding city and city count. And if it is not there add a new field(if it is called field) with index as the new mac_address and store the value accordingly.

data = {"00:08:22:24:f8:02": {
"mac_address" : "00:08:22:24:f8:02",
"cities" :
[
{'name': 'Bhubaneswar', 'count': 12},
{'name': 'Kolkata', 'count': 4},
{'name': 'Mumbai', 'count': 6}
]
}

}


city count is no. of times a mac_address visited to a city. By reading this particular row I would like to update a city named Bhubneswar and Count 1.
This is the output of
print(data_store.head(10)[['mac_address','City']]


mac_address City
0 00:03:7f:05:c0:06 Kolkata
1 00:08:22:1c:50:07 Bhubaneswar
2 00:08:22:1c:50:07 Mumbai
3 00:08:22:1c:50:07 Mumbai
4 00:08:22:1c:50:07 Kolkata
5 00:08:22:24:cc:fb Bhubaneswar
6 00:08:22:24:f8:02 Mumbai
7 00:08:22:24:f8:02 Kolkata
8 00:08:22:24:f8:02 Mumbai
9 00:08:22:24:f8:02 Bhubaneswar

Answer

your desired JSON file is not a valid JSON file, here is the error message thrown by the online JSON validator:

Input:

{
    "mac_address": "00:03:7f:05:c0:06" {
        "cities": [{
            "City Name1": "Wittenbergplatz",
            "City count": "12"
        }, {
            "City Name2": "Spichernstrasse",
            "City Count": "19"
        }, {
            "City Name3": "Weberwiese",
            "City count": "30"
        }]
    }
}

Error

Error: Parse error on line 2:
..."00:03:7f:05:c0:06" {        "cities": [{            
-----------------------^
Expecting 'EOF', '}', ':', ',', ']', got '{'

this solution might help you to start:

In [440]: (df.groupby(['mac_address','City'])
   .....:    .size()
   .....:    .reset_index()
   .....:    .rename(columns={0:'count'})
   .....:    .groupby('mac_address')
   .....:    .apply(lambda x: x[['City','count']].to_dict('r'))
   .....:    .to_dict()
   .....: )
Out[440]:
{'00:03:7f:05:c0:01': [{'City': 'aaa', 'count': 1}],
 '00:03:7f:05:c0:02': [{'City': 'bbb', 'count': 1}],
 '00:03:7f:05:c0:03': [{'City': 'ccc', 'count': 2}],
 '00:03:7f:05:c0:05': [{'City': 'xxx', 'count': 1},
  {'City': 'zzz', 'count': 1}],
 '00:03:7f:05:c0:06': [{'City': 'aaa', 'count': 1},
  {'City': 'bbb', 'count': 1}],
 '00:03:7f:05:c0:07': [{'City': 'aaa', 'count': 3},
  {'City': 'bbb', 'count': 1}]}

data:

In [441]: df
Out[441]:
          mac_address City
0   00:03:7f:05:c0:06  aaa
1   00:03:7f:05:c0:06  bbb
2   00:03:7f:05:c0:07  aaa
3   00:03:7f:05:c0:07  bbb
4   00:03:7f:05:c0:07  aaa
5   00:03:7f:05:c0:01  aaa
6   00:03:7f:05:c0:02  bbb
7   00:03:7f:05:c0:03  ccc
8   00:03:7f:05:c0:03  ccc
9   00:03:7f:05:c0:07  aaa
10  00:03:7f:05:c0:05  xxx
11  00:03:7f:05:c0:05  zzz