Dhruv Ghulati - 5 months ago
Python Question

Creating a triple-nested JSON in Python

I am trying to create a JSON file by first building a Python dict that produces the following structure:

{
  "sentences": [
    {
      "sentence": "At the end of November 2005 , Hong Kong and America had 132 licensed banks , 41 restricted licensed banks , 35 deposit-taking institutions , and 86 representative offices .",
      "parsedSentence": "xyz in text e.g. At the end of November 2005 , LOCATION_SLOT and LOCATION_SLOT had NUMBER_SLOT licensed banks , NUMBER_SLOT restricted licensed banks , NUMBER_SLOT deposit-taking institutions , and NUMBER_SLOT representative offices .",
      "location-value-pairs": [
        {"America": 132}, {"America": 41}, {"America": 35},
        {"Hong Kong": 132}, {"Hong Kong": 41}, {"Hong Kong": 35}
      ]
    }
  ]
}


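However, I can't work out how to build this: a top-level "sentences" key holding a list of objects, where each object has two string keys ("sentence" and "parsedSentence") and a third key, "location-value-pairs", whose value is an array of objects.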
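For reference, the target JSON above corresponds to a Python dict of the following shape: one key whose value is a list of per-sentence dicts, each holding two strings and a list of single-entry dicts. The sentence text and numbers here are shortened, hypothetical samples, not the real data.

```python
import json

# Top level: a single key whose value is a list of per-sentence dicts
target = {
    "sentences": [
        {
            "sentence": "Hong Kong and America had 132 licensed banks .",
            "parsedSentence": "LOCATION_SLOT and LOCATION_SLOT had NUMBER_SLOT licensed banks .",
            # Each pair is its own single-entry dict, so a location can repeat
            "location-value-pairs": [{"America": 132}, {"Hong Kong": 132}],
        }
    ]
}

print(json.dumps(target))
```

Keeping each pair as a separate `{location: number}` dict is what allows the same location to appear several times, as "America" does in the desired output.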

My current code is below (note that I couldn't get keys such as "sentence" and "parsedSentence" to be created). At the moment I have no fixed key names (my keys are the sentence strings themselves); I want to move away from that so that in future I can traverse this Python dictionary more quickly:

import json

sentences2location2values = {}

for sentence in parsedSentences:
    # Rebuild the sentence text from its tokens
    wordsInSentence = []
    for token in sentence["tokens"]:
        wordsInSentence.append(token["word"])
    sentence = " ".join(wordsInSentence)
    for locationTokenIDs, location in tokenIDs2location.items():
        for numberTokenIDs, number in tokenIDs2number.items():
            if sentence not in sentences2location2values:
                sentences2location2values[sentence] = {}
            if location not in sentences2location2values[sentence]:
                sentences2location2values[sentence][location] = []
            sentences2location2values[sentence][location].append(number)

# Text mode: json.dump writes str, so "wb" fails on Python 3
with open(outputFile, "w") as out:
    json.dump(sentences2location2values, out)


This produces JSON like the following:

{
  "Mobutu Sese Seku seized power in 1965 via a coup , renaming the country Zaire , and reigning for the next 32 years as head of a ruthless and corrupt dictatorship .": {"Zaire": [32.0]},
  "\u00c3 cents \u00c2 $ \u00c2 cents Movement for the Liberation of the Congo -LRB- MLC -RRB- : Under the direction of Bemba , and backed by Uganda , the MLC was formed in 1998 with 154 soldiers .": {"Congo": [154.0], "Uganda": [154.0]}, ...


This is not the structure I need.
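The loop keys everything by the sentence string, which is why the output is a sentence → location → list-of-numbers mapping rather than the target shape. A reduced, runnable reproduction makes this visible; the sample sentence and the contents of `parsedSentences`, `tokenIDs2location` and `tokenIDs2number` are invented for illustration, and `dict.setdefault` stands in for the two `if ... not in ...` checks.

```python
import json

# Hypothetical stand-ins for the question's variables
sample = "backed by Uganda , the MLC was formed in the Congo with 154 soldiers ."
parsedSentences = [{"tokens": [{"word": w} for w in sample.split()]}]
tokenIDs2location = {(2,): "Uganda", (10,): "Congo"}  # token IDs -> location
tokenIDs2number = {(12,): 154.0}                      # token IDs -> number

sentences2location2values = {}
for parsed in parsedSentences:
    sentence = " ".join(token["word"] for token in parsed["tokens"])
    for location in tokenIDs2location.values():
        for number in tokenIDs2number.values():
            # setdefault creates the inner dict/list only when missing
            sentences2location2values.setdefault(sentence, {}).setdefault(location, []).append(number)

# The sentence text itself is the top-level key, as in the output above
print(json.dumps(sentences2location2values))
```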

How can I fill in the right keys and values one by one at the appropriate points in the loop, rather than with a one-line solution?

Answer

There seems to be a mismatch between the ideal output at the start of your question and what the code actually does: the code never creates the keys sentence, parsedSentence and location-value-pairs.

This may just mean I've misunderstood the question, but if not, you could try something like:

output = {"sentences": []}

for sentence in parsedSentences:
    # Store the original parsed sentence object under "parsedSentence"
    sentenceDict = {"parsedSentence": sentence}

    # Rebuild the plain-text sentence from its tokens
    wordsInSentence = []
    for token in sentence["tokens"]:
        wordsInSentence.append(token["word"])
    sentenceDict["sentence"] = " ".join(wordsInSentence)

    # One {location: number} dict per location/number combination
    sentenceDict["location-value-pairs"] = []
    for locationTokenIDs, location in tokenIDs2location.items():
        for numberTokenIDs, number in tokenIDs2number.items():
            sentenceDict["location-value-pairs"].append({location: number})

    output["sentences"].append(sentenceDict)
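Here is a self-contained version of the same approach that can be run as-is. The function name `build_output` and the sample data (sentence, token IDs, values) are hypothetical; as in the code above, "parsedSentence" holds the raw parsed object, which you may want to replace with the slot-filled string if you have one.

```python
import json

# Hypothetical sample data in the shape the question's loop expects
sample = "At the end of November 2005 , Hong Kong and America had 132 licensed banks ."
parsedSentences = [{"tokens": [{"word": w} for w in sample.split()]}]
tokenIDs2location = {(7, 8): "Hong Kong", (10,): "America"}
tokenIDs2number = {(12,): 132}


def build_output(parsedSentences, tokenIDs2location, tokenIDs2number):
    output = {"sentences": []}
    for parsed in parsedSentences:
        sentenceDict = {
            "sentence": " ".join(token["word"] for token in parsed["tokens"]),
            "parsedSentence": parsed,  # raw parsed object, as in the answer
            "location-value-pairs": [],
        }
        for location in tokenIDs2location.values():
            for number in tokenIDs2number.values():
                sentenceDict["location-value-pairs"].append({location: number})
        output["sentences"].append(sentenceDict)
    return output


output = build_output(parsedSentences, tokenIDs2location, tokenIDs2number)
text = json.dumps(output, indent=2)

# Traversal now goes through fixed key names, not the sentence text
for entry in json.loads(text)["sentences"]:
    for pair in entry["location-value-pairs"]:
        for location, number in pair.items():
            print(entry["sentence"], "|", location, number)
```

To write the result to a file, `json.dump(output, out)` with the file opened in text mode (`"w"`) works on Python 3; binary mode (`"wb"`) does not, because `json.dump` writes `str`.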