Zach - 1 month ago

Maintaining Data Associations in a Dictionary with Multiple Lines per Key

I have a data set from a baseball team that I want to analyze in one of my first Python programming experiences (coming from C++). However, the dataset has a more complex structure than my previous simple examples, and I'd like to know the best (most Pythonic) way to capture it. The main difficulty is that each player can have multiple seasons, and I'd like all of those to be tied to the same key (the player's ID) while keeping each stat line associated with its season. An example player set in the database looks like:

ID        year  AB   H   2B  3B  HR
JimBob01  2009  100  27  3   1   1
JimBob01  2010  154  37  6   2   5
JimBob01  2011  123  36  8   0   3


I searched around SO and found that a dictionary is the way to go, since each player has a hashable key (their ID). It also looks like I might want a list as each value in the dictionary? However, I'd like to be able to do something like:

print(data['JimBob01'][2009])


To see only the stats from 2009, as well as something like:

total_ab = 0
for season in data['JimBob01'].values():
    total_ab += season['AB']


and I don't think a list would give me that flexibility. I apologize if this is an overly simplistic question; I'm still adapting to the data structures available in Python.

Answer

Seems like you want a dictionary of dictionaries. Something like:

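playerData = {
    'JimBob01': {
        2009: {'AB': 100, 'H': 27, '2B': 3, '3B': 1, 'HR': 1},
        2010: {'AB': 154, 'H': 37, '2B': 6, '3B': 2, 'HR': 5},
        # ... one entry per season
    }
}

You can then look up the data for a particular year exactly as you want, with playerData['JimBob01'][2009], and loop over playerData['JimBob01'].values() to total a stat across seasons.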
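To fill playerData from data laid out like the table in the question, one option is collections.defaultdict, which creates an empty inner dict the first time a player ID is seen. This is a minimal sketch, assuming the stats arrive as whitespace-separated text with a header row; the raw string and the load_player_data name are hypothetical, and years are stored as integer keys.

```python
from collections import defaultdict

# Hypothetical input: the table from the question as whitespace-separated text.
raw = """\
ID year AB H 2B 3B HR
JimBob01 2009 100 27 3 1 1
JimBob01 2010 154 37 6 2 5
JimBob01 2011 123 36 8 0 3
"""


def load_player_data(text):
    """Build {player_id: {year: {stat: value}}} from a header + rows table."""
    lines = text.strip().splitlines()
    header = lines[0].split()
    stat_names = header[2:]  # every column after ID and year: AB, H, 2B, ...
    players = defaultdict(dict)  # missing player IDs get a fresh {} automatically
    for line in lines[1:]:
        fields = line.split()
        player_id, year = fields[0], int(fields[1])
        players[player_id][year] = dict(zip(stat_names, map(int, fields[2:])))
    return dict(players)  # plain dict, so typos in IDs raise KeyError


playerData = load_player_data(raw)
print(playerData['JimBob01'][2009])  # {'AB': 100, 'H': 27, '2B': 3, '3B': 1, 'HR': 1}

# Iterate over each season's stat dict to aggregate across years.
total_ab = sum(season['AB'] for season in playerData['JimBob01'].values())
print(total_ab)  # 377
```

Iterating with .items() instead of .values() gives you the year alongside each season's stats, if you need both.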

Depending on the size of your dataset and how often you need to run analyses, you might also want to look into an SQLite database.
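If you try the SQLite route, Python's standard-library sqlite3 module is enough to experiment. This is a minimal sketch, assuming an in-memory database and a hypothetical batting table; the 2B and 3B columns are renamed doubles and triples here because unquoted SQL identifiers can't start with a digit.

```python
import sqlite3

# Rows from the question's table: (id, year, AB, H, 2B, 3B, HR).
rows = [
    ('JimBob01', 2009, 100, 27, 3, 1, 1),
    ('JimBob01', 2010, 154, 37, 6, 2, 5),
    ('JimBob01', 2011, 123, 36, 8, 0, 3),
]

# In-memory database for illustration; pass a filename to persist it instead.
conn = sqlite3.connect(':memory:')
conn.execute(
    'CREATE TABLE batting ('
    ' player_id TEXT, year INTEGER,'
    ' ab INTEGER, h INTEGER, doubles INTEGER, triples INTEGER, hr INTEGER,'
    ' PRIMARY KEY (player_id, year))'
)
conn.executemany('INSERT INTO batting VALUES (?, ?, ?, ?, ?, ?, ?)', rows)

# One season for one player (the SQL analogue of playerData['JimBob01'][2009]).
season_2009 = conn.execute(
    'SELECT ab, h, hr FROM batting WHERE player_id = ? AND year = ?',
    ('JimBob01', 2009),
).fetchone()
print(season_2009)  # (100, 27, 1)

# Career at-bats, aggregated by the database instead of a Python loop.
(total_ab,) = conn.execute(
    'SELECT SUM(ab) FROM batting WHERE player_id = ?', ('JimBob01',)
).fetchone()
print(total_ab)  # 377
conn.close()
```

The composite primary key (player_id, year) plays the same role as the two levels of keys in the nested dictionary, and it also prevents duplicate seasons for a player.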
