O.rka - 8 days ago 5
Python Question

# Most efficient way to turn dictionary into symmetric/distance matrix in Pandas

I'm doing pairwise distance for something with a weird distance metric. I have a dictionary like

`{(key_A, key_B):distance_value}`
and I want to make a symmetric
`pd.DataFrame`
like a distance matrix.

What is the most efficient way to do this? I found one way but it doesn't seem like the best way to do this. Is there anything in
`NumPy`
or
`Pandas`
that does this type of operation? or just a quicker way? My way is
`1.46 ms per loop`

``````np.random.seed(0)
D_pair_value = dict()
for pair in itertools.combinations(list("ABCD"),2):
D_pair_value[pair] = np.random.randint(0,5)
D_pair_value
# {('A', 'B'): 4,
#  ('A', 'C'): 0,
#  ('A', 'D'): 3,
#  ('B', 'C'): 3,
#  ('B', 'D'): 3,
#  ('C', 'D'): 1}
D_nested_dict = defaultdict(dict)
for (p,q), value in D_pair_value.items():
D_nested_dict[p][q] = value
D_nested_dict[q][p] = value

# Fill diagonal with zeros
DF = pd.DataFrame(D_nested_dict)
np.fill_diagonal(DF.values, 0)
DF
``````

You can use `scipy.spatial.distance.squareform`, which converts a vector of distance computations, i.e. `[d(A,B), d(A,C), ..., d(C,D)]`, into the distance matrix you're looking for.

Method 1: Distances Stored in a List

If you're computing your distances in order, like in your example code and in my example distance vector, I'd avoid using a dictionary and just store the results in a list, and do something like:

``````from scipy.spatial.distance import squareform

df = pd.DataFrame(squareform(dist_list), index=list('ABCD'), columns=list('ABCD'))
``````

Method 2: Distances Stored in a Dictionary

If you're computing things out of order and a dictionary is required, you just need to get a distance vector that's properly sorted:

``````from scipy.spatial.distance import squareform

dist_list = [dist[1] for dist in sorted(D_pair_value.items())]
df = pd.DataFrame(squareform(dist_list), index=list('ABCD'), columns=list('ABCD'))
``````

Method 3: Distances Stored in a Sorted Dictionary

If a dictionary is required, note that there's a package called `sortedcontainers` which has a `SortedDict` that essentially would solve the sorting issue for you. To use it, all you'd need to change is initializing `D_pair_value` as a `SortedDict()` instead of a `dict`. Using your example setup:

``````from scipy.spatial.distance import squareform
from sortedcontainers import SortedDict

np.random.seed(0)
D_pair_value = SortedDict()
for pair in itertools.combinations(list("ABCD"),2):
D_pair_value[pair] = np.random.randint(0,5)

df = pd.DataFrame(squareform(D_pair_value.values()), index=list('ABCD'), columns=list('ABCD'))
``````

The Resulting Output for Any Method Above:

``````     A    B    C    D
A  0.0  4.0  0.0  3.0
B  4.0  0.0  3.0  3.0
C  0.0  3.0  0.0  1.0
D  3.0  3.0  1.0  0.0
``````