srodriguex - 2 months ago

# Create a 2D plot pixel grid based on a pandas series of lists

Suppose we have a pandas Series of lists where each list contains some characteristics described as strings like this:

```
0  ["A", "C", "G", ...]
1  ["B", "C", "H", ...]
2  ["A", "X"]
...
N  ["J", "K", ...]
```

What would be the best or easiest way to plot a 2D pixel grid where the X axis shows the occurrence of each characteristic and the Y axis shows each sample in the series (0, 1, 2, ..., N)?

Edited on Sept 22 16:

It seems I didn't mention explicitly that the lists of characteristics aren't necessarily the same length for all observations. Observation 1 can have 4 characteristics, observation 2 can have none, observation 3 can have 5, and so on. So I can't turn them into a NumPy array right away without first preprocessing them so that the missing characteristics are filled in.
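One way to avoid padding altogether is to build a 0/1 indicator table straight from the lists, with one row per sample and one column per distinct characteristic, which is already the layout of the pixel grid. The following is a minimal sketch, assuming pandas 0.25 or later (for `Series.explode`); the helper name `presence_matrix` and the sample data are hypothetical.

```python
import pandas as pd


def presence_matrix(s: pd.Series) -> pd.DataFrame:
    """Return a 0/1 frame: one row per sample, one column per characteristic."""
    # One row per (sample, characteristic) pair; an empty list becomes a single NaN row
    long = s.explode()
    # One indicator column per distinct characteristic; the NaN rows get all-False
    dummies = pd.get_dummies(long)
    # Collapse the repeated sample labels back to one row per sample
    # (max() also turns a characteristic repeated within one list into a single 1)
    return dummies.groupby(level=0).max().astype(int)


# Hypothetical data in the shape described above:
# lists of different lengths, including an empty one
s = pd.Series([["A", "C", "G"], ["B", "C", "H"], ["A", "X"], []])
grid = presence_matrix(s)
print(grid)
```

An empty list produces an all-zero row instead of disappearing, so every sample keeps its place on the Y axis, and the resulting frame can be passed directly to `sns.heatmap`.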

Since I already wrote the code for the image in my comment, and Ed seems to have the same interpretation of your question as I do, I'll go ahead and add my solution.

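```python
import string

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

M, N = 100, 10
letters = list(string.ascii_uppercase)
# M x N array of random letters
data = np.random.choice(letters, (M, N))

df = pd.DataFrame(data)
# Count how often each letter occurs in each column (pd.Series.value_counts; the
# top-level pd.value_counts is deprecated since pandas 2.1). Letters missing from
# a column come back as NaN, so fill them with 0.
df_freq = df.apply(pd.Series.value_counts).fillna(0).astype(int).T

# Plot the frequency table as a heatmap: one row per column of df, one column per letter
ax = sns.heatmap(df_freq, linewidths=0.1, annot=False, cbar=True)
plt.show()
```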
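The code above counts letters in a fixed-width `M x N` array, so it does not deal with the variable-length lists described in the question's edit. A minimal sketch of the pixel grid for lists of any length follows; it swaps seaborn for plain matplotlib `imshow`, relies on pandas' `Series.str.join` (which joins the elements of list values) and `Series.str.get_dummies`, and the helper `plot_presence_grid` and the demo data are hypothetical.

```python
import matplotlib.pyplot as plt
import pandas as pd


def plot_presence_grid(s: pd.Series):
    """Plot samples (Y axis) against characteristics (X axis), one pixel per cell."""
    # Join each list into one "|"-separated string, then split it into 0/1 columns;
    # an empty list becomes "" and therefore an all-zero row
    grid = s.str.join("|").str.get_dummies(sep="|")

    fig, ax = plt.subplots()
    # Dark pixel = characteristic present, white pixel = absent
    ax.imshow(grid.to_numpy(), cmap="Greys", vmin=0, vmax=1,
              aspect="auto", interpolation="nearest")
    # Label each pixel column with its characteristic and each row with its sample
    ax.set_xticks(range(grid.shape[1]))
    ax.set_xticklabels(list(grid.columns))
    ax.set_yticks(range(grid.shape[0]))
    ax.set_yticklabels([str(i) for i in grid.index])
    ax.set_xlabel("characteristic")
    ax.set_ylabel("sample")
    return fig, ax, grid


if __name__ == "__main__":
    # Hypothetical demo data: lists of different lengths, including an empty one
    demo = pd.Series([["A", "C", "G"], ["B", "C", "H"], ["A", "X"], []])
    plot_presence_grid(demo)
    plt.show()
```

Joining with `"|"` assumes no characteristic itself contains `"|"`; if one might, choose a different separator for both `str.join` and `str.get_dummies`.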