srodriguex - 6 months ago
Python Question

Create a 2D plot pixel grid based on a pandas series of lists

Suppose we have a pandas Series of lists where each list contains some characteristics described as strings like this:

0 ["A", "C", "G", ...]
1 ["B", "C", "H", ...]
2 ["A", "X"]
N ["J", "K", ...]

What would be the best/easiest way to plot a 2D pixel grid where the X axis shows the occurrence of each characteristic and the Y axis shows each sample in the series (0, 1, 2, ..., N)?

Edited on Sept 22 16:

It seems I didn't mention explicitly that the lists of characteristics don't necessarily have the same length for all observations. Observation 1 can have 4 characteristics, observation 2 can have none, observation 3 can have 5, and so on. So I can't convert them into a NumPy array right away without first preprocessing them so that the missing characteristics are filled in.


Since I already wrote the code for the image in my comment, and Ed seems to have the same interpretation of your question as I do, I'll go ahead and add my solution.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import string

M, N = 100, 10
letters = list(string.ascii_uppercase)
data = np.random.choice(letters, (M, N))

df = pd.DataFrame(data)
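# Get frequency of letters in each column with Series.value_counts
# (the top-level pd.value_counts is deprecated in recent pandas versions)
df_freq = df.apply(lambda col: col.value_counts()).T

# Plot frequency dataframe with seaborn heatmap
# (letters that never occur in a column are NaN and are left blank)
ax = sns.heatmap(df_freq, linewidths=0.1, annot=False, cbar=True)
plt.show()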
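The snippet above works on a rectangular random array, while the question describes a Series of lists with different lengths. Here is a minimal sketch of one way to handle that case: build a 0/1 presence grid with one row per sample and one column per distinct characteristic, then draw it as pixels. The helper name `presence_matrix` and the sample data are made up for illustration, and I'm assuming "occurrence" means present/absent rather than a count.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def presence_matrix(series):
    """Return a samples x characteristics DataFrame of 0/1 flags.

    Hypothetical helper: lists may differ in length or be empty.
    """
    # Every distinct characteristic becomes one column, in sorted order.
    labels = sorted({c for chars in series for c in chars})
    column_of = {c: j for j, c in enumerate(labels)}

    # Start from all zeros, so missing characteristics are filled in for free.
    grid = np.zeros((len(series), len(labels)), dtype=int)
    for i, chars in enumerate(series):
        for c in chars:
            grid[i, column_of[c]] = 1
    return pd.DataFrame(grid, index=series.index, columns=labels)


# Illustrative data only; the last sample has no characteristics at all.
samples = pd.Series([["A", "C", "G"], ["B", "C", "H"], ["A", "X"], []])
grid = presence_matrix(samples)

# One pixel per (sample, characteristic) pair; dark means present.
fig, ax = plt.subplots()
ax.imshow(grid.to_numpy(), cmap="Greys", aspect="auto", interpolation="nearest")
ax.set_xticks(range(len(grid.columns)))
ax.set_xticklabels(grid.columns)
ax.set_yticks(range(len(grid.index)))
ax.set_yticklabels(grid.index)
ax.set_xlabel("characteristic")
ax.set_ylabel("sample")
plt.show()
```

Because the grid starts as all zeros, ragged or empty lists need no separate padding step. The resulting DataFrame can also be passed to `sns.heatmap` if you prefer the seaborn look.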
