gluuke - 1 year ago 195

Python Question

I am looking for a way to include a (matplotlib) legend that describe the size of points in a scatter plot, as this could be related to another variable, like in this basic example:

`import numpy as np`

import matplotlib.pyplot as plt

N = 50

x = np.random.rand(N)

y = np.random.rand(N)

a2 = 400*np.random.rand(N)

plt.scatter(x, y, s=a2, alpha=0.5)

plt.show()

(inspired from:

http://matplotlib.org/examples/shapes_and_collections/scatter_demo.html)

so in the legend there would be ideally few spots corresponding to sizes 0-400 (the

`a2`

`s`

`scatter`

Answer

The solution below used `pandas`

to group the sizes together into set bins (with `groupby`

). It plots each group and assigns it a label and a size for the markers. I have used the binning recipe from this question.

**Note** this is slightly different to your stated problem as the marker sizes are binned, this means that two elements in `a2`

, say 36 and 38, will have the same size as they are within the same binning. You can always increase the number of bins to make it finer as suits you.

Using this method you could vary other parameters for each bin, such as the marker shape or colour.

```
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
N = 50
M = 5 # Number of bins
x = np.random.rand(N)
y = np.random.rand(N)
a2 = 400*np.random.rand(N)
# Create the DataFrame from your randomised data and bin it using groupby.
df = pd.DataFrame(data=dict(x=x, y=y, a2=a2))
bins = np.linspace(df.a2.min(), df.a2.max(), M)
grouped = df.groupby(np.digitize(df.a2, bins))
# Create some sizes and some labels.
sizes = [50*(i+1.) for i in range(M)]
labels = ['Tiny', 'Small', 'Medium', 'Large', 'Huge']
for i, (name, group) in enumerate(grouped):
plt.scatter(group.x, group.y, s=sizes[i], alpha=0.5, label=labels[i])
plt.legend()
plt.show()
```