kmkmkmkmkm - 6 months ago 34

Python Question

I'm brand new to Python and machine learning, and as part of my course at university we're using NumPy, matplotlib, and scikit-learn. OK, so I have a question. The code below works perfectly fine; my issue is that I don't really understand what's happening. So for this one:

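```python
%matplotlib inline
# imports and data loading (done in an earlier cell)
import matplotlib.pyplot as plt
from sklearn import datasets

iris = datasets.load_iris()
X = iris.data
Y = iris.target

# first two features are sepal length and sepal width
plt.scatter(X[:, 0], X[:, 1], c=Y, cmap=plt.cm.Paired)
plt.xlabel('Sepal length')
plt.ylabel('Sepal width')
```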

I tried to check the documentation but it didn't really make any sense to me.

Here I would like to know what the arguments in `plt.scatter()` mean. I don't really understand them: what does `c=Y` mean, what is `cmap`, and why are there two X coordinates?

As for this next code:

```python
%matplotlib inline
# here's also how to plot in 3d (X and Y come from the previous snippet):
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D

# create a new figure
fig = plt.figure(figsize=(5, 5))

# this creates a 1x1 grid (just one figure), and now we are plotting
# subfigure 1 (this is what 111 means)
ax = fig.add_subplot(111, projection='3d')

# plot first three features in a 3d plot. Using : means that we take all
# elements in the corresponding array dimension
ax.scatter(X[:, 0], X[:, 1], X[:, 2], c=Y)
```

What I want to know here is:

fig.add_subplot(111, projection = '3d'). This third 1 doesn't really make sense to me. I understand the 1x1 grid, but I don't understand "and now we are plotting subfigure 1".

Also:

The `ax.scatter()` arguments make no sense here either. Why is the format different from the one above? Why are there three X's and no cmap? I don't really understand it. Why do they not use `plt.scatter()`?

Answer

Let's go step by step through your questions.

First, *"why are there two X coordinates?"*: The `scatter` function (http://matplotlib.org/api/pyplot_api.html#matplotlib.pyplot.scatter) takes `x` and `y` coordinates for the markers; in this case both the `x` and `y` coordinates are stored in one single 2D array called `X`, with one row per sample and one column per feature. `X[:,0]` means "all rows, column 0", so it is simply the `x` coordinates (sepal length), and `X[:,1]` ("all rows, column 1") the `y` coordinates (sepal width).

*"what does c=Y mean?"*: There are several options for colouring the markers, as the documentation explains:

> c can be a single color format string, or a sequence of color specifications of length N, or a sequence of N numbers to be mapped to colors using the cmap and norm specified via kwargs (see below). Note that c should not be a single numeric RGB or RGBA sequence because that is indistinguishable from an array of values to be colormapped. c can be a 2-D array in which the rows are RGB or RGBA, however, including the case of a single row to specify the same color for all points.

So in this case, the relevant part seems to be *"N numbers to be mapped to colors using the cmap"*, i.e. you provide data values (here `Y`, one class label per flower), and `scatter` colours the markers using some color map (http://matplotlib.org/examples/color/colormaps_reference.html). The `cmap` argument picks which color map to use; `plt.cm.Paired` is one of the available maps.
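To make the slicing and colouring concrete, here is a minimal sketch. It assumes a tiny made-up array standing in for `iris.data` (the numbers are only illustrative, not the real dataset) and uses the non-interactive `Agg` backend so it runs outside a notebook. It shows that `X[:, 0]` and `X[:, 1]` pull out whole columns, and that `c=Y` plus a colormap gives every marker with the same label the same colour. The colormap is passed by name here (`"Paired"`), which `scatter` also accepts.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend: no window or notebook needed
import matplotlib.pyplot as plt

# Made-up stand-in for iris.data: 4 samples (rows) x 3 features (columns)
X = np.array([[5.1, 3.5, 1.4],
              [4.9, 3.0, 1.4],
              [6.4, 3.2, 4.5],
              [6.3, 3.3, 6.0]])
# Made-up stand-in for iris.target: one integer class label per sample
Y = np.array([0, 0, 1, 2])

x = X[:, 0]  # every row, column 0: the first feature of each sample
y = X[:, 1]  # every row, column 1: the second feature of each sample
print(X.shape, x.shape, y.shape)  # (4, 3) (4,) (4,)

fig, ax = plt.subplots()
# c=Y gives one number per marker; cmap decides which colour each number becomes
points = ax.scatter(x, y, c=Y, cmap="Paired")

# Ask the scatter's colormap which RGBA colour each label was mapped to
rgba = points.to_rgba(Y)
print(rgba.shape)  # (4, 4): one RGBA colour per marker

# Other forms of c mentioned in the documentation quote
ax.scatter(x, y, c="red")              # a single colour string for all markers
ax.scatter(x, y, c=[[0.0, 0.0, 1.0]])  # a single RGB row: same colour for all markers
```

The first two markers share label `0`, so they end up with identical colours, while labels `1` and `2` land on different entries of the map.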

*"fig.add_subplot(111, projection = '3d'). This third 1 doesn't really make sense to me."*: `add_subplot` adds subplots on a grid, where `111` (or let's use `322` as a clearer example) means a grid with 3 rows and 2 columns, and you are using the second position on the grid (i.e. row 0, column 1) as the current subplot. The three digits are shorthand for `add_subplot(3, 2, 2)`, and positions are counted from 1, starting at the top left and running left to right, row by row. So `subplot(111)` simply means a figure with 1 row and 1 column of subplots, and you are using (what a surprise...) the first and only position.
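The three-digit code is just three separate numbers packed together. Below is a minimal sketch of that decoding; `decode_subplot_code` and `position_to_row_col` are hypothetical helpers written for illustration, not matplotlib functions. The last lines ask matplotlib itself (via `get_subplotspec().get_geometry()`) where it placed the `322` subplot, assuming a reasonably recent matplotlib 3.x.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend
import matplotlib.pyplot as plt


def decode_subplot_code(code):
    """Hypothetical helper: split a three-digit code such as 322 into (rows, cols, index)."""
    rows, cols, index = (int(digit) for digit in str(code))
    return rows, cols, index


def position_to_row_col(index, cols):
    """Hypothetical helper: turn a 1-based subplot index into a 0-based (row, column).

    Positions start at 1 in the top-left corner and run left to right,
    then continue on the next row.
    """
    return (index - 1) // cols, (index - 1) % cols


rows, cols, index = decode_subplot_code(322)
print(rows, cols, index)                 # 3 2 2
print(position_to_row_col(index, cols))  # (0, 1)
print(decode_subplot_code(111))          # (1, 1, 1): one row, one column, first slot

fig = plt.figure()
ax = fig.add_subplot(322)  # shorthand for fig.add_subplot(3, 2, 2)
# (n_rows, n_cols, start, stop) with 0-based start/stop: slot 1 is row 0, column 1
print(ax.get_subplotspec().get_geometry())  # (3, 2, 1, 1)
```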

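*"Why are there three X's and no cmap?"*: You first imported `mpl_toolkits.mplot3d` and next specified `projection='3d'` to `add_subplot`, so `ax` is now a 3D axes and you are creating a 3D `scatter` plot, which requires the specification of `x`, `y` and `z` coordinates of the markers. That is also why `ax.scatter` is called on the axes object directly: `plt.scatter` only takes `x` and `y` positionally, and a third positional argument would be read as the marker size `s`. As for `cmap`, it is optional; when it is left out, the numbers in `c=Y` are mapped with matplotlib's default colormap (`viridis` in current versions).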
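A minimal 3D sketch, again on a made-up stand-in for the iris arrays and with the `Agg` backend, assuming a matplotlib version where `projection='3d'` is available. It shows that the `ax` returned by `add_subplot(111, projection='3d')` is a 3D axes whose `scatter` takes `z` as its third positional argument, and that leaving out `cmap` still colours the markers using the default colormap named in `rcParams['image.cmap']`.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # registers '3d' (only strictly needed on matplotlib < 3.2)

# Made-up stand-in for iris.data: 4 samples x 3 features
X = np.array([[5.1, 3.5, 1.4],
              [4.9, 3.0, 1.4],
              [6.4, 3.2, 4.5],
              [6.3, 3.3, 6.0]])
Y = np.array([0, 0, 1, 2])  # made-up class labels

fig = plt.figure(figsize=(5, 5))
ax = fig.add_subplot(111, projection="3d")  # ax is a 3D axes, not an ordinary 2D one

# The 3D scatter takes x, y and z positionally; c=Y colours by label as before
points = ax.scatter(X[:, 0], X[:, 1], X[:, 2], c=Y)
ax.set_xlabel("Sepal length")
ax.set_ylabel("Sepal width")
ax.set_zlabel("Petal length")  # third iris feature

print(ax.name)                 # 3d
# No cmap was passed, so the default colormap from rcParams was used
print(points.get_cmap().name)  # 'viridis' with default settings
```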