I'm brand new to Python and machine learning, and as part of my course at university we're using NumPy, matplotlib, and scikit-learn. OK, so I have a question. The code below works perfectly fine; my issue is that I don't really understand what's happening. So for this one:
#first two features are sepal length and sepal width
plt.scatter(X[:, 0], X[:, 1], c=Y, cmap=plt.cm.Paired)
#here's also how to plot in 3d:
from mpl_toolkits.mplot3d import Axes3D  # registers the '3d' projection (required on older matplotlib versions)
#create a new figure
fig = plt.figure(figsize=(5,5))
#this creates a 1x1 grid (just one figure), and now we are plotting
#subfigure 1 (this is what 111 means)
ax = fig.add_subplot(111, projection='3d')
#plot first three features in a 3d plot. Using : means that we take all
#elements in the corresponding array dimension
ax.scatter(X[:, 0], X[:, 1], X[:, 2], c=Y)
Let's go step by step through your questions.
First, "why are there two X coordinates?": The scatter function (http://matplotlib.org/api/pyplot_api.html#matplotlib.pyplot.scatter) takes the x and y coordinates of the markers. In this case both are stored in one single 2D array called X: X[:, 0] (every row, column 0) holds the x coordinates, and X[:, 1] (every row, column 1) holds the y coordinates.
"what does c=Y mean?": There are several options for colouring the markers:
c can be a single color format string, or a sequence of color specifications of length N, or a sequence of N numbers to be mapped to colors using the cmap and norm specified via kwargs (see below). Note that c should not be a single numeric RGB or RGBA sequence because that is indistinguishable from an array of values to be colormapped. c can be a 2-D array in which the rows are RGB or RGBA, however, including the case of a single row to specify the same color for all points.
So in this case, the relevant part is "a sequence of N numbers to be mapped to colors using the cmap", i.e. you provide one data value per marker (here Y), and scatter colours each marker by looking its value up in a color map (http://matplotlib.org/examples/color/colormaps_reference.html).
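To make the slicing and the colour mapping concrete, here is a minimal sketch. The small X and Y arrays are made-up stand-ins for your course data (an assumption on my part), and the Agg backend is only selected so the snippet runs without a display.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window is needed
import matplotlib.pyplot as plt

# Made-up stand-in data: 6 samples, 2 features each, and one class label per sample.
X = np.array([[5.1, 3.5], [4.9, 3.0], [7.0, 3.2],
              [6.4, 3.2], [6.3, 3.3], [5.8, 2.7]])
Y = np.array([0, 0, 1, 1, 2, 2])

xs = X[:, 0]  # every row, column 0 -> x coordinates
ys = X[:, 1]  # every row, column 1 -> y coordinates

fig, ax = plt.subplots()
points = ax.scatter(xs, ys, c=Y, cmap=plt.cm.Paired)

# scatter keeps the numbers passed as c and maps them to colours via the colormap.
stored = np.asarray(points.get_array())
print(xs.shape, ys.shape)        # one coordinate per sample
print(stored)                    # the label values used for colouring
print(points.get_cmap().name)    # the colormap they are looked up in
```

Markers that share a label in Y end up with the same colour, which is why the classes are visually separated in the plot.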
"fig.add_subplot(111, projection = '3d'). This third 1 doesn't really make sense to me.": The
add_subplot method adds subplots on a grid. The three digits are the number of rows, the number of columns, and the position on that grid, counted from 1 along each row. Taking 322 as a clearer example than 111: it means a grid with 3 rows and 2 columns, where you are using the second position on the grid (i.e. row 0, column 1, counting from zero) as the current subplot. So add_subplot(111) simply means a figure with 1 row and 1 column of subplots, and you are using (what a surprise..) the first and only position.
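The three-digit code can be unpacked by hand. The sketch below uses a hypothetical helper, decode_subplot_code, which is not part of matplotlib; it only illustrates how the digits map to a grid position, assuming positions are numbered from 1 along each row.

```python
def decode_subplot_code(code):
    """Split a three-digit subplot code such as 322 into its parts.

    Hypothetical helper for illustration only; not part of matplotlib.
    """
    nrows, ncols, index = (int(digit) for digit in str(code))
    # Positions are numbered from 1, left to right and then top to bottom,
    # so convert the position to a zero-based (row, column) pair.
    row, col = divmod(index - 1, ncols)
    return nrows, ncols, index, row, col


print(decode_subplot_code(322))  # (3, 2, 2, 0, 1)
print(decode_subplot_code(111))  # (1, 1, 1, 0, 0)
```

In matplotlib itself, fig.add_subplot(322) and fig.add_subplot(3, 2, 2) are equivalent; the second form is easier to read and is the only option once a grid has more than 9 rows, columns or positions.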
"Why are there three X's and no cmap?": You first imported
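mpl_toolkits.mplot3d and next specified projection='3d' in add_subplot, so you are now creating a 3D scatter plot, which requires the specification of the x, y and z coordinates of the markers: X[:, 0], X[:, 1] and X[:, 2]. The cmap argument is optional; when it is omitted, the numbers in c=Y are mapped through matplotlib's default color map.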
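A self-contained version of the 3D case might look like the sketch below. The random X and Y are placeholders for your real features and labels (an assumption), and the Agg backend is only chosen so it runs without a display.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window is needed
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # noqa: F401, needed on older matplotlib versions

# Placeholder data: 30 samples with 3 features, and a label in {0, 1, 2} per sample.
rng = np.random.default_rng(0)
X = rng.random((30, 3))
Y = rng.integers(0, 3, size=30)

fig = plt.figure(figsize=(5, 5))
ax = fig.add_subplot(111, projection='3d')  # 1x1 grid, position 1, 3D axes

# Three coordinate arrays this time: one column of X per axis.
# No cmap is given, so the default colormap is used for c=Y.
points = ax.scatter(X[:, 0], X[:, 1], X[:, 2], c=Y)

print(ax.name)  # the projection of the axes that was created
```

The only differences from the 2D call are the projection='3d' argument and the extra X[:, 2] column for the z axis.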