Yasha - 3 months ago

Python automatically generating variables


I have a question about Python creating new variables derived from other variables. I am struggling to understand how Python automatically knows how to generate variables even when I do not explicitly tell it to.


I am a new Python user and am following along with the tutorials in Joel Grus, "Data Science From Scratch".

In the tutorial, I create three list variables:

  1. friends
    contains the number of friends that someone has on a given
    social networking site

  2. minutes
    refers to the number of minutes that they spend on the site

  3. labels
    is simply an alphabetic label for each user.

Part of the tutorial involves plotting a label next to each point of a scatterplot. In doing so, Python seems to automatically generate three new variables: label, friend_count, and minute_count.

In short: how? How does Python know to create these variables, and what do they do? They do not correspond to the mean, median, or mode of any of the lists.


import matplotlib.pyplot as plt

def make_chart_scatter_plot(plt):

    friends = [ 70,  65,  72,  63,  71,  64,  60,  64,  67]
    minutes = [175, 170, 205, 120, 220, 130, 105, 145, 190]
    labels =  ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i']

    plt.scatter(friends, minutes)

    # label each point
    for label, friend_count, minute_count in zip(labels, friends, minutes):
        plt.annotate(label,
                     xy=(friend_count, minute_count),  # place the label at its point
                     xytext=(5, -5),                   # but slightly offset
                     textcoords='offset points')

    plt.title("Daily Minutes vs. Number of Friends")
    plt.xlabel("# of friends")
    plt.ylabel("daily minutes spent on the site")
    plt.show()

make_chart_scatter_plot(plt)

Thank you!


So you're actually creating the variables in the for loop:

for label, friend_count, minute_count in zip(labels, friends, minutes):

When you zip those together you're grouping them by index, so the first item it yields is ('a', 70, 175), the second is ('b', 65, 170), and so on. Python then unpacks each of those three-item tuples, because you asked it to assign to three variables: label, friend_count and minute_count. If the number of names doesn't match the number of values (for example, if you supplied four names for three values), it would raise a ValueError.
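A quick way to check this yourself is to look at what zip actually produces. This sketch uses only the first three entries of each list from the question to keep the output short; the tuples come out in the same order the lists are passed to zip.

```python
labels = ['a', 'b', 'c']
friends = [70, 65, 72]
minutes = [175, 170, 205]

# zip groups the lists by index, in the order they are passed in
triples = list(zip(labels, friends, minutes))
print(triples)  # [('a', 70, 175), ('b', 65, 170), ('c', 72, 205)]

# unpacking one tuple into three names is exactly what the for loop does
label, friend_count, minute_count = triples[0]
print(label, friend_count, minute_count)  # a 70 175
```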

Then each time it iterates through the loop it reassigns the next values to those three variables.
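The sketch below (again with only the first three entries of each list) shows the same three names being rebound on every pass. As with any for loop in Python, they are ordinary variables that keep their last values once the loop ends, which is likely why they show up as "new variables" if you inspect your session after running the code at the top level of a script. Inside make_chart_scatter_plot they would instead be local to the function.

```python
labels = ['a', 'b', 'c']
friends = [70, 65, 72]
minutes = [175, 170, 205]

seen = []
for label, friend_count, minute_count in zip(labels, friends, minutes):
    # each pass rebinds the same three names to the next tuple's values
    seen.append((label, friend_count, minute_count))

print(seen)  # [('a', 70, 175), ('b', 65, 170), ('c', 72, 205)]

# the loop variables still exist after the loop, holding the last values
print(label, friend_count, minute_count)  # c 72 205
```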

Another way to think about this: if you wrote that line as:

for values in zip(labels, friends, minutes): 

then values would be a single tuple holding all three items on each iteration, and the three separate variables would not exist. You could then unpack the tuple inside the loop if you wanted. The form you posted is just a neater way to do the same thing.
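Here is that longer form written out, as a minimal sketch using the first three entries of each list. It behaves the same as unpacking directly in the for statement:

```python
labels = ['a', 'b', 'c']
friends = [70, 65, 72]
minutes = [175, 170, 205]

for values in zip(labels, friends, minutes):
    # values is one tuple per iteration, e.g. ('a', 70, 175)
    label, friend_count, minute_count = values
    print(values, '->', label, friend_count, minute_count)
```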

One more example of unpacking that you can play with yourself:

x = [1, 2, 3, 4]
a, b, c, d = x

would assign a=1, b=2 and so on. However:

a, b = x

raises an error:

ValueError                                Traceback (most recent call last)
----> 1 a, b = x

ValueError: too many values to unpack (expected 2)
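The mismatch fails in both directions: too many values for the names, or too few. This sketch catches each ValueError so you can read the messages side by side. The exact wording of the messages can differ slightly between Python 3 versions, so treat the comments as approximate.

```python
x = [1, 2, 3, 4]

try:
    a, b = x              # 4 values, only 2 names
except ValueError as err:
    too_many = str(err)   # starts with "too many values to unpack"

try:
    a, b, c, d, e = x     # 4 values, 5 names
except ValueError as err:
    not_enough = str(err) # starts with "not enough values to unpack"

print(too_many)
print(not_enough)
```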

This gets more interesting using the * operator:

a, *b = x

results in:

In [38]: a
Out[38]: 1

In [39]: b
Out[39]: [2, 3, 4]
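A few more starred assignments to play with. They show that the starred name can sit anywhere in the target list, that the right-hand side can be any iterable (a string or tuple, not only a list), and that the starred name always ends up as a list, possibly empty:

```python
x = [1, 2, 3, 4]

# the starred name can sit in the middle, not only at the end
first, *middle, last = x
print(first, middle, last)  # 1 [2, 3] 4

# any iterable works; the starred name is always a list
head, *tail = "abc"
print(head, tail)  # a ['b', 'c']

*init, final = (1, 2, 3)
print(init, final)  # [1, 2] 3

# if nothing is left over, the starred name is an empty list
only, *rest = [1]
print(only, rest)  # 1 []
```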

That is, the * marks b as the name that collects whatever values are left over, as a list. The * is also used a lot in functions (for example *args), but it can be used in for loops as well. Note that this starred assignment only works in Python 3.x; it accepts any iterable, not just lists, and the starred name does not have to come last. In 2.x you can still use * in function definitions and calls, but not in assignment.
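The sketch below puts the three uses next to each other. The names describe, rows and pair are made up for illustration. One detail worth noticing: in a function definition the starred parameter is a tuple, while in an assignment or for loop target it is a list.

```python
# in a function definition, *rest gathers extra positional arguments into a tuple
def describe(first, *rest):
    return first, rest

print(describe(1, 2, 3))  # (1, (2, 3))

# in a for loop target, *counts gathers the leftover items into a list
rows = [('a', 70, 175), ('b', 65, 170)]
collected = []
for label, *counts in rows:
    collected.append((label, counts))
print(collected)  # [('a', [70, 175]), ('b', [65, 170])]

# in a call, * spreads a list out into separate positional arguments
pair = [70, 175]
print(max(*pair))  # 175
```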