Tom Tom - 6 months ago 11
Python Question

How can I calculate values specific to an ID provided on each line?

I have a data table of values that look like this:

Score ID
1.2 1
2.4 1
1.1 1
1.9 1
2.4 2
3.5 2
2.2 2
1.1 3
1.1 3
1.7 3
3.1 3
2.2 3


And what I really need is to calculate the Change In Score for each ID in a separate column, such that it looks like this:

Score ID Changes
1.2 1
2.4 1 1.2
1.1 1 -1.3
1.9 1 0.8
2.4 2
3.5 2 1.1
2.2 2 -1.3
1.1 3
1.1 3 0
1.7 3 0.6
3.1 3 1.4
2.2 3 -0.9


Notice how each "changes" is calculated by taking the previous score, and subtracting it from the current score: e.g. in the first changes, you get 1.2 by taking 1.2 = 2.4(current score) - 1.2(The previous score)

The problem is that these scores must only pertain to only the scores within the confines of the ID. You can't just iterate over and take the difference of scores. How would I do this? What logic can I use to test to contain "changes" within the same ID? I'm familiar with R or python (or BASH) to do this.

Answer

In Python using Pandas:

import pandas as pd

df = pd.DataFrame(
        {'ID': [1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 3],
         'Score': [1.2, 2.4, 1.1, 1.9, 2.4, 3.5, 2.2, 1.1, 1.1, 1.7, 3.1, 2.2]})

df['Changes'] = df.groupby('ID').Score.transform(lambda group: group.diff())

>>> df
    Score  ID  Changes
0     1.2   1      NaN
1     2.4   1      1.2
2     1.1   1     -1.3
3     1.9   1      0.8
4     2.4   2      NaN
5     3.5   2      1.1
6     2.2   2     -1.3
7     1.1   3      NaN
8     1.1   3      0.0
9     1.7   3      0.6
10    3.1   3      1.4
11    2.2   3     -0.9
Comments