Vijay_Shinde - 1 year ago
Java Question

# Calculate linear regression on a data set in MapReduce

Say I have an input as follows:

``````
60,3.1
61,3.6
62,3.8
63,4
65,4.1
``````

The expected output is y = -8.098 + 0.19x.

I know how to do this in plain Java, but I don't know how it maps onto the MapReduce model. Can anyone give an idea or sample MapReduce code for this problem? I would appreciate it.

Here is the simple mathematical approach:

``````
Regression formula:
Regression Equation(y) = a + bx
Slope(b) = (NΣXY - (ΣX)(ΣY)) / (NΣX² - (ΣX)²)
Intercept(a) = (ΣY - b(ΣX)) / N

where
x and y are the variables.
b = the slope of the regression line
a = the point where the regression line intercepts the y axis
N = the number of (x, y) pairs
X = first score (x value)
Y = second score (y value)
ΣXY = sum of the products of the first and second scores
ΣX = sum of the first scores
ΣY = sum of the second scores
ΣX² = sum of the squared first scores
``````
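The formulas above depend on the data only through five aggregates: N, ΣX, ΣY, ΣXY and ΣX². That is what lets the problem fit MapReduce: each map call can turn one input line into that point's contribution to the five sums, a combiner and reducer can add those contributions (addition is associative and commutative, so grouping and order do not matter), and the slope and intercept are computed once from the totals. The sketch below simulates those phases in plain Java with a stream `map`/`reduce`; it is not Hadoop code, and the names `RegressionSums`, `map`, `plus` and `run` are hypothetical. In an actual Hadoop job, one plausible design is for the mapper to emit the five partial sums under a single constant key so that they all reach one reducer.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical value type holding the five aggregates the regression needs.
// This is a plain-Java simulation of map/combine/reduce, not Hadoop code.
final class RegressionSums {
    static final RegressionSums ZERO = new RegressionSums(0, 0, 0, 0, 0);

    final long n;
    final double sumX, sumY, sumXY, sumX2;

    RegressionSums(long n, double sumX, double sumY, double sumXY, double sumX2) {
        this.n = n;
        this.sumX = sumX;
        this.sumY = sumY;
        this.sumXY = sumXY;
        this.sumX2 = sumX2;
    }

    // "Map": one input line of the form "x,y" becomes that point's contribution.
    static RegressionSums map(String line) {
        String[] parts = line.split(",");
        double x = Double.parseDouble(parts[0].trim());
        double y = Double.parseDouble(parts[1].trim());
        return new RegressionSums(1, x, y, x * y, x * x);
    }

    // "Combine"/"reduce": component-wise addition of partial sums.
    RegressionSums plus(RegressionSums other) {
        return new RegressionSums(n + other.n, sumX + other.sumX, sumY + other.sumY,
                sumXY + other.sumXY, sumX2 + other.sumX2);
    }

    // Final step on the totals: b = (N*sumXY - sumX*sumY) / (N*sumX2 - sumX^2)
    double slope() {
        return (n * sumXY - sumX * sumY) / (n * sumX2 - sumX * sumX);
    }

    // a = (sumY - b*sumX) / N
    double intercept() {
        return (sumY - slope() * sumX) / n;
    }
}

class MapReduceRegressionSketch {
    // Maps every non-empty line and reduces the partial sums into one total.
    static RegressionSums run(List<String> lines) {
        return lines.stream()
                .filter(line -> !line.trim().isEmpty())
                .map(RegressionSums::map)
                .reduce(RegressionSums.ZERO, RegressionSums::plus);
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("60,3.1", "61,3.6", "62,3.8", "63,4", "65,4.1");
        RegressionSums total = run(input);
        System.out.printf("y = %.4f + %.4fx%n", total.intercept(), total.slope());
    }
}
```

On the sample input this should print `y = -7.9635 + 0.1878x`. It differs from the expected `-8.098 + 0.19x` only because the worked example rounds b to 0.19 before computing a. Because `plus` is associative, summing each input split separately and then adding the split totals gives the same result, which is exactly the property a combiner relies on.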

For example:

``````
X value   Y value
60        3.1
61        3.6
62        3.8
63        4
65        4.1
``````

To find the regression equation, we first compute the slope and the intercept, then use them to form the equation.

``````
Step 1: Count the number of values.
N = 5

Step 2: Find XY and X² for each pair.
See the table below:

X Value   Y Value          X*Y             X*X
60        3.1     60 * 3.1 = 186     60 * 60 = 3600
61        3.6     61 * 3.6 = 219.6   61 * 61 = 3721
62        3.8     62 * 3.8 = 235.6   62 * 62 = 3844
63          4     63 * 4 = 252       63 * 63 = 3969
65        4.1     65 * 4.1 = 266.5   65 * 65 = 4225

Step 3: Find ΣX, ΣY, ΣXY and ΣX².
ΣX = 311
ΣY = 18.6
ΣXY = 1159.7
ΣX² = 19359

Step 4: Substitute into the slope formula above.
Slope(b) = (NΣXY - (ΣX)(ΣY)) / (NΣX² - (ΣX)²)
= ((5)*(1159.7) - (311)*(18.6)) / ((5)*(19359) - (311)²)
= (5798.5 - 5784.6) / (96795 - 96721)
= 13.9/74
= 0.18784...
≈ 0.19 (rounded to two decimals)

Step 5: Substitute into the intercept formula above, using the rounded b = 0.19.
Intercept(a) = (ΣY - b(ΣX)) / N
= (18.6 - 0.19(311))/5
= (18.6 - 59.09)/5
= -40.49/5
= -8.098
(With the unrounded b = 13.9/74, a = -7.9635 to four decimals.)

Step 6: Substitute a and b into the regression equation.
Regression Equation(y) = a + bx
= -8.098 + 0.19x.
``````
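Steps 4 and 5 can be checked with a small Java sketch that plugs in the sums from Step 3. It computes b without rounding, then recomputes a with b rounded to two decimals, as the worked example does. The class name `StepFourFive` and its methods are illustrative only.

```java
class StepFourFive {
    // Sums taken from Step 3 of the worked example.
    static final int N = 5;
    static final double SUM_X = 311, SUM_Y = 18.6, SUM_XY = 1159.7, SUM_X2 = 19359;

    // Step 4: slope b = (N*sumXY - sumX*sumY) / (N*sumX2 - sumX^2)
    static double slope() {
        return (N * SUM_XY - SUM_X * SUM_Y) / (N * SUM_X2 - SUM_X * SUM_X);
    }

    // Step 5: intercept a = (sumY - b*sumX) / N for a given slope b.
    static double intercept(double b) {
        return (SUM_Y - b * SUM_X) / N;
    }

    public static void main(String[] args) {
        double b = slope();                            // 13.9 / 74
        double bRounded = Math.round(b * 100) / 100.0; // two decimals, as in the worked example
        System.out.printf("unrounded: y = %.4f + %.4fx%n", intercept(b), b);
        System.out.printf("rounded:   y = %.3f + %.2fx%n", intercept(bRounded), bRounded);
    }
}
```

It should print `unrounded: y = -7.9635 + 0.1878x` and `rounded:   y = -8.098 + 0.19x`, so the -8.098 intercept in the expected output comes from rounding b before Step 5.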

Suppose we want the approximate y value for x = 64. We substitute that value into the equation above:

``````
Regression Equation(y) = a + bx
= -8.098 + 0.19(64)
= -8.098 + 12.16
= 4.06
``````
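The prediction step is just an evaluation of y = a + bx. A minimal Java sketch (the class `RegressionPrediction` and method `predict` are hypothetical names) evaluates it at x = 64 with both the rounded coefficients from the worked example and the unrounded ones from the formulas.

```java
class RegressionPrediction {
    // Evaluates the fitted line y = a + b*x at a given x.
    static double predict(double a, double b, double x) {
        return a + b * x;
    }

    public static void main(String[] args) {
        // Coefficients from the worked example (b rounded to 0.19 before computing a).
        System.out.printf("rounded:   %.3f%n", predict(-8.098, 0.19, 64));

        // Unrounded coefficients: b = 13.9 / 74 and a = (18.6 - 311 * b) / 5.
        double b = 13.9 / 74;
        double a = (18.6 - 311 * b) / 5;
        System.out.printf("unrounded: %.3f%n", predict(a, b, 64));
    }
}
```

Both lines give about 4.06 at x = 64 (4.062 with the rounded coefficients, 4.058 with the unrounded ones), so the rounding matters little for this prediction.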