zelenov aleksey - 1 month ago
Python Question

change rows in pandas

I have a matrix stored in a pandas DataFrame:

print dfMatrix
       0       1      2      3       4
0  10000      10      8     11      10
1     10  100000     13      9      10
2      8      13  10000      9      11
3     11       9      9  10000      12
4     10      10     11     12  100000


I need to reduce every value in each row by that row's minimum (row by row).
Here is the code I tried:

def matrixReduction(matrix):
    minRowValues = matrix.min(axis=1)
    for i in xrange(matrix.shape[1]):
        matrix[i][:] = matrix[i][:] - minRowValues[i]
    return matrix


I expect output like this:

      0      1     2     3      4
0  9992      2     0     3      2
1     1  99991     4     0      1
2     0      5  9992     1      3
3     2      0     0  9991      3
4     0      0     1     2  99990


But I get this output instead:

      0      1     2     3      4
0  9992      1     0     2      0
1     2  99991     5     0      0
2     0      4  9992     0      1
3     3      0     1  9991      2
4     2      1     3     3  99990


So it changes the values column by column instead of row by row.
How do I achieve this for rows?
Thanks.

Answer

You can compute each row's minimum with min(axis=1) and subtract it from that row with sub(..., axis=0):

print (df.min(axis=1))
0     8
1     9
2     8
3     9
4    10
dtype: int64

print (df.sub(df.min(axis=1), axis=0))
      0      1     2     3      4
0  9992      2     0     3      2
1     1  99991     4     0      1
2     0      5  9992     1      3
3     2      0     0  9991      3
4     0      0     1     2  99990
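
For anyone who wants to reproduce this, here is a minimal self-contained sketch. The frame is rebuilt by hand from the values printed in the question, and the names row_min and reduced are only illustrative:

```python
import pandas as pd

# Rebuild the matrix from the question (values copied from the printed frame).
df = pd.DataFrame([
    [10000, 10, 8, 11, 10],
    [10, 100000, 13, 9, 10],
    [8, 13, 10000, 9, 11],
    [11, 9, 9, 10000, 12],
    [10, 10, 11, 12, 100000],
])

# Per-row minimum: a Series indexed by row label.
row_min = df.min(axis=1)

# axis=0 aligns the Series with the row index,
# so each row has its own minimum subtracted.
reduced = df.sub(row_min, axis=0)
print(reduced)
```

Note that sub returns a new frame and leaves df unchanged, so assign the result back if you want to overwrite the original.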

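I also rewrote your function, selecting each row by position with iloc (ix in older pandas versions):

def matrixReduction(matrix):
    # Minimum of each row, as a Series aligned with the rows
    minRowValues = matrix.min(axis=1)
    # Iterate over rows (shape[0]), not columns (shape[1])
    for i in range(matrix.shape[0]):
        matrix.iloc[i, :] = matrix.iloc[i, :] - minRowValues.iloc[i]
    return matrix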
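
As for why the loop in the question changed columns: indexing a DataFrame with a single label, as in matrix[i], selects the column with that label, not the row. A small sketch on a made-up 2x2 frame (the values are hypothetical, chosen only to make rows and columns easy to tell apart):

```python
import pandas as pd

# Tiny illustrative frame: rows are [10, 1] and [20, 2].
df = pd.DataFrame([[10, 1], [20, 2]])

col = df[0]        # column labelled 0 -> values 10, 20
row = df.iloc[0]   # first row by position -> values 10, 1

print(col.tolist())
print(row.tolist())
```

So subtracting minRowValues[i] from matrix[i] takes the minimum of row i away from column i, which matches the output shown in the question.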

Timings:

In [136]: %timeit (matrixReduction(df))
100 loops, best of 3: 2.64 ms per loop

In [137]: %timeit (df.sub(df.min(axis=1), axis=0))
The slowest run took 5.49 times longer than the fastest. This could mean that an intermediate result is being cached.
1000 loops, best of 3: 308 µs per loop
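
If you want to repeat the comparison outside IPython, something like the following sketch with the standard timeit module should work. The random 5x5 data and the names loop_reduce and vectorized_reduce are assumptions for illustration. The loop version works on a copy because the function above modifies its argument in place, which would otherwise change the input between timing runs. Absolute numbers will depend on your machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical test data: a 5x5 frame of random integers.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.integers(1, 100, size=(5, 5)))


def loop_reduce(matrix):
    # Row-by-row version, applied to a copy so the input stays unchanged.
    out = matrix.copy()
    mins = out.min(axis=1)
    for i in range(out.shape[0]):
        out.iloc[i, :] = out.iloc[i, :] - mins.iloc[i]
    return out


def vectorized_reduce(matrix):
    # Single vectorized subtraction aligned on the row index.
    return matrix.sub(matrix.min(axis=1), axis=0)


# Time 100 calls of each variant.
t_loop = timeit.timeit(lambda: loop_reduce(df), number=100)
t_vec = timeit.timeit(lambda: vectorized_reduce(df), number=100)
print("loop:       %.6f s per call" % (t_loop / 100))
print("vectorized: %.6f s per call" % (t_vec / 100))
```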