zelenov aleksey - 10 months ago 42

Python Question

I have a matrix stored in a pandas DataFrame:

```
print dfMatrix

       0       1      2      3       4
0  10000      10      8     11      10
1     10  100000     13      9      10
2      8      13  10000      9      11
3     11       9      9  10000      12
4     10      10     11     12  100000
```

I need to reduce every row by that row's own minimum, i.e. subtract the row minimum from each value in the row, row by row.

Here is the code I tried:

```
def matrixReduction(matrix):
    minRowValues = matrix.min(axis=1)
    for i in xrange(matrix.shape[1]):
        matrix[i][:] = matrix[i][:] - minRowValues[i]
    return matrix
```

I expect output like this:

```
      0      1     2     3      4
0  9992      2     0     3      2
1     1  99991     4     0      1
2     0      5  9992     1      3
3     2      0     0  9991      3
4     0      0     1     2  99990
```

But I get this output instead:

```
      0      1     2     3      4
0  9992      1     0     2      0
1     2  99991     5     0      0
2     0      4  9992     0      1
3     3      0     1  9991      2
4     2      1     3     3  99990
```

So it changes the values column by column instead of row by row.

How do I achieve this for rows?

Thanks.

Answer Source

You can use `sub` to subtract the per-row minimum, which you get from `min(axis=1)`:

```
print (df.min(axis=1))
0     8
1     9
2     8
3     9
4    10
dtype: int64

print (df.sub(df.min(axis=1), axis=0))
      0      1     2     3      4
0  9992      2     0     3      2
1     1  99991     4     0      1
2     0      5  9992     1      3
3     2      0     0  9991      3
4     0      0     1     2  99990
```
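For anyone who wants to run this end to end, here is a minimal self-contained sketch. It rebuilds the matrix from the question by hand; the list-of-lists construction and the names `row_min` and `reduced` are assumptions, since the question only shows the printed frame.

```python
import pandas as pd

# Matrix from the question, rebuilt by hand (assumed construction)
df = pd.DataFrame([
    [10000, 10, 8, 11, 10],
    [10, 100000, 13, 9, 10],
    [8, 13, 10000, 9, 11],
    [11, 9, 9, 10000, 12],
    [10, 10, 11, 12, 100000],
])

# Per-row minimum: a Series indexed by row label
row_min = df.min(axis=1)

# axis=0 aligns row_min with the row index, so each row loses its own minimum
reduced = df.sub(row_min, axis=0)
print(reduced)
```

Note that `sub` returns a new DataFrame and leaves `df` untouched, unlike the loop in the question, which modifies the frame in place.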

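I also rewrote your function, using `iloc` to select each row by position (`ix`, which did this in older pandas, has since been removed):

```
def matrixReduction(matrix):
    # Minimum of each row, as a Series indexed by row label
    minRowValues = matrix.min(axis=1)
    # Iterate over rows (shape[0]), not columns
    for i in range(matrix.shape[0]):
        # iloc[i, :] selects row i by position
        matrix.iloc[i, :] = matrix.iloc[i, :] - minRowValues.iloc[i]
    return matrix
```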
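The loop in the question changed columns because plain `matrix[i]` on a DataFrame looks up the column labelled `i`, not the row. A small sketch of the difference, using a hypothetical asymmetric 3x3 frame so that rows and columns are easy to tell apart:

```python
import pandas as pd

# Hypothetical data, chosen so that row 0 and column 0 differ
df = pd.DataFrame([[1, 2, 3],
                   [4, 5, 6],
                   [7, 8, 9]])

col0 = df[0]        # plain [] looks up the COLUMN labelled 0
row0 = df.iloc[0]   # iloc[0] selects the first ROW by position

print(col0.tolist())  # [1, 4, 7]
print(row0.tolist())  # [1, 2, 3]
```

The matrix in the question is symmetric, which is why the wrong result looks like the transpose of the expected one.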

**Timings**:

```
In [136]: %timeit (matrixReduction(df))
100 loops, best of 3: 2.64 ms per loop
In [137]: %timeit (df.sub(df.min(axis=1), axis=0))
The slowest run took 5.49 times longer than the fastest. This could mean that an intermediate result is being cached.
1000 loops, best of 3: 308 µs per loop
```