ShanZhengYang - 6 months ago 56

Python Question

I'm working with hundreds of pandas dataframes. A typical dataframe is as follows:

`import pandas as pd`

`import numpy as np`

`data = 'filename.csv'`

`df = pd.read_csv(data)`

`df`

            one       two     three    four   five
    a  0.469112 -0.282863 -1.509059     bar   True
    b  0.932424  1.224234  7.823421     bar  False
    c -1.135632  1.212112 -0.173215     bar  False
    d  0.232424  2.342112  0.982342   unbar   True
    e  0.119209 -1.044236 -0.861849     bar   True
    f -2.104569 -0.494929  1.071804     bar  False
    ....

In certain operations I divide the values of one column by those of another, e.g.

`df['one']/df['two']`

However, there are times when I am dividing by zero, or perhaps both values are zero:

`df['one'] = 0`

`df['two'] = 0`

Naturally, this outputs the error:

`ZeroDivisionError: division by zero`

I would prefer for 0/0 to actually mean "there's nothing here", as this is often what such a zero means in a dataframe.

(a) How would I code this so that a "divide by zero" yields 0?

(b) How would I code this to "pass" if a divide by zero is encountered?

Answer

Two approaches to consider:

Prepare your data so that it never produces a divide by zero situation, by explicitly coding a "no data" value and testing for that.

Wrap each division that might result in an error with a `try`/`except` pair, as described at https://wiki.python.org/moin/HandlingExceptions (which has a divide by zero example to use):

```
x, y = 5, 0
try:
    z = x / y
except ZeroDivisionError:
    # Division by zero: report it instead of crashing
    print("divide by zero")
```
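To cover both (a) and (b) with the same pattern, the `try`/`except` pair can be wrapped in a small helper. This is only a minimal sketch for plain Python numbers; `safe_divide` and its `default` parameter are hypothetical names chosen for illustration, not part of any library.

```python
def safe_divide(numerator, denominator, default=None):
    """Return numerator / denominator, or `default` if the denominator is zero."""
    try:
        return numerator / denominator
    except ZeroDivisionError:
        # Fall back to the caller's choice instead of raising
        return default


print(safe_divide(5, 0, default=0))  # (a): a divide by zero yields 0
print(safe_divide(5, 0))             # (b): "pass", the result is None
print(safe_divide(6, 3))             # ordinary division is unaffected
```

Passing `default=0` corresponds to (a), while leaving the default as `None` corresponds to (b), since the caller can then skip any `None` result.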

I worry about the situation where your data includes a zero that's really a zero (and not a missing value).
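The first approach, coding an explicit "no data" value, might look like the sketch below in pandas. It assumes that a zero in the denominator column always means "missing", which is exactly the assumption to check against your own data; the sample values are made up for illustration, and only the column names come from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the column names follow the question.
df = pd.DataFrame({
    "one": [1.0, 0.0, 3.0, 0.0],
    "two": [2.0, 0.0, 0.0, 4.0],
})

# Code a zero denominator as "no data" (NaN) before dividing.
denominator = df["two"].replace(0, np.nan)
ratio = df["one"] / denominator  # NaN wherever df["two"] was 0

# (a) Turn the "no data" results into 0.
ratio_as_zero = ratio.fillna(0)

# (b) Keep only the rows where the division was meaningful.
ratio_skipped = ratio.dropna()

print(ratio)
print(ratio_as_zero)
print(ratio_skipped)
```

In `ratio`, the last row (0 divided by 4) stays a genuine 0.0 while the rows with a zero denominator become NaN, so a real zero and a missing value remain distinguishable. Calling `fillna(0)` merges the two again, so it is worth applying only at the point where that distinction no longer matters.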