Poh Zi How - 2 years ago 125
Python Question

# Pandas way to find discontinuous data

I would like to find out which columns in a pandas dataframe have discontinuous data. By "discontinuous" I mean that the values go from non-zero to zero and then become non-zero again.

[0,0,0,1,2,3,4,5,0,0,0] # continuous
[0,0,0,1,2,0,4,5,0,0,0] # not continuous

I have managed to implement some code that can do this, using a for loop to iterate through every column of the dataframe. I made a working snippet below to show what I mean:

import numpy as np
import pandas as pd

def find_discontinuous(series):
    # switch: 0 = leading zeros, 1 = inside the non-zero run, 2 = back to zero
    switch = 0
    # Series.iteritems() was removed in pandas 2.0; items() is the replacement
    for index, val in series.items():
        # print(val, end=" ")
        if switch == 0 and val == 0:
            # print("still zero")
            continue
        elif switch == 0 and val != 0:
            switch = 1
        if switch == 1 and val == 0:
            # print("back to zero")
            switch = 2
            continue
        if switch == 2 and val != 0:
            # print("supposed to be zero")
            return "not continuous"
    return "continuous"

data = np.array([[0,1,2,3,4,5,0],
                 [0,1,2,0,4,5,0]])
df = pd.DataFrame(data,columns=list(range(7)),index=list(range(2))).transpose()

for column in df.columns:
    series = df.loc[:,column]
    res = find_discontinuous(series)
    print(column,res)

Output:

0 continuous
1 not continuous

I read somewhere that using a for loop to iterate through a pandas dataframe is discouraged because it is slow. What would be the pandas way to achieve the same thing?

Answer Source

You just need to check that there is no zero between the first and the last non-zero value:

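def is_continuous(series):
    nonzero = series != 0  # same test as val != 0 in the question
    if not nonzero.any():
        return True  # an all-zero column has no gap
    id_first_true = nonzero.idxmax()       # label of the first non-zero value
    id_last_true = nonzero[::-1].idxmax()  # label of the last non-zero value
    return bool(nonzero.loc[id_first_true:id_last_true].all())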
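The function above checks one column at a time. As a rough sketch of how the same idea could be applied to the whole dataframe at once, the block below flags any zero that has a non-zero value both before and after it in its column. It reuses the sample data from the question; the names `nonzero`, `seen_before`, `seen_after`, `has_gap` and `result` are made up for this illustration, and treating "zero" as `== 0` is an assumption taken from the question's code.

```python
import numpy as np
import pandas as pd

# Sample data from the question: one column per candidate series
data = np.array([[0, 1, 2, 3, 4, 5, 0],
                 [0, 1, 2, 0, 4, 5, 0]])
df = pd.DataFrame(data).transpose()

nonzero = df != 0

# True from the first non-zero value downwards in each column
seen_before = nonzero.astype(int).cummax().astype(bool)
# True from the last non-zero value upwards in each column
seen_after = nonzero[::-1].astype(int).cummax()[::-1].astype(bool)

# A gap is a zero with non-zero values both before and after it
has_gap = (~nonzero & seen_before & seen_after).any()

result = pd.Series(np.where(has_gap, "not continuous", "continuous"),
                   index=df.columns)
print(result)
```

If a per-column function is preferred, `df.apply(is_continuous)` also avoids the explicit loop, although it still calls the Python function once per column.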