mnky9800n - 1 year ago
Python Question

python pandas reindexing removes data at 0.0

I have three columns of data: two position values and one data value. I would like to pivot this data so that the elements of one column become the new columns and the elements of another original column become the index. These data will be plotted using pcolormesh. pcolormesh expects the data to be structured such that it doesn't have to guess what to do. That is, if there is a column of NaNs, pcolormesh will not fill in this column correctly. So I have written some code to shape the data correctly so that it can be fed to pcolormesh.

The problem I have is that the code seems to remove data around x = 0.0. I think this is occurring on the line where the dataframe is being reindexed to include the "missing" rows.

I've added a plot (and hence some extra code) as a visual aid to the problem statement. The left plot shows the original data; the right plot shows the result after the data has been reshaped for pcolormesh.

The code example I have provided should run in an IPython notebook with a simple copy and paste.

Any suggestions are welcome. Perhaps this solution is super complicated? It sure feels that way.


%matplotlib inline

import decimal
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

test_df = pd.DataFrame()
test_df['x'] = [-2, -1.5, -0.9, -0.7, -0.5, 0.0, 0.5, 1.1]
test_df['y'] = [1,2,4,5,6,7,5,4]
test_df['v'] = np.random.randn(8)

def get_precision(number):
    """Gives the precision, or decimal place, of the number."""
    return int(abs(decimal.Decimal(str(number)).as_tuple().exponent))

def min_max(column):
    column_min = np.floor(column.min())
    column_max = np.ceil(column.max())
    return column_min, column_max

def construct_df_for_pcolormesh(df, col, ix, values, columns_increment, index_increment):
    columns_increment = 1.0/columns_increment
    index_increment = 1.0/index_increment

    columns_precision = get_precision(columns_increment)
    index_precision = get_precision(index_increment)

    columns_min, columns_max = min_max(df[col])
    index_min, index_max = min_max(df[ix])

    # np.linspace needs an integer number of points
    n_columns = int(round((columns_max - columns_min)*columns_increment)) + 1
    n_index = int(round((index_max - index_min)*index_increment)) + 1
    columns = np.linspace(columns_min, columns_max, n_columns)
    index = np.linspace(index_min, index_max, n_index)

    new_index = [(round(c, columns_precision), round(i, index_precision)) for c in columns for i in index]

    df_for_pcolormesh = df.set_index([col, ix]).reindex(new_index).reset_index()
    df_for_pcolormesh = df_for_pcolormesh.pivot(index=ix, columns=col, values=values)
    return df_for_pcolormesh

fig, (ax,ax1)= plt.subplots(1,2, sharey=True, sharex=True)

test_df.plot(kind='scatter', x='x', y='y', s=100, grid=True, ax=ax)
ax.set_xlim(-2.5, 1.5)
ax.set_title('Plot with all the data')

data_df = construct_df_for_pcolormesh(test_df, 'x', 'y', 'v', 0.1, 0.1)

depths = data_df.index
xx = data_df.columns

d, x = np.meshgrid(depths, xx)
data =

ax1.pcolormesh(x, d, data.transpose(), cmap='viridis')
ax1.set_xlim(-2.5, 1.5)
ax1.set_title('Plot with missing\ndatapoint at x=0.0')

Answer Source

I am not sure about the real reason. However, I changed your min_max function to:

def min_max(column):
    column_min = np.floor(column.min())
    column_max = np.ceil(column.max()) + 1
    return column_min, column_max

With that change, the data point at x = 0.0 showed up in the plot.
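A possible explanation, offered as a guess rather than a confirmed diagnosis: the point at x = 0.0 has y = 7, the largest y value, so it sits in the last row of the grid. Assuming the Matplotlib version in use applied flat shading, pcolormesh drops the last row and column of C when X, Y and C all have the same shape, which would hide exactly that point. Extending the maximum moves the point away from the edge. The sketch below checks this with pandas and NumPy only. build_grid and count_points are hypothetical helpers written for this illustration, the v values are fixed stand-ins for the random ones, and the [:-1, :-1] slice only imitates what flat shading would discard.

```python
import numpy as np
import pandas as pd

# Same positions as in the question; fixed values replace the random ones.
df = pd.DataFrame({
    "x": [-2, -1.5, -0.9, -0.7, -0.5, 0.0, 0.5, 1.1],
    "y": [1.0, 2.0, 4.0, 5.0, 6.0, 7.0, 5.0, 4.0],
    "v": np.arange(1.0, 9.0),
})

def build_grid(df, pad=0):
    """Pivot df onto a regular 0.1 grid (hypothetical helper).

    pad extends the upper bounds, like the "+ 1" in the modified min_max.
    """
    x_min, x_max = np.floor(df["x"].min()), np.ceil(df["x"].max()) + pad
    y_min, y_max = np.floor(df["y"].min()), np.ceil(df["y"].max()) + pad
    # Integer point counts, then rounding to one decimal so labels match exactly.
    n_x = int(round((x_max - x_min) * 10)) + 1
    n_y = int(round((y_max - y_min) * 10)) + 1
    xs = np.round(np.linspace(x_min, x_max, n_x), 1)
    ys = np.round(np.linspace(y_min, y_max, n_y), 1)
    pivoted = df.pivot(index="y", columns="x", values="v")
    return pivoted.reindex(index=ys, columns=xs)

def count_points(values):
    """Number of non-NaN cells in a 2-D array."""
    return int(np.count_nonzero(~np.isnan(values)))

grid = build_grid(df)
padded = build_grid(df, pad=1)

# The reindexed grid still holds the value at x = 0.0, y = 7 ...
print(grid.loc[7.0, 0.0])
# ... but it lies in the last row, which flat shading would discard.
print(grid.index[-1])
print(count_points(grid.to_numpy()[:-1, :-1]))    # points left without padding
print(count_points(padded.to_numpy()[:-1, :-1]))  # points left with padding
```

If this is the cause, the reindexing itself keeps the point and only the plotting step loses it. Newer Matplotlib versions may choose the shading differently when the shapes match, so the behaviour can vary by version.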
