johnbaltis - 26 days ago
Python Question

pandas: correctly handling numpy arrays inside a row element

I'll give a minimal example where I would create numpy arrays inside row elements of a pandas.DataFrame.

TL;DR: see the screenshot of the DataFrame.

This code finds the minimum of a certain function by using scipy.optimize.brute, which returns the location of the minimum, the function value there, and two numpy arrays: the grid on which it evaluated the function and the function values on that grid.
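As a quick reference for what brute hands back, here is a minimal sketch with a hypothetical objective f and arbitrary args=(1.0, 0.5). brute calls the objective as f(scanned_value, *args), so the variable being scanned over ranges must be the first parameter. With a single 1-D range and full_output=True, grid and Jout are both 1-D arrays of length Ns, and with the default finish=scipy.optimize.fmin the returned x0 is a length-1 array (hence x0[0]).

```python
import numpy as np
import scipy.optimize

# Hypothetical objective: brute passes the scanned value FIRST,
# followed by everything given in `args`.
def f(phi, r, x):
    return r * np.sin(phi * x)

x0, fval, grid, Jout = scipy.optimize.brute(
    f, ranges=[(-np.pi, np.pi)], args=(1.0, 0.5), Ns=10, full_output=True)

print(x0, fval)     # location of the minimum (length-1 array) and the value there
print(grid.shape)   # shape (10,): the phi values that were evaluated
print(Jout.shape)   # shape (10,): f evaluated at each of those phi values
```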

import itertools

import numpy as np
import pandas as pd
import scipy.optimize

# Note: brute calls sin(scanned_value, *args), so the value scanned over
# `ranges` is bound to the FIRST parameter (named r here)
sin = lambda r, phi, x: r * np.sin(phi * x)

def func(r, x):
    x0, fval, grid, Jout = scipy.optimize.brute(
        sin, ranges=[(-np.pi, np.pi)], args=(r, x), Ns=10, full_output=True)
    return dict(phi_at_min=x0[0], result_min=fval, phis=grid, result_at_grid=Jout)


rs = np.linspace(-1, 1, 10)
xs = np.linspace(0, 1, 10)

vals = list(itertools.product(rs, xs))

result = [func(r, x) for r, x in vals]

# idk whether this is the best way of generating the DataFrame, but it works
df = pd.DataFrame(vals, columns=['r', 'x'])
df = pd.concat((pd.DataFrame(result), df), axis=1)
df.head()


[Screenshot: the resulting DataFrame, df.head()]

I suspect this is not how I am supposed to do this, and that I should maybe expand the arrays somehow. How do I handle this in a correct, beautiful, and clean way?

Answer

So, even though "beautiful and clean" is subject to interpretation, I'll give you mine, which should in turn give you some ideas. I'm using a MultiIndex so that you can later easily select pairs of phi/result_at_grid values for each point of the evaluation grid. I'm also using apply instead of creating two DataFrames and concatenating them.

import itertools

import numpy as np
import pandas as pd
import scipy.optimize

sin = lambda r, phi, x: r * np.sin(phi * x)

def func(row):
    """
    Accepts a row of a DataFrame (a pd.Series); meant to be used as
    df.apply(func, axis=1).
    Returns a pd.Series with the initial (r, x) and the results.
    """
    r = row['r']
    x = row['x']
    x0, fval, grid, Jout = scipy.optimize.brute(
        sin, ranges=[(-np.pi, np.pi)], args=(r, x), Ns=10, full_output=True)

    # Create a MultiIndex series for the phis
    phis = pd.Series(grid)
    phis.index = pd.MultiIndex.from_product([['Phis'], phis.index])

    # Same for the result at each grid point
    result_at_grid = pd.Series(Jout)
    result_at_grid.index = pd.MultiIndex.from_product([['result_at_grid'], result_at_grid.index])

    # Concatenate both into one series
    s = pd.concat([phis, result_at_grid])

    # Add the two scalar results
    s['phi_at_min'] = x0[0]
    s['result_min'] = fval

    # Add the initial r, x to reconstruct the index later
    s['r'] = r
    s['x'] = x

    return s



rs = np.linspace(-1, 1, 10)
xs = np.linspace(0, 1, 10)

vals = list(itertools.product(rs, xs))
df = pd.DataFrame(vals, columns=['r', 'x'])

# Apply func to each row (axis=1)
results = df.apply(func, axis=1)
results.set_index(['r', 'x'], inplace=True)
results.head().T  # Transposing so we can see the output in one go...
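The two MultiIndex.from_product steps in func can also be written a bit more compactly: passing a dict to pd.concat makes the dict keys the outer index level. A minimal sketch, with small made-up arrays standing in for brute's grid and Jout. The tuple keys such as ('phi_at_min', '') just make explicit the empty inner label that pandas fills in when a plain key like s['phi_at_min'] is added to a two-level index.

```python
import numpy as np
import pandas as pd

# Made-up stand-ins for the `grid` and `Jout` arrays returned by brute
grid = np.linspace(-np.pi, np.pi, 4)
Jout = np.sin(grid)

# Dict keys become the outer index level; the inner level is 0..len-1
s = pd.concat({'Phis': pd.Series(grid), 'result_at_grid': pd.Series(Jout)})

# Scalars are added under explicit two-level keys with an empty inner label
s.loc[('phi_at_min', '')] = grid[0]
s.loc[('result_min', '')] = Jout[0]

print(s)
```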


Now you can, for example, select all values at evaluation grid point 2:

print(results.swaplevel(0, 1, axis=1)[2].head())  # show only the first 5 rows


                   Phis  result_at_grid
r    x                                 
-1.0 0.000000 -1.745329        0.000000
     0.111111 -1.745329        0.193527
     0.222222 -1.745329        0.384667
     0.333333 -1.745329        0.571062
     0.444444 -1.745329        0.750415
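An equivalent way to pick out grid point 2, without swapping the column levels first, is DataFrame.xs with its level argument. A minimal sketch on a small hypothetical frame that has the same two-level column layout as results:

```python
import pandas as pd

# Small stand-in with the same two-level column layout as `results`
columns = pd.MultiIndex.from_product([['Phis', 'result_at_grid'], [0, 1, 2]])
frame = pd.DataFrame([[-3.0, -2.0, -1.0, 0.1, 0.2, 0.3],
                      [-3.0, -2.0, -1.0, 0.4, 0.5, 0.6]],
                     columns=columns)

# Select inner-level label 2 from every outer group; the inner level is dropped,
# giving the same values as frame.swaplevel(0, 1, axis=1)[2]
point2 = frame.xs(2, axis=1, level=1)
print(point2)
```

Applied to the answer's results, the analogous call would presumably be results.xs(2, axis=1, level=1).head().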