dirtysocks45 - 4 days ago
Python Question

How do I convert matrices from Matlab to Python?

I have the following code in Matlab which I'm not familiar with:

function segments = segmentEnergy(data, th)
    mag = sqrt(sum(data(:, 1:3) .^ 2, 2));
    mag = mag - mean(mag);

    above = find(mag>=th*std(mag));
    indicator = zeros(size(mag));
    indicator(above) = 1;
    plot(mag); hold on; plot(indicator*1000, 'r')
end


I wrote this following function in Python:

def segment_energy(data, th):
    mag = np.linalg.norm((data['x'], data['y'], data['z']))
    print "This is the mag: " + str(mag)
    mag -= np.mean(mag)

    above = np.where(mag >= th * np.std(mag))
    indicator = np.zeros(mag.shape)
    indicator[above] = 1
    plt.plot(mag)
    plt.plot(indicator * 1000, 'r')
    plt.show()


I get an error:

line 23, in segment_energy
indicator[above] = 1
IndexError: too many indices for array


I understand it's because mag in the Python code is a scalar and I'm treating it like a matrix. However, I'm not sure how they're turning mag into a matrix in the Matlab function.

Answer

Given how you are currently calling it, numpy.linalg.norm returns a single scalar: with no axis argument, it computes one norm over all the values you pass in. Because mag is now a scalar, the rest of the code will not work as intended, for the following reasons:

  1. Subtracting the mean of a single scalar from that scalar always gives 0 (i.e. mag - np.mean(mag) is 0).

  2. The np.where call that computes above will always return a tuple of a single element. This element is a NumPy array of length 1 containing the index 0, meaning that the first element of the "array" (which is a scalar in this case) satisfies the condition. The condition is always satisfied because the standard deviation of a single constant is also 0 under the default definition of np.std, so the test becomes 0 >= 0.

  3. Calling shape on a single scalar value gives you an empty shape: (). Note that for a plain Python number such as the 10 used in the example here, mag.shape would give you an error before the subtraction, as a Python int is not a NumPy object. Subtracting np.mean converts it to a NumPy scalar. (The value returned by np.linalg.norm is already a NumPy scalar, so it has the empty shape from the start.)

    Observe:

    In [56]: mag = 10
    
    In [57]: type(mag)
    Out[57]: int
    
    In [58]: mag -= np.mean(mag)
    
    In [59]: type(mag)
    Out[59]: numpy.float64
    
  4. Finally, the indicator creation code produces a zero-dimensional array. Since above holds one index array but indicator has no dimensions to index into, the assignment gives you the error.

Here is the error reproduced, assuming that mag was calculated to be some value, say 10, and th = 1:

In [60]: mag = 10

In [61]: mag -= np.mean(mag)

In [62]: mag.shape
Out[62]: ()

In [63]: th = 1

In [64]: above = np.where(mag >= th * np.std(mag))

In [65]: indicator = np.zeros(mag.shape)

In [66]: indicator
Out[66]: array(0.0)

In [67]: mag
Out[67]: 0.0

In [68]: indicator[above] = 1
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-67-adf9cff7610a> in <module>()
----> 1 indicator[above] = 1

IndexError: too many indices for array
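To see where the scalar comes from in the first place, here is a minimal sketch. The four samples are made up for illustration; only the column names x, y and z are taken from the question. It compares the original call with the same call given an axis.

```python
import numpy as np
import pandas as pd

# Made-up samples for illustration; column names follow the question.
data = pd.DataFrame({
    "x": [3.0, 0.0, 1.0, 0.0],
    "y": [4.0, 0.0, 2.0, 0.0],
    "z": [0.0, 5.0, 2.0, 0.0],
})

# The tuple of three columns is stacked into a (3, N) array. With no axis
# argument, the norm is taken over every entry at once.
scalar_mag = np.linalg.norm((data["x"], data["y"], data["z"]))
print(np.ndim(scalar_mag))  # 0: a single number, not one value per sample

# In that (3, N) layout the samples run along axis 1, so reducing over
# axis 0 gives one magnitude per sample.
per_sample = np.linalg.norm((data["x"], data["y"], data["z"]), axis=0)
print(per_sample)  # [5. 5. 3. 0.]
```

The second call is only there to show that the axis argument is what decides between one number and one value per sample.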

Therefore, the solution for you is to rethink how you are writing this function. The MATLAB code assumes that data is already a 2D matrix, so it computes the norm (length) of each row independently. Because we now know that the input is a pandas DataFrame, we can very easily apply NumPy operations to it just like what is done in MATLAB. Assuming that your columns are labelled x, y and z as in your code and each column holds numeric values, just change the first line of code.

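import numpy as np
import matplotlib.pyplot as plt

def segment_energy(data, th):
    # Row-wise magnitude: square x, y, z, sum across the columns, take the root
    mag = np.sqrt(np.sum(data.loc[:, ['x', 'y', 'z']] ** 2.0, axis=1)) # Change
    mag = np.array(mag) # Convert to NumPy array
    mag -= np.mean(mag)

    # Mark the samples whose centred magnitude reaches th standard deviations
    above = np.where(mag >= th * np.std(mag))
    indicator = np.zeros(mag.shape)
    indicator[above] = 1
    plt.plot(mag)
    plt.plot(indicator * 1000, 'r')
    plt.show()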

The first statement in the function is the direct NumPy translation of the MATLAB code. We use the loc indexer that's part of the pandas DataFrame to select the three columns you are looking for. The result is a pandas Series at that point, so we also convert it to a NumPy array for the rest of the calculations.
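As a rough illustration of how the MATLAB indexing and sum(..., 2) map onto NumPy, here is a small sketch on a plain array. The values are invented, and the fourth column only stands in for any extra columns the real data might have.

```python
import numpy as np

# Hypothetical stand-in for the MATLAB matrix `data`: one row per sample.
# The fourth column is extra and should be ignored.
data = np.array([
    [3.0, 4.0, 0.0, 99.0],
    [0.0, 0.0, 5.0, 99.0],
    [1.0, 2.0, 2.0, 99.0],
])

# MATLAB: data(:, 1:3) .^ 2   (1-based, end-inclusive)
# NumPy:  data[:, 0:3] ** 2   (0-based, end-exclusive)
squared = data[:, 0:3] ** 2

# MATLAB: sum(..., 2) sums along the second dimension (across columns).
# NumPy:  axis=1 is the same dimension, counted from zero.
mag = np.sqrt(np.sum(squared, axis=1))
print(mag.shape)  # (3,)
print(mag)        # [5. 5. 3.]
```

One difference to keep in mind: MATLAB returns an N-by-1 column vector here, while NumPy returns a 1-D array of length N.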

You can also use numpy.linalg.norm, but specify the axis to operate on. Assuming that the data is 2D as before, specify axis=1 to compute the row-wise norms of your matrix:

mag = np.linalg.norm(data.loc[:, ['x', 'y', 'z']], axis=1)

The above will convert the data into a NumPy array for you.
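Putting the pieces together, the thresholding step can be sketched without the plotting. The helper name segment_indicator and its ddof parameter are hypothetical, not part of the original code, and the data is made up. The sketch uses a boolean mask in place of find / np.where. It also exposes ddof because MATLAB's std normalizes by N-1 by default while np.std normalizes by N, so the two thresholds can differ slightly.

```python
import numpy as np
import pandas as pd

def segment_indicator(data, th, ddof=0):
    """Hypothetical helper: return (mag, indicator) without plotting."""
    # Row-wise norm of the x, y, z columns, then remove the mean
    mag = np.linalg.norm(data.loc[:, ["x", "y", "z"]], axis=1)
    mag = mag - np.mean(mag)
    # Boolean mask instead of index lists; ddof=1 mimics MATLAB's default std
    mask = mag >= th * np.std(mag, ddof=ddof)
    return mag, mask.astype(float)

# Made-up data: three quiet samples and one burst
data = pd.DataFrame({
    "x": [0.0, 0.0, 0.0, 10.0],
    "y": [0.0, 0.0, 0.0, 0.0],
    "z": [0.0, 0.0, 0.0, 0.0],
})

mag, ind_numpy = segment_indicator(data, th=1.6)           # np.std default
_, ind_matlab = segment_indicator(data, th=1.6, ddof=1)    # MATLAB-like std
print(mag)         # [-2.5 -2.5 -2.5  7.5]
print(ind_numpy)   # [0. 0. 0. 1.]
print(ind_matlab)  # [0. 0. 0. 0.]
```

With this tiny sample the choice of ddof changes the outcome. On long recordings the difference is usually negligible, but it is worth knowing about if the Python plot does not match the MATLAB one exactly.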