dirtysocks45 - 9 months ago 36
Python Question

# How do I convert matrices from Matlab to Python?

I have the following code in Matlab which I'm not familiar with:

``````function segments = segmentEnergy(data, th)
mag = sqrt(sum(data(:, 1:3) .^ 2, 2));
mag = mag - mean(mag);

above = find(mag>=th*std(mag));
indicator = zeros(size(mag));
indicator(above) = 1;
plot(mag); hold on; plot(indicator*1000, 'r')
end
``````

I wrote the following function in Python:

``````def segment_energy(data, th):
    mag = np.linalg.norm((data['x'], data['y'], data['z']))
    print "This is the mag: " + str(mag)
    mag -= np.mean(mag)

    above = np.where(mag >= th * np.std(mag))
    indicator = np.zeros(mag.shape)
    indicator[above] = 1
    plt.plot(mag)
    plt.plot(indicator * 1000, 'r')
    plt.show()
``````

I get an error:

``````line 23, in segment_energy
indicator[above] = 1
IndexError: too many indices for array
``````

I understand it's because `mag` in the Python code is a scalar and I'm treating it like a matrix. However, I'm not sure how they're turning `mag` into a matrix in the Matlab function.

The way you are currently calling `numpy.linalg.norm`, with no `axis` argument, it reduces the whole input to a single scalar value. Because `mag` is now a scalar, the rest of the code does not behave as intended, for the following reasons:

1. Subtracting the mean from a single scalar always gives 0 (i.e. `mag <- mag - np.mean(mag) --> 0`).

2. The `above` statement will always return a tuple with a single element: a NumPy array of length 1 containing the index 0, meaning that the first element of the "array" (a scalar in this case) satisfies the constraint. The constraint is always satisfied because the standard deviation of a single value is also 0 under the default definition of `np.std`, so the comparison reduces to `0 >= 0`.

3. Calling `shape` on a NumPy scalar gives you an empty shape: `()`. Note that a plain Python number has no `shape` attribute at all, so `mag.shape` would raise an error on it. Subtracting `np.mean(mag)` promotes the value to a NumPy scalar (`numpy.float64`), not to an array.

Observe:

``````In [56]: mag = 10

In [57]: type(mag)
Out[57]: int

In [58]: mag -= np.mean(mag)

In [59]: type(mag)
Out[59]: numpy.float64
``````
4. Finally, the `indicator` creation code produces a zero-dimensional array (shape `()`), and since you are trying to index into an array that has no dimensions, it gives you an error.

Observe this reproducible error, assuming that `mag` was calculated to be some value, say 10, and `th = 1`:

``````In [60]: mag = 10

In [61]: mag -= np.mean(mag)

In [62]: mag.shape
Out[62]: ()

In [63]: th = 1

In [64]: above = np.where(mag >= th * np.std(mag))

In [65]: indicator = np.zeros(mag.shape)

In [66]: indicator
Out[66]: array(0.0)

In [67]: mag
Out[67]: 0.0

In [68]: indicator[above] = 1
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
----> 1 indicator[above] = 1

IndexError: too many indices for array
``````
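
To see the collapse to a scalar in isolation, here is a minimal sketch. The sample values are made up, and a plain dict of NumPy arrays stands in for the `DataFrame` so that the `norm` call can be written exactly as in the question.

```python
import numpy as np

# Hypothetical sample values; a dict of arrays stands in for the DataFrame.
data = {
    "x": np.array([3.0, 0.0]),
    "y": np.array([4.0, 0.0]),
    "z": np.array([0.0, 2.0]),
}

# Same expression as in the question: the tuple is stacked into a 3 x N
# array and, with no axis given, reduced to one overall norm.
mag = np.linalg.norm((data["x"], data["y"], data["z"]))

print(np.ndim(mag))   # 0 dimensions: one number for the whole data set
print(np.shape(mag))  # ()
```

There are two rows of data here, yet `mag` holds one number rather than one value per row, which is why everything after that line misbehaves.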

Therefore, the solution is to rethink how you compute `mag`. The MATLAB code assumes that `data` is already a 2D matrix: `data(:, 1:3) .^ 2` squares the first three columns and `sum(..., 2)` adds across each row, so `mag` holds the norm (length) of each row, one value per row. Because we now know that the input is a `pandas` `DataFrame`, we can apply `numpy` operations to it just as is done in MATLAB. Assuming that your columns are labelled `x`, `y` and `z` and each column holds numeric values, just change the first line of the function body.

``````def segment_energy(data, th):
    mag = np.sqrt(np.sum(data.loc[:, ['x', 'y', 'z']] ** 2.0, axis=1)) # Change
    mag = np.array(mag) # Convert to NumPy array
    mag -= np.mean(mag)

    above = np.where(mag >= th * np.std(mag))
    indicator = np.zeros(mag.shape)
    indicator[above] = 1
    plt.plot(mag)
    plt.plot(indicator * 1000, 'r')
    plt.show()
``````

The first statement is the direct NumPy translation of the MATLAB code. We use the `loc` indexer of the `pandas` `DataFrame` to select the three columns you are looking for. The result of that expression is a `pandas` `Series`, so we convert it to a NumPy array for the rest of the calculations to work.
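
As a small check of that translation, the sketch below runs the same expression on a made-up two-row `DataFrame` (the values are hypothetical; the column names are the ones assumed above). MATLAB's dimension 2 in `sum(..., 2)` corresponds to `axis=1` in NumPy, because NumPy counts axes from 0.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with the column names assumed in the answer.
data = pd.DataFrame({"x": [3.0, 0.0], "y": [4.0, 0.0], "z": [0.0, 2.0]})

# MATLAB: sqrt(sum(data(:, 1:3) .^ 2, 2))
# Square the three columns, sum across each row, then take the square root.
squared = data.loc[:, ["x", "y", "z"]] ** 2.0
mag = np.sqrt(squared.sum(axis=1)).to_numpy()

print(mag.shape)  # (2,): one magnitude per row
print(mag)
```

Each row now gets its own magnitude, so `mag.shape` matches the number of samples and the later indexing has something to index into.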

You can also use `numpy.linalg.norm`, but specify the axis to operate on. Assuming that the data is 2D as above, specify `axis=1` to compute the row-wise norms of your matrix:

``````mag = np.linalg.norm(data.loc[:, ['x', 'y', 'z']], axis=1)
``````

This call returns a NumPy array directly, so no separate conversion is needed.
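
Putting the pieces together without the plotting, a possible sketch looks like this. The helper name `segment_indicator` and the sample array are hypothetical, and two things differ from the code above on purpose: a boolean mask replaces the `np.where` indexing, and `ddof=1` is passed to the standard deviation because MATLAB's `std` divides by N - 1 by default while `np.std` divides by N.

```python
import numpy as np

def segment_indicator(xyz, th, ddof=1):
    """Return (mag, indicator) for an N x 3 array. Hypothetical helper."""
    xyz = np.asarray(xyz, dtype=float)
    mag = np.linalg.norm(xyz, axis=1)  # row-wise Euclidean norm
    mag = mag - mag.mean()             # remove the mean, as in the MATLAB code
    # MATLAB's std normalizes by N - 1 by default; ddof=1 mirrors that.
    threshold = th * mag.std(ddof=ddof)
    # Boolean mask instead of find / np.where: 1.0 where mag >= threshold.
    indicator = (mag >= threshold).astype(float)
    return mag, indicator

# Made-up sample: one large reading followed by three small ones.
xyz = np.array([
    [3.0, 4.0, 0.0],
    [0.0, 0.0, 1.0],
    [0.0, 0.0, 1.0],
    [0.0, 0.0, 1.0],
])
mag, indicator = segment_indicator(xyz, th=1.0)
print(mag)
print(indicator)
```

With a `DataFrame`, something like `data[['x', 'y', 'z']].to_numpy()` could be passed as `xyz`. Setting `ddof=0` reproduces the behaviour of the plain `np.std(mag)` call used above.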
