Jiajun Yang - 1 year ago 66

Python Question

I am not sure whether "norm" and "Euclidean distance" mean the same thing. Please could you help me with this distinction.

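I have an `n` by `m` array `a`, where each row is a data point with `m` values. I want the Euclidean distance between the second data point, `a[1,:]`, and every row of `a` (including itself), so I used `np.linalg.norm`: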

```
import numpy as np

a = np.array([[0, 0, 0, 0], [1, 1, 1, 1], [2, 2, 2, 3], [3, 5, 1, 5]])
N = a.shape[0]  # number of rows
pos = a[1, :]  # pick out the second data point
dist = np.zeros((N, 1), dtype=np.float64)

for i in range(N):
    dist[i] = np.linalg.norm(a[i, :] - pos)
```

Answer Source

A norm is a function that takes a vector as an input and returns a scalar value that can be interpreted as the "size" or "length" of that vector. Norms have some other important mathematical properties:

- They scale multiplicatively, i.e. Norm(*a*·**v**) = |*a*|·Norm(**v**) for any scalar factor *a*
- They satisfy the triangle inequality, i.e. Norm(**u** + **v**) ≤ Norm(**u**) + Norm(**v**)
- The norm of the zero vector is always zero, i.e. Norm(**0**) = 0
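These properties can be spot-checked numerically. The snippet below is only an illustrative sketch: the vectors `u` and `v`, the seed, and the scalar `scale` are arbitrary choices made for the example, and a numerical check on one pair of vectors is not a proof.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the check is repeatable
u = rng.standard_normal(5)      # arbitrary example vectors
v = rng.standard_normal(5)
scale = -3.0                    # arbitrary scalar factor

norm = np.linalg.norm  # Euclidean (L2) norm by default for 1-D input

# Multiplicative scaling: Norm(a*v) == |a| * Norm(v)
homogeneous = np.isclose(norm(scale * v), abs(scale) * norm(v))

# Triangle inequality: Norm(u + v) <= Norm(u) + Norm(v)
# (small tolerance added to absorb floating-point rounding)
triangle = norm(u + v) <= norm(u) + norm(v) + 1e-12

# The zero vector has norm zero
zero_norm = norm(np.zeros(5))

print(homogeneous, triangle, zero_norm)
```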

The Euclidean norm (also known as the L² norm) is just one of many different norms - there is also the max norm, the Manhattan norm etc. The L² norm of a single vector is equivalent to the Euclidean distance from that point to the origin, and the L² norm of the difference between two vectors is equivalent to the Euclidean distance between the two points.
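As a small illustration of how these norms differ, `np.linalg.norm` selects the norm through its `ord` argument. The vectors `v`, `p` and `q` below are made-up examples, chosen so the results are easy to verify by hand.

```python
import numpy as np

v = np.array([3.0, -4.0])

l2 = np.linalg.norm(v)                  # Euclidean norm: sqrt(3**2 + 4**2) = 5.0
l1 = np.linalg.norm(v, ord=1)           # Manhattan norm: |3| + |-4| = 7.0
linf = np.linalg.norm(v, ord=np.inf)    # max norm: max(|3|, |-4|) = 4.0

# Euclidean distance between two points = L2 norm of their difference
p = np.array([1.0, 1.0])
q = np.array([4.0, 5.0])
d = np.linalg.norm(p - q)               # sqrt(3**2 + 4**2) = 5.0

print(l2, l1, linf, d)
```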

As **@nobar**'s answer says, `np.linalg.norm(x - y, ord=2)` (or just `np.linalg.norm(x - y)`) will give you the Euclidean distance between the vectors `x` and `y`.

Since you want to compute the Euclidean distance between `a[1, :]` and every other row in `a`, you could do this a lot faster by eliminating the `for` loop and broadcasting over the rows of `a`:

```
dist = np.linalg.norm(a[1:2] - a, axis=1)
```
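To make the broadcasting step concrete, here is the same computation on the small array from the question, split into intermediate steps. The names `ref`, `diff` and `loop` are introduced only for this sketch.

```python
import numpy as np

a = np.array([[0, 0, 0, 0], [1, 1, 1, 1], [2, 2, 2, 3], [3, 5, 1, 5]])

ref = a[1:2]                          # shape (1, 4): the slice keeps the row axis
diff = ref - a                        # ref is broadcast against all rows -> shape (4, 4)
dist = np.linalg.norm(diff, axis=1)   # one L2 norm per row -> shape (4,)

# Same result as the explicit loop from the question
loop = np.array([np.linalg.norm(a[i, :] - a[1, :]) for i in range(a.shape[0])])

print(dist)   # distances 2, 0, sqrt(7) and 6
```

Note that the result has shape `(4,)` rather than the `(N, 1)` column used in the question; `dist[:, None]` would restore the column shape if that is needed.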

It's also easy to compute the Euclidean distance yourself using broadcasting:

```
dist = np.sqrt(((a[1:2] - a) ** 2).sum(1))
```

The fastest method is probably `scipy.spatial.distance.cdist`:

```
from scipy.spatial.distance import cdist
dist = cdist(a[1:2], a)[0]
```
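`cdist` also accepts a `metric` argument, so the same call can compute distances based on the other norms mentioned earlier. A brief sketch on the array from the question, assuming SciPy is installed:

```python
import numpy as np
from scipy.spatial.distance import cdist

a = np.array([[0, 0, 0, 0], [1, 1, 1, 1], [2, 2, 2, 3], [3, 5, 1, 5]])

# cdist expects two 2-D arrays, hence a[1:2] rather than a[1]
euclid = cdist(a[1:2], a)[0]                         # default metric is 'euclidean'
manhattan = cdist(a[1:2], a, metric='cityblock')[0]  # sum of absolute differences
maxnorm = cdist(a[1:2], a, metric='chebyshev')[0]    # largest absolute difference

print(euclid)     # distances 2, 0, sqrt(7) and 6
print(manhattan)  # distances 4, 0, 5 and 10
print(maxnorm)    # distances 1, 0, 2 and 4
```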

Some timings for a (1000, 1000) array, measured with IPython's `%timeit` (absolute numbers will vary with machine and library versions):

```
a = np.random.randn(1000, 1000)
%timeit np.linalg.norm(a[1:2] - a, axis=1)
# 100 loops, best of 3: 5.43 ms per loop
%timeit np.sqrt(((a[1:2] - a) ** 2).sum(1))
# 100 loops, best of 3: 5.5 ms per loop
%timeit cdist(a[1:2], a)[0]
# 1000 loops, best of 3: 1.38 ms per loop
# check that all 3 methods return the same result
d1 = np.linalg.norm(a[1:2] - a, axis=1)
d2 = np.sqrt(((a[1:2] - a) ** 2).sum(1))
d3 = cdist(a[1:2], a)[0]
assert np.allclose(d1, d2) and np.allclose(d1, d3)
```
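`%timeit` only works inside IPython or Jupyter. In a plain Python script, a roughly equivalent measurement can be sketched with the standard-library `timeit` module. The function names `with_norm` and `with_sqrt_sum` are made up for this example, and the repeat counts are arbitrary.

```python
import timeit

import numpy as np

a = np.random.randn(1000, 1000)


def with_norm():
    return np.linalg.norm(a[1:2] - a, axis=1)


def with_sqrt_sum():
    return np.sqrt(((a[1:2] - a) ** 2).sum(1))


for fn in (with_norm, with_sqrt_sum):
    # best of 3 runs, each run averaging over 10 calls
    best = min(timeit.repeat(fn, number=10, repeat=3)) / 10
    print(f"{fn.__name__}: {best * 1e3:.2f} ms per call")
```

The `cdist` variant can be timed the same way by wrapping it in a third function.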