johnjax - 4 months ago 24

Python Question

I have written a Python script to calculate the distance between two points in 3D space while accounting for periodic boundary conditions. The problem is that I need to do this calculation for many, many points and the calculation is quite slow. Here is my function.

`def PBCdist(coord1,coord2,UC):`

dx = coord1[0] - coord2[0]

if (abs(dx) > UC[0]*0.5):

dx = UC[0] - dx

dy = coord1[1] - coord2[1]

if (abs(dy) > UC[1]*0.5):

dy = UC[1] - dy

dz = coord1[2] - coord2[2]

if (abs(dz) > UC[2]*0.5):

dz = UC[2] - dz

dist = np.sqrt(dx**2 + dy**2 + dz**2)

return dist

I then call the function as so

`for i, coord2 in enumerate(coordlist):`

if (PBCdist(coord1,coord2,UC) < radius):

do something with i

Recently I read that I can greatly increase performance by using list comprehension. The following works for the non-PBC case, but not for the PBC case

`coord_indices = [i for i, y in enumerate([np.sqrt(np.sum((coord2-coord1)**2)) for coord2 in coordlist]) if y < radius]`

for i in coord_indices:

do something

Is there some way to do the equivalent of this for the PBC case? Is there an alternative that would work better?

Answer

You should write your `distance()`

function in a way that you can vectorise the loop over the 5711 points. The following implementation accepts an array of points as either the `x0`

or `x1`

parameter:

```
def distance(x0, x1, dimensions):
delta = numpy.abs(x0 - x1)
delta = numpy.where(delta > 0.5 * dimensions, delta - dimensions, delta)
return numpy.sqrt((delta ** 2).sum(axis=-1))
```

Example:

```
>>> dimensions = numpy.array([3.0, 4.0, 5.0])
>>> points = numpy.array([[2.7, 1.5, 4.3], [1.2, 0.3, 4.2]])
>>> distance(points, [1.5, 2.0, 2.5], dimensions)
array([ 2.22036033, 2.42280829])
```

The result is the array of distances between the points passed as second parameter to `distance()`

and each point in `points`

.