eos - 1 year ago 94
Python Question

# Spatial index to find points within polygon, if points and polygon have same minimum bounding box

I have a shapely polygon representing the boundaries of the city of Los Angeles. I also have a set of ~1 million lat-long points in a geopandas GeoDataFrame, all of which fall within that polygon's minimum bounding box. Some of these points lie within the polygon itself, but others do not. I want to retain only those points within Los Angeles's boundaries, and due to Los Angeles's irregular shape, only approximately 1/3 of the points within its minimum bounding box are within the polygon itself.

Using Python, what is the fastest way to identify which of these points lie within the polygon, given that the points and the polygon have the same minimum bounding box?

I tried using geopandas and its r-tree spatial index:

``````sindex = gdf['geometry'].sindex
possible_matches_index = list(sindex.intersection(polygon.bounds))
possible_matches = gdf.iloc[possible_matches_index]
points_in_polygon = possible_matches[possible_matches.intersects(polygon)]
``````

This uses the GeoDataFrame's r-tree spatial index to quickly find the possible matches, and then finds the exact intersection of the polygon and those possible matches. However, because the polygon's minimum bounding box is the same as the set of points' minimum bounding box, r-tree considers every point to be a possible match. Thus, using an r-tree spatial index makes the intersection run no faster than it would without the spatial index. This method is very slow: it takes ~30 minutes to complete.

I also tried dividing my polygon into small sub-polygons, then using the spatial index to find which points possibly intersect with each of these sub-polygons. This method successfully finds fewer possible matches because each of the sub-polygons' minimum bounding boxes is much smaller than the set of points minimum bounding box. However, intersecting this set of possible matches with my polygon still only shaves off ~25% of my computation time, so it's still a brutally slow process.

Is there a better spatial index method I should use? And what is the fastest way to find which points are within the polygon, if the points and polygon have the same minimum bounding box?