eos eos - 1 month ago 13
Python Question

Spatial index to find points within polygon, if points and polygon have same minimum bounding box

I have a shapely polygon representing the boundaries of the city of Los Angeles. I also have a set of ~1 million lat-long points in a geopandas GeoDataFrame, all of which fall within that polygon's minimum bounding box. Some of these points lie within the polygon itself, but others do not. I want to retain only those points within Los Angeles's boundaries, and due to Los Angeles's irregular shape, only approximately 1/3 of the points within its minimum bounding box are within the polygon itself.

Using Python, what is the fastest way to identify which of these points lie within the polygon, given that the points and the polygon have the same minimum bounding box?

I tried using geopandas and its r-tree spatial index:

sindex = gdf['geometry'].sindex
possible_matches_index = list(sindex.intersection(polygon.bounds))
possible_matches = gdf.iloc[possible_matches_index]
points_in_polygon = possible_matches[possible_matches.intersects(polygon)]

This uses the GeoDataFrame's r-tree spatial index to quickly find the possible matches, and then finds the exact intersection of the polygon and those possible matches. However, because the polygon's minimum bounding box is the same as the set of points' minimum bounding box, r-tree considers every point to be a possible match. Thus, using an r-tree spatial index makes the intersection run no faster than it would without the spatial index. This method is very slow: it takes ~30 minutes to complete.

I also tried dividing my polygon into small sub-polygons, then using the spatial index to find which points possibly intersect with each of these sub-polygons. This method successfully finds fewer possible matches because each of the sub-polygons' minimum bounding boxes is much smaller than the set of points minimum bounding box. However, intersecting this set of possible matches with my polygon still only shaves off ~25% of my computation time, so it's still a brutally slow process.

Is there a better spatial index method I should use? And what is the fastest way to find which points are within the polygon, if the points and polygon have the same minimum bounding box?

eos eos

To summarize the problem: when the polygon's bounding box is the same as the set of points', r-tree identifies every point as a possible match, thus offering no speed up. When coupled with lots of points and a polygon with lots of vertices, the intersection process is extremely slow.

Solution: from this geopandas r-tree spatial index tutorial, use a quadrat routine to divide the polygon into sub-polygons. Then, for each sub-polygon, intersect it first with the points' r-tree index to get a small set of possible matches, then intersect those possible matches with the sub-polygon to get the set of precise matches. This offers speed-ups of approx 100x.