eos - 6 months ago 45

Python Question

I have a shapely *polygon* representing the boundaries of the city of Los Angeles. I also have a set of ~1 million lat-long *points* in a geopandas GeoDataFrame, all of which fall within that polygon's minimum bounding box. Some of these points lie within the polygon itself, but others do not. I want to retain only those points within Los Angeles's boundaries, and due to Los Angeles's irregular shape, only approximately 1/3 of the points within its minimum bounding box are within the polygon itself.

**Using Python, what is the fastest way to identify which of these points lie within the polygon, given that the points and the polygon have the same minimum bounding box?**

I tried using geopandas and its r-tree spatial index:

`sindex = gdf['geometry'].sindex`

possible_matches_index = list(sindex.intersection(polygon.bounds))

possible_matches = gdf.iloc[possible_matches_index]

points_in_polygon = possible_matches[possible_matches.intersects(polygon)]

This uses the GeoDataFrame's r-tree spatial index to quickly find the

I also tried dividing my polygon into small sub-polygons, then using the spatial index to find which points possibly intersect with each of these sub-polygons. This method successfully finds fewer possible matches because each of the sub-polygons' minimum bounding boxes is much smaller than the set of points minimum bounding box. However, intersecting this set of possible matches with my polygon still only shaves off ~25% of my computation time, so it's still a brutally slow process.

Is there a better spatial index method I should use? And what is the fastest way to find which points are within the polygon, if the points and polygon have the same minimum bounding box?

Answer

To summarize the problem: when the polygon's bounding box is the same as the set of points', r-tree identifies every point as a possible match, thus offering no speed up. When coupled with lots of points and a polygon with lots of vertices, the intersection process is *extremely* slow.

Solution: from this geopandas r-tree spatial index tutorial, use a quadrat routine to divide the polygon into sub-polygons. Then, for each sub-polygon, intersect it first with the points' r-tree index to get a small set of possible matches, then intersect those possible matches with the sub-polygon to get the set of precise matches. This offers speed-ups of approx 100x.