eos - 1 year ago 87

Python Question

I have a shapely *polygon* representing the boundaries of the city of Los Angeles. I also have a set of ~1 million lat-long *points* in a geopandas GeoDataFrame, all of which fall within that polygon's minimum bounding box. Some of these points lie within the polygon itself, but others do not. I want to retain only those points within Los Angeles's boundaries, and due to Los Angeles's irregular shape, only approximately 1/3 of the points within its minimum bounding box are within the polygon itself.

**Using Python, what is the fastest way to identify which of these points lie within the polygon, given that the points and the polygon have the same minimum bounding box?**

I tried using geopandas and its r-tree spatial index:

`sindex = gdf['geometry'].sindex`

possible_matches_index = list(sindex.intersection(polygon.bounds))

possible_matches = gdf.iloc[possible_matches_index]

points_in_polygon = possible_matches[possible_matches.intersects(polygon)]

This uses the GeoDataFrame's r-tree spatial index to quickly find the

I also tried dividing my polygon into small sub-polygons, then using the spatial index to find which points possibly intersect with each of these sub-polygons. This method successfully finds fewer possible matches because each of the sub-polygons' minimum bounding boxes is much smaller than the set of points minimum bounding box. However, intersecting this set of possible matches with my polygon still only shaves off ~25% of my computation time, so it's still a brutally slow process.

Is there a better spatial index method I should use? And what is the fastest way to find which points are within the polygon, if the points and polygon have the same minimum bounding box?

Recommended for you: Get network issues from **WhatsUp Gold**. **Not end users.**

Answer Source

To summarize the problem: when the polygon's bounding box is the same as the set of points', r-tree identifies every point as a possible match, thus offering no speed up. When coupled with lots of points and a polygon with lots of vertices, the intersection process is *extremely* slow.

Solution: from this geopandas r-tree spatial index tutorial, use a quadrat routine to divide the polygon into sub-polygons. Then, for each sub-polygon, intersect it first with the points' r-tree index to get a small set of possible matches, then intersect those possible matches with the sub-polygon to get the set of precise matches. This offers speed-ups of approx 100x.

Recommended from our users: **Dynamic Network Monitoring from WhatsUp Gold from IPSwitch**. ** Free Download**