Akash Kava Akash Kava - 1 month ago 4x
SQL Question

What is best way to store huge Cities for auto complete query?

I have list of cities, around 2712406 records. Each has state code and user first selects state code.

Both StateCode and Name are already indexed.

My current query to Database hits in the form of


Which internally translate to

return db.Cities.Where(x=>x.StateCode == stateCode && x.Name.StartsWith(name))
.OrderBy( x=> x.Name )

This query times out sometime when database is busy for little bigger actions.

I have thought of two ways,

  1. Storing pre configured queries on amazon S3.

  2. Combining StateCode and Name as one Column and do SearchName.StartWith(stateCode +"/" + name)

Storing S3 Objects

I calculated total number of combinations that user need is 10 million. So assuming 5KB of JSON for only retrieving first 10 cities. It needs 50GB of storage. Roughly coming around $5 per month. Which is good but if I store my json as
/stateCode/a /stateCode/b
etc will serve faster to user with CDN?

Combining State and City Key

Let's say I have a separate column combined and index as "StateCode/Name".
Here I know that query will hit B+ Tree of index. Will lookup on one column index will have greater benefit or will it be almost same as current query and I am better off with S3. I don't know whether S3 will give same performance as DB, all I can do is setup multiple readonly DB in my each app server.


It turned out that combined TEXT INDEX worked well. Joins are expensive operation.

  1. Create Column CityPath = StateCode + '/' + CityName
  2. Index CityPath

Query performs better then AND of two different result sets.

@CityPath = StateCode + '/' + CharactersTyped + '%'
SELECT TOP 10 * FROM Cities WHERE CityPath LIKE @CityPath

In the above query B+ index has to search limited sub-tree as StateCode is itself the root. However in the following query, B+ index has to search larger sub-tree and each result has to be compared for StateCode equivalency. The resultset of B+ index scan for CityName search is larger in this second query, which leads to microsecond timeouts.

@StateCode = StateCode
@CityName = CharactersTyped    
SELECT TOP 10 * FROM Cities WHERE CityName LIKE @CityName
     AND StateCode = @StateCode

Computed Column Indexes can improve searching of data if data exists in hierarchy as shown above.