Dryden Dryden - 1 year ago 206
Python Question

Import mssql spatial fields into geopandas/shapely geometry

I cannot seem to be able to directly import mssql spatial fields into geopandas. I can import normal mssql tables into pandas with Pymssql without problems, but I cannot figure out a way to import the spatial fields into shapely geometry. I know that the OGR driver for mssql should be able to handle it, but I'm not skilled enough in sql to figure this out.
This is more of an issue for lines and polygons as points can be converted to x and y coordinates from the mssql field.

Answer Source

I figured it out by properly querying the sql database table and converting the wkt string to shapely geometry via the loads function in shapely.wkt.

I'm no programmer, so bear that in mind with the organization of the function. The function can import mssql tables with or without GIS geometry.

def rd_sql(server, database, table, col_names=None, where_col=None, where_val=None, geo_col=False, epsg=2193, export=False, path='save.csv'):
Function to import data from a MSSQL database. Specific columns can be selected and specific queries within columns can be selected. Requires the pymssql package, which must be separately installed.
server -- The server name (str). e.g.: 'SQL2012PROD03'\n
database -- The specific database within the server (str). e.g.: 'LowFlows'\n
table -- The specific table within the database (str). e.g.: 'LowFlowSiteRestrictionDaily'\n
col_names -- The column names that should be retrieved (list). e.g.: ['SiteID', 'BandNo', 'RecordNo']\n
where_col -- The sql statement related to a specific column for selection (must be formated according to the example). e.g.: 'SnapshotType'\n
where_val -- The WHERE query values for the where_col (list). e.g. ['value1', 'value2']\n
geo_col -- Is there a geometry column in the table?.\n
epsg -- The coordinate system (int).\n
export -- Should the data be exported?\n
path -- The path and csv name for the export if 'export' is True (str).
from pymssql import connect
from pandas import read_sql
from shapely.wkt import loads
from geopandas import GeoDataFrame

if col_names is None and where_col is None:
    stmt1 = 'SELECT * FROM ' + table
elif where_col is None:
    stmt1 = 'SELECT ' + str(col_names).replace('\'', '"')[1:-1] + ' FROM ' + table
    stmt1 = 'SELECT ' + str(col_names).replace('\'', '"')[1:-1] + ' FROM ' + table + ' WHERE ' + str([where_col]).replace('\'', '"')[1:-1] + ' IN (' + str(where_val)[1:-1] + ')'
conn = connect(server, database=database)
df = read_sql(stmt1, conn)

## Read in geometry if required
if geo_col:
    geo_col_stmt = "SELECT COLUMN_NAME FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_NAME=" + "\'" + table + "\'" + " AND DATA_TYPE='geometry'"
    geo_col = str(read_sql(geo_col_stmt, conn).iloc[0,0])
    if where_col is None:
        stmt2 = 'SELECT ' + geo_col + '.STGeometryN(1).ToString()' + ' FROM ' + table
        stmt2 = 'SELECT ' + geo_col + '.STGeometryN(1).ToString()' + ' FROM ' + table + ' WHERE ' + str([where_col]).replace('\'', '"')[1:-1] + ' IN (' + str(where_val)[1:-1] + ')'
    df2 = read_sql(stmt2, conn)
    df2.columns = ['geometry']
    geometry = [loads(x) for x in df2.geometry]
    df = GeoDataFrame(df, geometry=geometry, crs={'init' :'epsg:' + str(epsg)})

if export:
    df.to_csv(path, index=False)


EDIT: Made the function automatically find the geometry field if one exists.

Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download