khpeek - 7 months ago 56

Python Question

I'm analyzing a publicly available dataset: an assessment of properties in San Francisco for tax purposes (https://data.sfgov.org/Housing-and-Buildings/Historic-Secured-Property-Tax-Rolls/wv5m-vpq2). It can be downloaded as a CSV file, which is saved under the filename 'Historic_Secured_Property_Tax_Rolls.csv'.

Using this file, I'm trying to figure out the annual growth rate of the Land Values, excluding zero values. The dataset is so large that I get errors if I try to plot it, so I'm first trying to rely on my understanding of how `polyfit` works.

I've used the following code to derive a linear fit of the natural logarithm of the 'Land Value' column plotted against the 'Fiscal Year' column:

```python
import numpy as np
import pandas as pd

# Read in data downloaded from https://data.sfgov.org/api/views/wv5m-vpq2/rows.csv?accessType=DOWNLOAD
df = pd.read_csv('Historic_Secured_Property_Tax_Rolls.csv')

# Only consider non-zero Land Values
df_nz = df[df['Closed Roll Assessed Land Value'] > 0]

# Degree-1 (linear) fit of log(Land Value) against Fiscal Year
p = np.polyfit(df_nz['Closed Roll Fiscal Year'], np.log(df_nz['Closed Roll Assessed Land Value']), 1)
```

This yields the following values for `p`:

```
In [42]: p
Out[42]: array([ 4.18802559e-02, -7.23804441e+01])
```

As I understand it, the slope of the linear fit should be represented by `p[1]`, not `p[0]`.

I'm wondering if I haven't somehow misinterpreted the result, and whether the growth rate is in fact represented by `p[0]` rather than `p[1]`.

Answer

```
Returns
-------
p : ndarray, shape (M,) or (M, K)
    Polynomial coefficients, highest power first. If `y` was 2-D, the
    coefficients for `k`-th data set are in ``p[:,k]``.
```

This tells me that the `4.2%` is `p[0]`, the coefficient on the linear term (the slope of log land value against fiscal year), and `p[1]` is the intercept.
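The coefficient order can be checked without loading the full dataset. Below is a minimal sketch on synthetic data; the 4% growth rate, the base value of 100,000 and the year range are made-up assumptions for illustration, not figures from the tax rolls.

```python
import numpy as np

# Synthetic data (assumed, not from the SF dataset): values growing at a
# known continuous rate of 4% per year, starting from 100,000 in 2007.
years = np.arange(2007, 2016)
true_rate = 0.04
values = 100_000 * np.exp(true_rate * (years - 2007))

# Fit log(value) = slope * year + intercept.
p = np.polyfit(years, np.log(values), 1)

# Coefficients come back highest power first, so p[0] is the slope
# and p[1] is the intercept.
slope, intercept = p

# Convert the slope of the log fit into an annual percentage growth rate.
annual_growth = np.exp(slope) - 1
print(f"slope={slope:.4f}, intercept={intercept:.2f}, annual growth={annual_growth:.2%}")
```

The fit recovers the known rate in `p[0]`, while `p[1]` is a large negative intercept, just as in the output above. That intercept is simply the fitted log value extrapolated back to year 0, so it says nothing about growth. Since the fit is on the logarithm, the slope is a continuous rate; `np.exp(slope) - 1` gives the equivalent year-over-year growth, which is slightly higher than the slope itself.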

More to come...