euge1220 - 1 year ago 119
SQL Question

# Finding Covariance using SQL

``````# dt---------indx_nm1-----indx_val1-------indx_nm2------indx_val2
2009-06-08----ABQI------1001.2------------ACNACTR----------300.05
2009-06-09----ABQI------1002.12 ----------ACNACTR----------341.19
2009-06-10----ABQI------1011.4------------ACNACTR----------382.93
2009-06-11----ABQI------1015.43 ----------ACNACTR----------362.63
``````

I have a table that looks like ^ (but with hundreds of rows that dates from 2009 to 2013). Is there a way that I could calculate the covariance : [(
`indx_val1`
- avg(
`indx_val1`
)) * (
`indx_val2`
- avg(
`indx_val2`
)] divided by total number of rows for each value of
`indx_val1`
and
`indx_val2`
(loop through the entire table) and return just a simple value for cov(
`ABQI`
,
`ACNACTR`
)

Since you have aggregates operating over two different groups, you will need two different queries. The main one groups by `dt` to get your row values per date. The other query has to perform `AVG()` and `COUNT()` aggregates across the whole rowset.

To use them both at the same time, you need to `JOIN` them together. But since there's no actual relation between the two queries, it is a cartesian join on the expression `1 = 1` in the `ON`. Effectively, that joins every row of the main query with the single row retrieved by the aggregate query. You can then perform the arithmetic in the `SELECT` list, using values from both:

So, building on the query from your earlier question:

``````SELECT
indxs.*,
((indx_val2 - indx_val2_avg) * (indx_val1 - indx_val1_avg)) / total_rows AS cv
FROM (
SELECT
dt,
MAX(CASE WHEN indx_nm = 'ABQI' THEN indx_nm ELSE NULL END) AS indx_nm1,
MAX(CASE WHEN indx_nm = 'ABQI' THEN indx_val ELSE NULL END) AS indx_val1,
MAX(CASE WHEN indx_nm = 'ACNACTR' THEN indx_nm ELSE NULL END) AS indx_nm2,
MAX(CASE WHEN indx_nm = 'ACNACTR' THEN indx_val ELSE NULL END) AS indx_val2
FROM table1 a
GROUP BY dt
) indxs
INNER JOIN (
/* Join against a query returning the AVG() and COUNT() across all rows */
SELECT
'ABQI' AS indx_nm1_aname,
AVG(CASE WHEN indx_nm = 'ABQI' THEN indx_val ELSE NULL END) AS indx_val1_avg,
'ACNACTR' AS indx_nm2_aname,
AVG(CASE WHEN indx_nm = 'ACNACTR' THEN indx_val ELSE NULL END) AS indx_val2_avg,
COUNT(*) AS total_rows
FROM table1 b
WHERE indx_nm IN ('ABQI','ACNACTR')
/* And it is a cartesian product */
) aggs ON 1=1
WHERE
indx_nm1 IS NOT NULL
AND indx_nm2 IS NOT NULL
ORDER BY dt
``````

Here's a demo, building on your earlier one: http://sqlfiddle.com/#!6/2ec65/14

Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download