xCloudx8 - 1 year ago 81
SQL Question

# Calculate frequency for each row

I'm trying to calculate the frequency of my elements in each row, i'll explain:
I select from a table containing some elements such as "pos,chr,ref,alt,id_disease".

From these i'll have to extract the frequency of my ref,alt that is:

`num_occurrencies_of(ref='A' and alt='C')/total number of rows`
.

With this query i get barely near to my objective, in fact id does not calculate properly the frequency it returns always
`a constant`

``````SELECT pos, chr, upper(ref||' '||alt) AS refalt, id_disease AS lvl15, t1.tot_var, t1.freq
FROM varianti
JOIN ( SELECT count(*) AS tot_var,(count(*)::numeric / sum(count(*)) over ()) as freq
FROM varianti)t1 ON TRUE
WHERE length(ref)=1 AND length(alt)=1 AND chr similar to 'chr[\d X Y]*'
``````

All i want is to retrieve data like this:

``````chr pos refalt lvl15 freq tot_var
1   120  AT     15    0.3  1000
1   150  CG     30    0.01 1000
``````

tot_var = counts total number of my rows i need it (It cannot be 1 i count every row!)

Both ref and alt could have those values (A,T,C,G) in every permutation possible, AA,AT,TA,TC,CT etc..

What i'm missing in my code?

Tell me if you want further infos

Example of varianti:

``````chr pos ref alt id_disease
chr1 152 A   C    15
chr3 487 T   T    74
``````

here is the output of my query:

``````pos          chr    refalt  lvl15   tot_var freq
124338543   chr11   G A      69      1     0.000000677833751782702767
124338595   chr11   C T      28      1      0.000000677833751782702767
124361862   chr11   C .      53      1     0.000000677833751782702767
124361899   chr11   T A      20      1     0.000000677833751782702767
``````

Based on the information you provided

``````SELECT DISTINCT chr, pos,
upper(ref||' '||alt) AS refalt, id_disease AS lvl15,
SUM(CASE WHEN (ref == 'A' AND alt == 'C')THEN 1 ELSE 0 END)/COUNT(*) AS 'freq',
COUNT(*) AS 'tot_var'
FROM varianti
``````

I am still not sure what 'tot_var' is. It would be useful to get a actual data sample, and expected output of that data sample itself.

Edit 1: To get the frequency of each couple in the dataset

``````SELECT DISTINCT upper(ref||' '||alt) AS refalt,
COUNT(chr)/COUNT(*) AS 'freq'
FROM varianti
GROUP BY refalt
``````

Edit 2: Updated query based on requirements

``````SELECT varianti.chr, varianti.pos,
upper(varianti.ref||' '||varianti.alt) AS refalt, varianti.id_disease AS lvl15, COUNT(*) AS 'tot_var',
FROM varianti
JOIN
( SELECT DISTINCT upper(ref||' '||alt) AS refalt,
COUNT(chr)/COUNT(*) AS 'freq'
FROM varianti
GROUP BY refalt
) refalt_table ON refalt_table.refalt = varianti.refalt
``````

Edit 3: Updated query based on error

``````SELECT chr, pos, upper(ref||' '||alt) as refalt, id_disease AS lvl15, refalt_table.freq as 'freq', (SELECT COUNT(*) FROM varianti tot where tot.pos = v.pos) as 'tot_var'
FROM varianti v
LEFT JOIN
( SELECT DISTINCT UPPER(ref) as 'ref',UPPER(alt) as 'alt',
COUNT(pos)/(SELECT COUNT(*) FROM varianti vcount) AS 'freq'
FROM varianti
GROUP BY ref,alt
) refalt_table ON refalt_table.ref = v.ref and refalt_table.alt = v.alt
``````
Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download