xCloudx8 xCloudx8 - 10 months ago 45
SQL Question

Calculate frequency for each row

I'm trying to calculate the frequency of my elements in each row, i'll explain:
I select from a table containing some elements such as "pos,chr,ref,alt,id_disease".

From these i'll have to extract the frequency of my ref,alt that is:

num_occurrencies_of(ref='A' and alt='C')/total number of rows

With this query i get barely near to my objective, in fact id does not calculate properly the frequency it returns always
a constant

SELECT pos, chr, upper(ref||' '||alt) AS refalt, id_disease AS lvl15, t1.tot_var, t1.freq
FROM varianti
JOIN ( SELECT count(*) AS tot_var,(count(*)::numeric / sum(count(*)) over ()) as freq
FROM varianti)t1 ON TRUE
WHERE length(ref)=1 AND length(alt)=1 AND chr similar to 'chr[\d X Y]*'

All i want is to retrieve data like this:

chr pos refalt lvl15 freq tot_var
1 120 AT 15 0.3 1000
1 150 CG 30 0.01 1000

tot_var = counts total number of my rows i need it (It cannot be 1 i count every row!)

Both ref and alt could have those values (A,T,C,G) in every permutation possible, AA,AT,TA,TC,CT etc..

What i'm missing in my code?

Tell me if you want further infos

Example of varianti:

chr pos ref alt id_disease
chr1 152 A C 15
chr3 487 T T 74

here is the output of my query:

pos chr refalt lvl15 tot_var freq
124338543 chr11 G A 69 1 0.000000677833751782702767
124338595 chr11 C T 28 1 0.000000677833751782702767
124361862 chr11 C . 53 1 0.000000677833751782702767
124361899 chr11 T A 20 1 0.000000677833751782702767

Answer Source

Based on the information you provided

upper(ref||' '||alt) AS refalt, id_disease AS lvl15, 
SUM(CASE WHEN (ref == 'A' AND alt == 'C')THEN 1 ELSE 0 END)/COUNT(*) AS 'freq', 
COUNT(*) AS 'tot_var' 
FROM varianti

I am still not sure what 'tot_var' is. It would be useful to get a actual data sample, and expected output of that data sample itself.

Edit 1: To get the frequency of each couple in the dataset

SELECT DISTINCT upper(ref||' '||alt) AS refalt, 
COUNT(chr)/COUNT(*) AS 'freq' 
FROM varianti 
GROUP BY refalt

Edit 2: Updated query based on requirements

SELECT varianti.chr, varianti.pos, 
upper(varianti.ref||' '||varianti.alt) AS refalt, varianti.id_disease AS lvl15, COUNT(*) AS 'tot_var', 
FROM varianti
( SELECT DISTINCT upper(ref||' '||alt) AS refalt, 
  COUNT(chr)/COUNT(*) AS 'freq' 
  FROM varianti 
  GROUP BY refalt 
) refalt_table ON refalt_table.refalt = varianti.refalt

Edit 3: Updated query based on error

SELECT chr, pos, upper(ref||' '||alt) as refalt, id_disease AS lvl15, refalt_table.freq as 'freq', (SELECT COUNT(*) FROM varianti tot where tot.pos = v.pos) as 'tot_var'
FROM varianti v
( SELECT DISTINCT UPPER(ref) as 'ref',UPPER(alt) as 'alt', 
  COUNT(pos)/(SELECT COUNT(*) FROM varianti vcount) AS 'freq' 
  FROM varianti
  GROUP BY ref,alt
) refalt_table ON refalt_table.ref = v.ref and refalt_table.alt = v.alt