Specific frequency per sample

This is my table:

     chr  pos refalt
     ---------------
     chr1 123 AA
     chr1 123 AA
     chr1 123 AA
     chr1 123 AA
     chr1 123 AA
     chr1 123 AC
     chr1 123 AC
     chr1 123 AC
     chr2 456 TC
     chr3 789 GC

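I need to compute a specific frequency. Let me give an example:

Each row is one patient, so "chr1 123 AA" has 5 patients and "chr1 123 AC" has 3 patients.

I want to know the frequency of A.

The calculation is: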

13(A)/16, because at "chr1 123" there are 13 A alleles out of 16 alleles in total: 5×A (ref) + 5×A (alt) + 3×A (ref) + 3×C (alt).

For C:

3(C)/16, because only 3 people have C.

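How can I do this in SQL, or is it too complicated?

refalt is a varchar column, so I need to split each value to get ref and alt.

I know it's a bit complicated, so please ask me for more details.

For anyone (especially biologists) wondering how to do this: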

    -- Allele frequency at one position: split refalt into its two characters,
    -- stack them into one column, then count each allele against the total
    -- number of alleles (2 per patient).
    select allele,
           count(*)::numeric /
           (select 2 * count(*) from ft_variants where pos_chr = 'chr1 12783') as frequency
    from (
        select substring(refalt from 1 for 1) as allele   -- first character (ref)
        from ft_variants
        where pos_chr = 'chr1 12783'
        union all                                          -- keep duplicates; plain union would drop them
        select substring(refalt from 2 for 1)              -- second character (alt)
        from ft_variants
        where pos_chr = 'chr1 12783'
    ) as alleles
    group by allele;
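
To check the arithmetic from the question (13/16 for A and 3/16 for C at "chr1 123"), here is a minimal self-contained sketch using the sample rows from the table above. It assumes PostgreSQL. The CTE names `sample_variants` and `alleles` are made up for illustration, and the separate `chr` and `pos` columns follow the question's table rather than the `pos_chr` column of `ft_variants`.

```sql
-- Sample rows copied from the question; sample_variants is a hypothetical
-- stand-in for the real table.
with sample_variants (chr, pos, refalt) as (
    values
        ('chr1', 123, 'AA'), ('chr1', 123, 'AA'), ('chr1', 123, 'AA'),
        ('chr1', 123, 'AA'), ('chr1', 123, 'AA'),
        ('chr1', 123, 'AC'), ('chr1', 123, 'AC'), ('chr1', 123, 'AC'),
        ('chr2', 456, 'TC'),
        ('chr3', 789, 'GC')
),
alleles as (
    -- One row per allele: every patient contributes two rows.
    select chr, pos, substring(refalt from 1 for 1) as allele from sample_variants
    union all
    select chr, pos, substring(refalt from 2 for 1) from sample_variants
)
select chr, pos, allele,
       count(*) as allele_count,
       -- Divide by the total number of alleles at the same position.
       count(*)::numeric / sum(count(*)) over (partition by chr, pos) as frequency
from alleles
group by chr, pos, allele
order by chr, pos, allele;
```

For "chr1 123" this gives A with a count of 13 (frequency 0.8125) and C with a count of 3 (frequency 0.1875). The window function computes the denominator per position, so every position is handled in a single pass without a hard-coded filter.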