Modify an SQL query to include additional parameters

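I want to extract data with an SQL query, but the code I was given does not produce a report that includes all the data I need.

Basically, the report combines data from a number of samples (95, to be exact) and then gives me the sequences from those samples. It also compares the sequences to see whether they occur in more than one sample.

I want to include the parameters "v_family" and "j_gene" as additional columns. The query needs to take these parameters from one of the samples, in the same way it takes the amino acid sequence ("amino_acid") from one of the samples in which that sequence occurs.

How can I add my two additional parameters to this report?

This is the current query, which produces 6 columns (see also the attached screenshot):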

select 
    value, 
    rank, 
    count(*) over (partition by amino_acid) as contributors, 
    total, 
    amino_acid, 
    sample_name 
from ( select 
        value, 
        row_number() over (partition by sample_name order by rank desc) as rank, 
        sum(value) over (partition by amino_acid) as total, 
        amino_acid, 
        sample_name 
            from ( select 
                        sum(productive_frequency) as value, 
                        sum(productive_frequency) as rank, 
                        amino_acid, 
                        sample_name 
                    from sequences 
                    group by 
                        amino_acid, 
                        sample_name 
                    order by 
                        value desc 
            )inner_query  
    ) outer_inner  
order by 
    sample_name asc, 
    rank

The following edit was suggested, but it did not give me the data I wanted (see the attached screenshot):

select 
    value, 
    rank, 
    count(*) over (partition by amino_acid) as contributors, 
    total, 
    amino_acid, 
    sample_name 
from ( select 
        value, 
        row_number() over (partition by sample_name order by rank desc) as rank, 
        sum(value) over (partition by amino_acid) as total, 
        amino_acid, 
        sample_name 
            from ( select 
                        sum(productive_frequency) as value, 
                        sum(productive_frequency) as rank, 
                        amino_acid, 
                        sample_name, 
                        v_family 
                    from sequences 
                    group by 
                        amino_acid, 
                        sample_name, 
                        v_family 
                    order by 
                        value desc 
            ) inner_query  
    ) outer_inner  
order by 
    sample_name asc, 
    rank

[Screenshot: old query]

[Screenshot: new query]

This was the next suggestion, but it did not change the result:

select 
    value, 
    rank, 
    total, 
    amino_acid, 
    sample_name 
from ( select 
        value, 
        row_number() over (partition by sample_name order by rank desc) as rank, 
        sum(value) over (partition by amino_acid) as total, 
        count(*) over (partition by amino_acid,v_family,j_gene) as contributors, 
        amino_acid, 
        sample_name 
            from ( select 
                        sum(productive_frequency) as value, 
                        sum(productive_frequency) as rank, 
                        v_family, 
                        j_gene, 
                        amino_acid, 
                        sample_name 
                    from sequences 
                    group by 
                        amino_acid, 
                        sample_name, 
                        v_family, 
                        j_gene 
                    order by 
                        value desc 
            ) inner_query 
    ) outer_inner 
order by 
    sample_name asc, 
    rank

OK, solved! The correct code is below. Many thanks to everyone for the help!

SELECT value
    ,rank
    ,count(*) OVER (PARTITION BY amino_acid,v_family,j_gene) AS contributors
    ,total
    ,amino_acid
    ,sample_name
    ,v_family
    ,j_gene
FROM (
    SELECT value
        ,row_number() OVER (PARTITION BY sample_name ORDER BY rank DESC) AS rank
        ,sum(value) OVER (PARTITION BY amino_acid,v_family,j_gene) AS total
        ,amino_acid
        ,sample_name
        ,v_family
        ,j_gene
    FROM (
        SELECT sum(productive_frequency) AS value
            ,sum(productive_frequency) AS rank
            ,v_family
            ,j_gene
            ,amino_acid
            ,sample_name
        FROM sequences
        GROUP BY amino_acid
            ,sample_name
            ,v_family
            ,j_gene
        ORDER BY value DESC
        ) inner_query
    ) outer_inner
ORDER BY sample_name ASC
    ,rank

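Your multi-level query is built on the innermost query, which selects data from the "sequences" table/view, so if you need 2 more parameters, they have to be added in the innermost query. Most likely one or more additional tables will also have to be joined to the "sequences" table/view.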

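Instead of the current 4 columns (value, rank, amino_acid, sample_name), the innermost query will then have 6 columns (adding the gene and the family). These 2 extra columns must be included in the GROUP BY, and selected at every level above it, so that they appear in the outermost query.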
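To illustrate the point about carrying the extra columns through every level, here is a minimal sketch using Python's built-in sqlite3 module. The toy rows and the variable names (`hidden`, `exposed`, `hidden_ok`) are made up for illustration; only the table and column names come from the thread. It assumes the real database resolves column names the same way SQLite does.

```python
import sqlite3

# Hypothetical toy data; only the table and column names come from the thread.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sequences ("
    "sample_name TEXT, amino_acid TEXT, v_family TEXT, j_gene TEXT, "
    "productive_frequency REAL)"
)
conn.executemany(
    "INSERT INTO sequences VALUES (?, ?, ?, ?, ?)",
    [
        ("S1", "CASSL", "V1", "J1", 0.5),
        ("S1", "CASSL", "V1", "J1", 0.25),
        ("S2", "CASSL", "V1", "J1", 0.125),
    ],
)

# The inner level groups by the new columns but does not list them in its
# SELECT, so the outer level cannot reference v_family.
hidden = """
SELECT amino_acid, v_family
FROM (
    SELECT sum(productive_frequency) AS value, amino_acid, sample_name
    FROM sequences
    GROUP BY amino_acid, sample_name, v_family, j_gene
) inner_query
"""
try:
    conn.execute(hidden).fetchall()
    hidden_ok = True
except sqlite3.OperationalError:
    hidden_ok = False  # SQLite reports that the column does not exist

# Once the inner SELECT list exposes the columns, the outer level can use them.
exposed = """
SELECT value, amino_acid, sample_name, v_family, j_gene
FROM (
    SELECT sum(productive_frequency) AS value,
           amino_acid, sample_name, v_family, j_gene
    FROM sequences
    GROUP BY amino_acid, sample_name, v_family, j_gene
) inner_query
ORDER BY sample_name
"""
rows = conn.execute(exposed).fetchall()
print(hidden_ok)
for row in rows:
    print(row)
```

The same reasoning applies one level up: if the outermost SELECT list does not name v_family and j_gene, the report keeps its old columns even though the inner levels changed.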

This is probably the effect of adding the 2 new columns to the base query. The GROUP BY clause groups the data by each unique combination of the listed columns, and that set of combinations is different in the two cases. Compare the results of these two queries:

SELECT sum(productive_frequency) AS value
    ,sum(productive_frequency) AS rank
    ,v_family
    ,j_gene
    ,amino_acid
    ,sample_name
FROM sequences
GROUP BY amino_acid
    ,sample_name
    ,v_family
    ,j_gene
ORDER BY value DESC

SELECT sum(productive_frequency) AS value
    ,sum(productive_frequency) AS rank
    ,amino_acid
    ,sample_name
FROM sequences
GROUP BY amino_acid
    ,sample_name
ORDER BY value DESC

If the row counts differ, you can check the unique combinations of the columns in question with the following statement:

SELECT DISTINCT v_family
    ,j_gene
    ,amino_acid
    ,sample_name
FROM sequences
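The row-count comparison can be tried on a small scale. The sketch below uses Python's built-in sqlite3 module with invented rows in which one sequence appears with two different v_family values inside the same sample; the helper `count_groups` is a hypothetical name introduced only for this example.

```python
import sqlite3

# Hypothetical toy data: "CASSL" occurs in sample S1 with two v_family values.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sequences ("
    "sample_name TEXT, amino_acid TEXT, v_family TEXT, j_gene TEXT, "
    "productive_frequency REAL)"
)
conn.executemany(
    "INSERT INTO sequences VALUES (?, ?, ?, ?, ?)",
    [
        ("S1", "CASSL", "V1", "J1", 0.5),
        ("S1", "CASSL", "V2", "J1", 0.25),
        ("S2", "CASSL", "V1", "J1", 0.125),
        ("S2", "CASSF", "V3", "J2", 0.5),
    ],
)


def count_groups(columns):
    """Count the rows produced by grouping on a fixed column list."""
    # The column list is a trusted literal here, never user input.
    sql = (
        "SELECT count(*) FROM ("
        "SELECT sum(productive_frequency) AS value "
        f"FROM sequences GROUP BY {columns})"
    )
    return conn.execute(sql).fetchone()[0]


coarse = count_groups("amino_acid, sample_name")
fine = count_groups("amino_acid, sample_name, v_family, j_gene")

# Number of unique combinations, as in the SELECT DISTINCT check.
distinct_combos = conn.execute(
    "SELECT count(*) FROM ("
    "SELECT DISTINCT v_family, j_gene, amino_acid, sample_name FROM sequences)"
).fetchone()[0]

print(coarse, fine, distinct_combos)
```

With these rows the finer grouping yields one more row than the original grouping, because the S1 occurrence of the sequence is split by v_family. Whether the real data behaves this way depends on whether a sequence can carry more than one v_family or j_gene within a sample.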

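The inner_query result has changed, but the window function needs to change as well, from

sum(value) over (partition by amino_acid) as total

to

sum(value) over (partition by amino_acid,v_family,j_gene) as total

My apologies for the replies that got posted every time I pressed [Enter].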
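As a rough illustration of what the partition change does, the sketch below computes both variants side by side on invented rows, using Python's built-in sqlite3 module (window functions need SQLite 3.25 or newer). The aliases ending in `_by_sequence` and `_by_combo` are names made up for this example and are not part of the original report.

```python
import sqlite3

# Hypothetical toy data; only the table and column names come from the thread.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sequences ("
    "sample_name TEXT, amino_acid TEXT, v_family TEXT, j_gene TEXT, "
    "productive_frequency REAL)"
)
conn.executemany(
    "INSERT INTO sequences VALUES (?, ?, ?, ?, ?)",
    [
        ("S1", "CASSL", "V1", "J1", 0.5),
        ("S1", "CASSL", "V2", "J1", 0.25),
        ("S2", "CASSL", "V1", "J1", 0.125),
    ],
)

sql = """
SELECT amino_acid, sample_name, v_family, j_gene, value,
       sum(value) OVER (PARTITION BY amino_acid) AS total_by_sequence,
       sum(value) OVER (PARTITION BY amino_acid, v_family, j_gene)
           AS total_by_combo,
       count(*) OVER (PARTITION BY amino_acid) AS contributors_by_sequence,
       count(*) OVER (PARTITION BY amino_acid, v_family, j_gene)
           AS contributors_by_combo
FROM (
    SELECT sum(productive_frequency) AS value,
           amino_acid, sample_name, v_family, j_gene
    FROM sequences
    GROUP BY amino_acid, sample_name, v_family, j_gene
) inner_query
ORDER BY amino_acid, sample_name, v_family
"""
rows = conn.execute(sql).fetchall()
for row in rows:
    print(row)
```

Partitioning by amino_acid alone counts every grouped row for that sequence, so once the inner query is split by v_family and j_gene, a sample can be counted more than once. Partitioning by all three columns gives totals and contributor counts per sequence, family and gene combination, which may or may not be the number the report is meant to show.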

New version:

SELECT value
    ,rank
    ,count(*) OVER (PARTITION BY amino_acid,v_family,j_gene) AS contributors
    ,total
    ,amino_acid
    ,sample_name
    ,v_family
    ,j_gene
FROM (
    SELECT value
        ,row_number() OVER (PARTITION BY sample_name ORDER BY rank DESC) AS rank
        ,sum(value) OVER (PARTITION BY amino_acid,v_family,j_gene) AS total
        ,amino_acid
        ,sample_name
        ,v_family
        ,j_gene
    FROM (
        SELECT sum(productive_frequency) AS value
            ,sum(productive_frequency) AS rank
            ,v_family
            ,j_gene
            ,amino_acid
            ,sample_name
        FROM sequences
        GROUP BY amino_acid
            ,sample_name
            ,v_family
            ,j_gene
        ORDER BY value DESC
        ) inner_query
    ) outer_inner
ORDER BY sample_name ASC
    ,rank