SQL WHERE NOT EXISTS query not returning results
I'm having trouble constructing a query and running it on Impala. I created the following working query to join two tables:
SELECT *
FROM illuminavariant as vcf, ensembl_genes as ens
WHERE vcf.filter = "PASS"
AND vcf.qual > 100
AND vcf.chromosome = ens.chromosome
AND vcf.position BETWEEN ens.start AND ens.stop
Now I'm trying to write a query that finds all variants WHERE vcf.filter = "PASS" and vcf.qual > 100 that have no match on chromosome and position.
I tried this:
SELECT *
FROM p7dev.illumina_test, p7dev.ensembl_test
WHERE NOT EXISTS(
SELECT *
FROM p7dev.illumina_test as vcf, p7dev.ensembl_test as ens
WHERE vcf.chromosome = ens.chromosome
AND vcf.position BETWEEN ens.start AND ens.stop
)
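But this returns no results. I thought a WITH clause might do the trick, but I'd appreciate it if someone could help me understand the logic of how it works. Many thanks!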
Try this...
SELECT *
FROM p7dev.illumina_test vcf
WHERE NOT EXISTS( SELECT 1
FROM p7dev.ensembl_test as ens
WHERE vcf.chromosome = ens.chromosome
AND vcf.position BETWEEN ens.start AND ens.stop
)
AND vcf.filter = 'PASS'
AND vcf.qual > 100
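A note on why the original attempt returns nothing: its subquery never refers to the rows of the outer query, so it has the same answer for every outer row. As soon as any variant overlaps any gene anywhere in the tables, NOT EXISTS is false across the board and the result is empty. The query above fixes this by correlating the subquery with the outer alias vcf. Here is a minimal sketch of the difference, assuming Python's built-in sqlite3 as a stand-in for Impala (the p7dev. schema prefix is dropped) and a few made-up rows; only the table and column names come from the question.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for Impala (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE illumina_test (chromosome TEXT, position INTEGER,
                                filter TEXT, qual INTEGER);
    CREATE TABLE ensembl_test (chromosome TEXT, start INTEGER, stop INTEGER);
    -- Hypothetical sample rows, not real data.
    INSERT INTO illumina_test VALUES
        ('1', 150, 'PASS', 200),   -- falls inside the gene below
        ('1', 900, 'PASS', 300),   -- outside every gene
        ('2', 50,  'PASS', 50);    -- outside every gene, but qual too low
    INSERT INTO ensembl_test VALUES ('1', 100, 200);
""")

# Uncorrelated subquery, as in the question: it does not mention the outer
# tables, so it is either true for all outer rows or false for all of them.
uncorrelated_rows = conn.execute("""
    SELECT *
    FROM illumina_test, ensembl_test
    WHERE NOT EXISTS (
        SELECT *
        FROM illumina_test AS vcf, ensembl_test AS ens
        WHERE vcf.chromosome = ens.chromosome
          AND vcf.position BETWEEN ens.start AND ens.stop
    )
""").fetchall()

# Correlated subquery, as in the answer: the inner WHERE uses the outer
# alias vcf, so it is re-evaluated for each variant.
correlated_rows = conn.execute("""
    SELECT *
    FROM illumina_test vcf
    WHERE NOT EXISTS (
        SELECT 1
        FROM ensembl_test AS ens
        WHERE vcf.chromosome = ens.chromosome
          AND vcf.position BETWEEN ens.start AND ens.stop
    )
      AND vcf.filter = 'PASS'
      AND vcf.qual > 100
""").fetchall()

print(uncorrelated_rows)  # []
print(correlated_rows)    # [('1', 900, 'PASS', 300)]
```

With these sample rows, one overlapping variant is enough to empty the uncorrelated result, while the correlated form keeps exactly the high-quality variant that lies outside every gene.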
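Inasmuch as you are looking for variants that are not related to any ensemble, it seems odd to form a cross join of variants and ensembles from which to filter out rows. If that's really what you want, though, then this ought to do it: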
SELECT *
FROM illuminavariant as vcf, ensembl_genes as ens
WHERE vcf.filter = "PASS"
AND vcf.qual > 100
AND (
vcf.chromosome != ens.chromosome
OR vcf.position < ens.start
OR vcf.position > ens.stop
)
This simply negates the condition that relates variant rows to ensemble rows.
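To make concrete what the negated cross join produces: it yields one row per (variant, non-matching ensemble) pair, so a variant that does fall inside one gene still shows up paired with every other gene. A small sketch, again assuming sqlite3 in place of Impala and hypothetical sample rows:

```python
import sqlite3

# In-memory SQLite database used as a stand-in for Impala (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE illuminavariant (chromosome TEXT, position INTEGER,
                                  filter TEXT, qual INTEGER);
    CREATE TABLE ensembl_genes (chromosome TEXT, start INTEGER, stop INTEGER);
    -- Hypothetical sample rows, not real data.
    INSERT INTO illuminavariant VALUES
        ('1', 150, 'PASS', 200),   -- inside the first gene
        ('1', 900, 'PASS', 300);   -- outside both genes
    INSERT INTO ensembl_genes VALUES ('1', 100, 200), ('1', 500, 600);
""")

# Negated join condition: keeps every pair in which the variant does NOT
# fall inside that particular gene.
pairs = conn.execute("""
    SELECT vcf.position, ens.start, ens.stop
    FROM illuminavariant AS vcf, ensembl_genes AS ens
    WHERE vcf.filter = 'PASS'
      AND vcf.qual > 100
      AND (
            vcf.chromosome != ens.chromosome
         OR vcf.position < ens.start
         OR vcf.position > ens.stop
      )
    ORDER BY vcf.position, ens.start
""").fetchall()

for row in pairs:
    print(row)
# (150, 500, 600)   <- variant 150 is inside gene 100-200, yet still listed
# (900, 100, 200)
# (900, 500, 600)
```

The variant at position 150 appears even though it overlaps the first gene, which is why this form does not answer "which variants match no gene at all".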
I suspect, however, that what you really want is more like this:
SELECT vcf.*
FROM
illuminavariant as vcf
LEFT JOIN ensembl_genes as ens
ON vcf.chromosome = ens.chromosome
AND vcf.position BETWEEN ens.start AND ens.stop
WHERE
vcf.filter = "PASS"
AND vcf.qual > 100
AND ens.chromosome IS NULL
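That performs the same join as your first query, but as a left join. The rows that actually represent matches are then filtered out by the ens.chromosome IS NULL condition. It returns only the columns of the variant table, because the whole point is to find the variants that have no corresponding row in the ensemble table.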
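As a sanity check, the left-join form and a correlated NOT EXISTS should select the same variants. The sketch below compares the two on a handful of hypothetical rows, assuming sqlite3 as a stand-in for Impala. Note that the IS NULL test relies on ens.chromosome never being NULL in a matched row, which holds here because the join condition compares it for equality.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for Impala (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE illuminavariant (chromosome TEXT, position INTEGER,
                                  filter TEXT, qual INTEGER);
    CREATE TABLE ensembl_genes (chromosome TEXT, start INTEGER, stop INTEGER);
    -- Hypothetical sample rows, not real data.
    INSERT INTO illuminavariant VALUES
        ('1', 150, 'PASS',    200),  -- inside two overlapping genes
        ('1', 900, 'PASS',    300),  -- no gene at this position
        ('2', 50,  'PASS',    500),  -- no gene on this chromosome
        ('2', 60,  'LowQual', 500);  -- unmatched, but fails the filter
    INSERT INTO ensembl_genes VALUES ('1', 100, 200), ('1', 120, 180);
""")

# Anti-join written as LEFT JOIN ... IS NULL.
left_join_rows = conn.execute("""
    SELECT vcf.*
    FROM illuminavariant AS vcf
    LEFT JOIN ensembl_genes AS ens
           ON vcf.chromosome = ens.chromosome
          AND vcf.position BETWEEN ens.start AND ens.stop
    WHERE vcf.filter = 'PASS'
      AND vcf.qual > 100
      AND ens.chromosome IS NULL
    ORDER BY vcf.chromosome, vcf.position
""").fetchall()

# The same anti-join written as a correlated NOT EXISTS.
not_exists_rows = conn.execute("""
    SELECT vcf.*
    FROM illuminavariant AS vcf
    WHERE vcf.filter = 'PASS'
      AND vcf.qual > 100
      AND NOT EXISTS (
          SELECT 1
          FROM ensembl_genes AS ens
          WHERE vcf.chromosome = ens.chromosome
            AND vcf.position BETWEEN ens.start AND ens.stop
      )
    ORDER BY vcf.chromosome, vcf.position
""").fetchall()

print(left_join_rows)   # [('1', 900, 'PASS', 300), ('2', 50, 'PASS', 500)]
print(left_join_rows == not_exists_rows)  # True
```

Which form runs faster on Impala depends on the planner and the data, so that part is worth measuring on the real tables rather than inferring from this toy example.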