Query a table while excluding values referenced in other tables
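I have a database full of transactions from different bank accounts. Each transaction carries a user_id, bank_id, account_id, and transaction_id. If a user chooses to ignore a bank, an account, or an individual transaction, I want to exclude the affected transactions at query time.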
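In other words, if a user:
- ignores a bank, all transactions with that bank_id are skipped,
- ignores an account, all transactions with that account_id are skipped,
- ignores a single transaction, the transaction with that transaction_id is skipped.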
My current database looks like this:
-- Simplified for brevity.
CREATE TABLE IF NOT EXISTS transactions
(
    user_id        TEXT NOT NULL,
    transaction_id TEXT NOT NULL,
    account_id     TEXT NOT NULL,
    bank_id        TEXT NOT NULL,
    PRIMARY KEY (user_id, transaction_id)
);

-- Exclusion tables for banks and accounts are similar.
CREATE TABLE IF NOT EXISTS excluded_transactions
(
    id             INTEGER PRIMARY KEY GENERATED BY DEFAULT AS IDENTITY,
    user_id        TEXT NOT NULL,
    transaction_id TEXT NOT NULL
);

CREATE INDEX IF NOT EXISTS exc_trn_idx ON excluded_transactions (user_id, transaction_id);
Whenever user abc excludes a bank, an account, or a transaction, it is added to the appropriate exclusion table. The query then looks like this:
WITH b AS (
    SELECT bank_id FROM excluded_banks WHERE user_id = 'abc'
), a AS (
    SELECT account_id FROM excluded_accounts WHERE user_id = 'abc'
), t AS (
    SELECT transaction_id FROM excluded_transactions WHERE user_id = 'abc'
)
SELECT * FROM transactions
WHERE user_id = 'abc'
  AND bank_id NOT IN (SELECT * FROM b)
  AND account_id NOT IN (SELECT * FROM a)
  AND transaction_id NOT IN (SELECT * FROM t)
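This gives OK performance on a test set of about 1M transactions (on average about 100 ms of planning time and about 1 s of execution time). However, I'm worried that it will degrade significantly as the database grows.
My question is: how can I improve the tables/queries to efficiently retrieve transactions under the constraints above? Slower writes are acceptable if they make reads faster. Also, if my general approach is not optimal, please let me know and/or suggest improvements.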
I would suggest writing it like this:
SELECT t.*
FROM transactions t
WHERE t.user_id = 'abc' AND
      NOT EXISTS (SELECT 1
                  FROM excluded_banks eb
                  WHERE eb.bank_id = t.bank_id AND
                        eb.user_id = t.user_id
                 ) AND
      NOT EXISTS (SELECT 1
                  FROM excluded_accounts ea
                  WHERE ea.account_id = t.account_id AND
                        ea.user_id = t.user_id
                 ) AND
      NOT EXISTS (SELECT 1
                  FROM excluded_transactions et
                  WHERE et.transaction_id = t.transaction_id AND
                        et.user_id = t.user_id
                 );
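A short aside on why NOT EXISTS is usually the safer choice than NOT IN. The two differ once a NULL shows up in the subquery, and as far as I know that NULL handling is also what keeps PostgreSQL from planning NOT IN (subquery) as an anti-join. Your id columns are NOT NULL, so both forms return the same rows here, but the plans can still differ. The toy queries below are only an illustration; the VALUES list is made up and is not part of your schema.

```sql
-- NOT IN evaluates to NULL (not TRUE) when the list contains a NULL
-- and no element matches, so this query returns no rows.
SELECT 'kept' AS result
WHERE 1 NOT IN (SELECT x FROM (VALUES (2), (NULL)) AS v(x));

-- NOT EXISTS only asks whether a matching row exists.
-- No row has x = 1, so this query returns one row.
SELECT 'kept' AS result
WHERE NOT EXISTS (
    SELECT 1
    FROM (VALUES (2), (NULL)) AS v(x)
    WHERE v.x = 1
);
```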
Then make sure you have the following indexes:
excluded_banks(user_id, bank_id)
excluded_accounts(user_id, account_id)
excluded_transactions(user_id, transaction_id)
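Spelled out as DDL, this might look like the sketch below. It assumes PostgreSQL (the IDENTITY syntax and the planning/execution timings in the question suggest it) and that the transactions and excluded_transactions tables from the question already exist. The question does not show excluded_banks and excluded_accounts, so their definitions here are my assumption, mirroring excluded_transactions. The index names exc_bnk_idx and exc_acc_idx are hypothetical.

```sql
-- Assumed shape of the two exclusion tables not shown in the question.
-- IF NOT EXISTS makes these no-ops when the real tables are already there.
CREATE TABLE IF NOT EXISTS excluded_banks
(
    id      INTEGER PRIMARY KEY GENERATED BY DEFAULT AS IDENTITY,
    user_id TEXT NOT NULL,
    bank_id TEXT NOT NULL
);

CREATE TABLE IF NOT EXISTS excluded_accounts
(
    id         INTEGER PRIMARY KEY GENERATED BY DEFAULT AS IDENTITY,
    user_id    TEXT NOT NULL,
    account_id TEXT NOT NULL
);

-- Hypothetical index names, following the exc_trn_idx pattern.
CREATE INDEX IF NOT EXISTS exc_bnk_idx ON excluded_banks (user_id, bank_id);
CREATE INDEX IF NOT EXISTS exc_acc_idx ON excluded_accounts (user_id, account_id);
-- excluded_transactions (user_id, transaction_id) is already covered
-- by exc_trn_idx from the question.

-- Inspect the plan. Note that ANALYZE actually executes the query.
EXPLAIN (ANALYZE, BUFFERS)
SELECT t.*
FROM transactions t
WHERE t.user_id = 'abc'
  AND NOT EXISTS (SELECT 1
                  FROM excluded_banks eb
                  WHERE eb.user_id = t.user_id
                    AND eb.bank_id = t.bank_id)
  AND NOT EXISTS (SELECT 1
                  FROM excluded_accounts ea
                  WHERE ea.user_id = t.user_id
                    AND ea.account_id = t.account_id)
  AND NOT EXISTS (SELECT 1
                  FROM excluded_transactions et
                  WHERE et.user_id = t.user_id
                    AND et.transaction_id = t.transaction_id);
```

In the output I would look for anti-join nodes (for example Hash Anti Join or Nested Loop Anti Join) and for index scans on the exclusion tables. Which of these actually appear depends on your data distribution and PostgreSQL version, so treat this as something to check rather than a guarantee.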