有效地选择存在相关 table 中的行的行
Efficiently selecting rows where a row in a related table exists
我在当前正在处理的系统中有一个循环模式,例如,我需要 select 所有在可能公司列表下有订单的用户。或者需要 select 个用户,如果该用户被标记的记录存在。
我的 users
table 包含 430,825 条记录,所以这应该不难处理。现在我很接近,我有一个查询,它得到了我正在寻找的 .047s 执行时间,但如果我再添加一段,它会变得非常慢。
这是我当前的查询,最快的查询:
select`UserID`
from`users`
where(`CompanyID`in('3e55d1bb-d8b6-11e4-b38f-b8ca3a83b4c8')
or`UserID`in(select*
from(select`UserID`
from`invoices`
where`CompanyID`in('3e55d1bb-d8b6-11e4-b38f-b8ca3a83b4c8')
and`__Active`=1)`a`)
or`UserID`in(select*
from(select`UserID`
from`quoterequests`
where`CompanyID`in('3e55d1bb-d8b6-11e4-b38f-b8ca3a83b4c8')
and`__Active`=1)`a`))
and(`UserID`in(select*
from(select`UserID`
from`userassociations`
where`_Email`='brian@yeet.com'
and`__Active`=1)`a`))
and(`UserID`in(select*
from(select`UserID`
from`usercustomerflags`
where`CustomerFlagID`in(10,27,17,1,2,3,4,5,6)
and`__Active`=1)`a`)
or not exists(select 1
from`usercustomerflags`
where`__Active`=1
and`users`.`UserID`=`UserID`))
and`Deleted`=0
order by`DateTimeAdded`desc
limit 50;
(多出来的select*from(...)
就是因为这个)
在中间,我通过电子邮件地址进一步拉动用户,同时检查其他相关的 tables 以查找可能与该用户相关的电子邮件。比如,下一篇在向客户发送报价时搜索用户,包括他们的抄送地址。
select`UserID`
from`users`
where(`CompanyID`in('3e55d1bb-d8b6-11e4-b38f-b8ca3a83b4c8')
or`UserID`in(select*
from(select`UserID`
from`invoices`
where`CompanyID`in('3e55d1bb-d8b6-11e4-b38f-b8ca3a83b4c8')
and`__Active`=1)`a`)
or`UserID`in(select*
from(select`UserID`
from`quoterequests`
where`CompanyID`in('3e55d1bb-d8b6-11e4-b38f-b8ca3a83b4c8')
and`__Active`=1)`a`))
and(`UserID`in(select*
from(select`UserID`
from`userassociations`
where`_Email`='brian@yeet.com'
and`__Active`=1)`a`)
or`UserID`in(select*
from(select`UserID`
from`userquotesemails`
where`Email`='brian@yeet.com'
and`__Active`=1)`a`))
and(`UserID`in(select*
from(select`UserID`
from`usercustomerflags`
where`CustomerFlagID`in(10,27,17,1,2,3,4,5,6)
and`__Active`=1)`a`)
or not exists(select 1
from`usercustomerflags`
where`__Active`=1
and`users`.`UserID`=`UserID`))
and`Deleted`=0
order by`DateTimeAdded`desc
limit 50;
我已经添加了备用 table 来搜索电子邮件,但现在查询需要 3.016 秒,这非常非常慢。奇怪的是,在我构建此查询时,最后一部分似乎是此处性能的转折点,这是什么原因?
第一个和第二个分别解释
+----+--------------------+-------------------+--+----------------+---------------------------------------------------------------------------------------------+------------------------------+------+-----------------------+---+-------+---------------------------------+
| 1 | PRIMARY | <subquery6> | | ALL | | | | | | 0.00 | Using temporary; Using filesort |
| 1 | PRIMARY | users | | eq_ref | PRIMARY,UserID_UNIQUE,fk_users_1_idx,users_Customers | PRIMARY | 144 | <subquery6>.UserID | 1 | 50.00 | Using where |
| 6 | MATERIALIZED | userassociations | | ref | userassociations_UserID,userassociations__Email | userassociations__Email | 1026 | const | 3 | 10.00 | Using where |
| 10 | DEPENDENT SUBQUERY | usercustomerflags | | ref | usercustomerflags_UserID_idx | usercustomerflags_UserID_idx | 144 | sterling.users.UserID | 1 | 10.00 | Using where |
| 8 | DEPENDENT SUBQUERY | usercustomerflags | | index_subquery | usercustomerflags_CustomerFlagID_idx,usercustomerflags_UserID_idx | usercustomerflags_UserID_idx | 144 | func | 1 | 4.95 | Using where |
| 4 | DEPENDENT SUBQUERY | quoterequests | | index_subquery | quoterequests_CompanyID,quoterequests_UserID,quoterequests__Latest,quoterequests_UserQuotes | quoterequests__Latest | 145 | func | 2 | 5.00 | Using where |
| 2 | DEPENDENT SUBQUERY | invoices | | index_subquery | Invoice_UserID_idx,Invoice_CompanyID_idx,invoices_SampleRequests,invoices_LateOrdersBubble | Invoice_UserID_idx | 145 | func | 1 | 3.33 | Using where |
+----+--------------------+-------------------+--+----------------+---------------------------------------------------------------------------------------------+------------------------------+------+-----------------------+---+-------+---------------------------------+
+----+--------------------+-------------------+--+-----+---------------------------------------------------------------------------------------------+--------------------------------+------+-----------------------+--------+--------+-------------+
| 1 | PRIMARY | users | | ref | fk_users_1_idx,users_Customers | users_Customers | 4 | const | 227515 | 100.00 | Using where |
| 12 | DEPENDENT SUBQUERY | usercustomerflags | | ref | usercustomerflags_UserID_idx | usercustomerflags_UserID_idx | 144 | sterling.users.UserID | 1 | 10.00 | Using where |
| 10 | SUBQUERY | usercustomerflags | | ALL | usercustomerflags_CustomerFlagID_idx,usercustomerflags_UserID_idx | | | | 3509 | 4.94 | Using where |
| 8 | SUBQUERY | userquotesemails | | ref | userquotesemails_Email__Active,userquotesemails_UserID | userquotesemails_Email__Active | 1027 | const,const | 1 | 100.00 | |
| 6 | SUBQUERY | userassociations | | ref | userassociations_UserID,userassociations__Email | userassociations__Email | 1026 | const | 3 | 10.00 | Using where |
| 4 | SUBQUERY | quoterequests | | ref | quoterequests_CompanyID,quoterequests_UserID,quoterequests__Latest,quoterequests_UserQuotes | quoterequests_CompanyID | 144 | const | 16702 | 10.00 | Using where |
| 2 | SUBQUERY | invoices | | ref | Invoice_UserID_idx,Invoice_CompanyID_idx,invoices_SampleRequests,invoices_LateOrdersBubble | Invoice_CompanyID_idx | 144 | const | 17678 | 10.00 | Using where |
+----+--------------------+-------------------+--+-----+---------------------------------------------------------------------------------------------+--------------------------------+------+-----------------------+--------+--------+-------------+
另外,我尝试过使用连接,例如加入 invoices
table 等,但随后我遇到了每个 invoice
或 quoterequest
连接接收的重复用户行的问题,并且 grouping/distinct &在几分钟内,对生成的数据进行排序变得非常缓慢。
我也只尝试了第一个查询的 "exists" 版本,正如文档 https://dev.mysql.com/doc/refman/5.7/en/subquery-optimization-with-exists.html 所建议的那样
select`UserID`
from`users`
where(`CompanyID`in('3e55d1bb-d8b6-11e4-b38f-b8ca3a83b4c8')
or exists(select 1
from`invoices`
where`CompanyID`in('3e55d1bb-d8b6-11e4-b38f-b8ca3a83b4c8')
and`__Active`=1
and`users`.`UserID`=`UserID`)
or exists(select 1
from`quoterequests`
where`CompanyID`in('3e55d1bb-d8b6-11e4-b38f-b8ca3a83b4c8')
and`__Active`=1
and`users`.`UserID`=`UserID`))
and(exists(select 1
from`userassociations`
where`_Email`='brian@yeet.com'
and`__Active`=1
and`users`.`UserID`=`UserID`))
and(exists(select 1
from`usercustomerflags`
where`CustomerFlagID`in(10,27,17,1,2,3,4,5,6)
and`__Active`=1
and`users`.`UserID`=`UserID`)
or not exists(select 1
from`usercustomerflags`
where`__Active`=1
and`users`.`UserID`=`UserID`))
and`Deleted`=0
order by`DateTimeAdded`desc
limit 50;
但这让我达到了 5.516 秒,所以这似乎绝对不是正确的方向。
按照我尝试的方式 select 获取数据的最有效方法是什么?或者我是否需要重组我的一些 table 以获得我正在寻找的性能?
我已经找出了我认为存在的最小的子问题和瓶颈。这是我更简单的查询
select`users`.`UserID`,`users`.`_Customer`
from`users`
left join`userassociations`on`userassociations`.`UserID`=`users`.`UserID`
and`userassociations`.`__Active`=1
where(`users`.`Email`='brian@stumpyinc.com'
or`userassociations`.`_Email`='brian@stumpyinc.com')
and`users`.`Deleted`=0
order by`users`.`DateTimeAdded`desc
limit 50;
和解释
+---+--------+------------------+--+-----+--------------------------------------------------------+-------------------------+-----+-----------------------+--------+--------+-------------+
| 1 | SIMPLE | users | | ref | users_getemail_INDEX,unify_email_INDEX,users_Customers | users_Customers | 4 | const | 221463 | 100.00 | Using where |
| 1 | SIMPLE | userassociations | | ref | userassociations_UserID | userassociations_UserID | 144 | sterling.users.UserID | 1 | 100.00 | Using where |
+---+--------+------------------+--+-----+--------------------------------------------------------+-------------------------+-----+-----------------------+--------+--------+-------------+
执行此查询大约需要 1.5 秒
CREATE TABLE `users` (
`UserID` char(36) COLLATE utf8mb4_unicode_ci NOT NULL DEFAULT '',
...
`Email` varchar(255) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
...
`DateTimeAdded` datetime DEFAULT NULL,
...
`Deleted` int(1) NOT NULL DEFAULT '0',
...
`_LatestInvoiceDateTimeAdded` datetime DEFAULT NULL,
`_InvoiceCount` int(11) NOT NULL DEFAULT '0',
`_Customer` varchar(512) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
...
PRIMARY KEY (`UserID`),
UNIQUE KEY `UserID_UNIQUE` (`UserID`),
...
KEY `users_getemail_INDEX` (`Email`(191),`_InvoiceCount`,`_LatestInvoiceDateTimeAdded`,`DateTimeAdded`),
KEY `unify_email_INDEX` (`Email`(191),`UserID`),
...
KEY `users_Customers` (`Deleted`,`DateTimeAdded`),
...
KEY `users_DateTimeAdded` (`DateTimeAdded`,`UserID`),
FULLTEXT KEY `users_FULLTEXT__Customer` (`_Customer`),
...
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
CREATE TABLE `userassociations` (
`UserAssociationID` binary(16) NOT NULL,
`UserID` char(36) COLLATE utf8mb4_unicode_ci NOT NULL,
`AssociatedUserID` char(36) COLLATE utf8mb4_unicode_ci NOT NULL,
`_Email` varchar(256) COLLATE utf8mb4_unicode_ci NOT NULL,
`__UserID` char(36) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
`__Active` tinyint(1) NOT NULL DEFAULT '1',
`__Added` timestamp(6) NOT NULL DEFAULT CURRENT_TIMESTAMP(6),
`__Updated` timestamp(6) NULL DEFAULT NULL ON UPDATE CURRENT_TIMESTAMP(6),
PRIMARY KEY (`UserAssociationID`),
KEY `userassociations_UserID` (`UserID`),
KEY `userassociations_AssociatedUserID` (`AssociatedUserID`),
KEY `userassociations___UserID` (`__UserID`),
KEY `userassociations__Email` (`_Email`),
CONSTRAINT `userassociations_AssociatedUserID` FOREIGN KEY (`AssociatedUserID`) REFERENCES `users` (`UserID`) ON DELETE NO ACTION ON UPDATE NO ACTION,
CONSTRAINT `userassociations_UserID` FOREIGN KEY (`UserID`) REFERENCES `users` (`UserID`) ON DELETE NO ACTION ON UPDATE NO ACTION,
CONSTRAINT `userassociations___UserID` FOREIGN KEY (`__UserID`) REFERENCES `users` (`UserID`) ON DELETE NO ACTION ON UPDATE NO ACTION
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci
嗯...所以它似乎在工作,但我发现一对 table 似乎效率不高,这是我的 users
和 invoices
tables.
我有这些索引:
users: INDEX(`CompanyID`, `Deleted`, `DateTimeAdded`)
invoices: INDEX(`UserID`, `__Active`)
invoices: INDEX(`CompanyID`)
users: INDEX(`UserID`, `Deleted`)
和查询
select`users`.`UserID`,`users`.`DateTimeAdded`
from`users`
join`invoices`on`invoices`.`UserID`=`users`.`UserID`
and`invoices`.`__Active`=1
where`invoices`.`CompanyID`='3e55c8b4-d8b6-11e4-b38f-b8ca3a83b4c8'
and`users`.`Deleted`=0
order by`DateTimeAdded`desc
limit 200;
仅此查询就需要 0.3 秒,这对我来说感觉很慢,就像它没有充分利用索引一样,特别是因为 users
只有 430,997 行,而 invoices
只有有 194,180,这看起来应该是一个非常简单的查询。
编辑:实际上比这更糟糕,如果给定的 CompanyID 仅包含 ~4 行,则此查询需要 3.5 秒
+---+--------+----------+--+-----+------------------------------------------------+-----------------------+-----+--------------------------------+------+--------+----------------------------------------------+
| 1 | SIMPLE | invoices | | ref | Invoice_CompanyID_idx,invoices_UserID___Active | Invoice_CompanyID_idx | 144 | const | 7750 | 10.00 | Using where; Using temporary; Using filesort |
| 1 | SIMPLE | users | | ref | users_UserID_Deleted | users_UserID_Deleted | 148 | sterling.invoices.UserID,const | 1 | 100.00 | |
+---+--------+----------+--+-----+------------------------------------------------+-----------------------+-----+--------------------------------+------+--------+----------------------------------------------+
很确定您可以将 WHERE
子句中的所有子查询替换为 INNER
和 LEFT JOIN
的组合。试试这个:
SELECT u.UserID
FROM users u
INNER JOIN userassociations ua ON ua.userid = u.userid
LEFT JOIN usercustomerflags uc ON uc.userid = u.userid AND uc.__Active =1
LEFT JOIN invoices i ON i.userid = u.userid AND i.__Active =1
LEFT JOIN quoterequests q ON q.userid = u.userid AND q.__Active =1
WHERE ua._Email ='brian@yeet.com'
AND ua.__Active =1
AND (uc.userid IS NULL
OR (uc.userid IS NOT NULL AND uc.CustomerFlagID IN (10,27,17,1,2,3,4,5,6))
)
AND (u.CompanyID = '3e55d1bb-d8b6-11e4-b38f-b8ca3a83b4c8'
OR (i.CompanyID = '3e55d1bb-d8b6-11e4-b38f-b8ca3a83b4c8' AND i.userid IS NOT null)
OR (q.CompanyID = '3e55d1bb-d8b6-11e4-b38f-b8ca3a83b4c8' AND q.userid IS NOT null)
)
AND u.Deleted =0
ORDER BY u.DateTimeAdded desc
LIMIT 50;
对于那个较小的问题:
( select u.`UserID`, u.`_Customer`, u.DateTimeAdded
from `users` AS u
where u.`Email` = 'brian@stumpyinc.com'
and u.`Deleted` = 0
AND EXISTS ( SELECT * FROM `userassociations`
WHERE UserId = u.UserID
AND __Active = 1 )
order by u.`DateTimeAdded` desc
limit 50
)
UNION DISTINCT
( select u.`UserID`, u.`_Customer`, u.DateTimeAdded
from `users` AS u
JOIN `userassociations` AS ua
ON ua.`UserID` = u.`UserID`
and ua.`__Active` = 1
where ua.`_Email` = 'brian@stumpyinc.com'
and u.`Deleted`=0
order by u.`DateTimeAdded` desc
limit 50
)
order by `DateTimeAdded` desc
limit 50
需要这些:
u: INDEX(Email, Deleted, DateTimeAdded) -- date last
ua: INDEX(UserId, __Active) -- either order
ua: INDEX(_Email)
u: INDEX(UserID, Deleted)
(如果您遇到语法错误,请告诉我。如果太慢,请提供 EXPLAIN
。)
索引前缀(Email(191)
)通常没有用。摆脱它。这里有 5 种方法可以避免它:http://mysql.rjweb.org/doc.php/limits#767_limit_in_innodb_indexes
PK 是一个 UNIQUE 键,所以去掉第二个:
PRIMARY KEY (`UserID`),
UNIQUE KEY `UserID_UNIQUE` (`UserID`),
闻起来像 UUID;使用 ascii (ascii_general_ci),而不是 utf8mb4:
... char(36) COLLATE utf8mb4_unicode_ci
INT(1)
占用4个字节;使用 TINYINT
作为标志。
我在当前正在处理的系统中有一个循环模式,例如,我需要 select 所有在可能公司列表下有订单的用户。或者需要 select 个用户,如果该用户被标记的记录存在。
我的 users
table 包含 430,825 条记录,所以这应该不难处理。现在我很接近,我有一个查询,它得到了我正在寻找的 .047s 执行时间,但如果我再添加一段,它会变得非常慢。
这是我当前的查询,最快的查询:
select`UserID`
from`users`
where(`CompanyID`in('3e55d1bb-d8b6-11e4-b38f-b8ca3a83b4c8')
or`UserID`in(select*
from(select`UserID`
from`invoices`
where`CompanyID`in('3e55d1bb-d8b6-11e4-b38f-b8ca3a83b4c8')
and`__Active`=1)`a`)
or`UserID`in(select*
from(select`UserID`
from`quoterequests`
where`CompanyID`in('3e55d1bb-d8b6-11e4-b38f-b8ca3a83b4c8')
and`__Active`=1)`a`))
and(`UserID`in(select*
from(select`UserID`
from`userassociations`
where`_Email`='brian@yeet.com'
and`__Active`=1)`a`))
and(`UserID`in(select*
from(select`UserID`
from`usercustomerflags`
where`CustomerFlagID`in(10,27,17,1,2,3,4,5,6)
and`__Active`=1)`a`)
or not exists(select 1
from`usercustomerflags`
where`__Active`=1
and`users`.`UserID`=`UserID`))
and`Deleted`=0
order by`DateTimeAdded`desc
limit 50;
(多出来的select*from(...)
就是因为这个)
在中间,我通过电子邮件地址进一步拉动用户,同时检查其他相关的 tables 以查找可能与该用户相关的电子邮件。比如,下一篇在向客户发送报价时搜索用户,包括他们的抄送地址。
select`UserID`
from`users`
where(`CompanyID`in('3e55d1bb-d8b6-11e4-b38f-b8ca3a83b4c8')
or`UserID`in(select*
from(select`UserID`
from`invoices`
where`CompanyID`in('3e55d1bb-d8b6-11e4-b38f-b8ca3a83b4c8')
and`__Active`=1)`a`)
or`UserID`in(select*
from(select`UserID`
from`quoterequests`
where`CompanyID`in('3e55d1bb-d8b6-11e4-b38f-b8ca3a83b4c8')
and`__Active`=1)`a`))
and(`UserID`in(select*
from(select`UserID`
from`userassociations`
where`_Email`='brian@yeet.com'
and`__Active`=1)`a`)
or`UserID`in(select*
from(select`UserID`
from`userquotesemails`
where`Email`='brian@yeet.com'
and`__Active`=1)`a`))
and(`UserID`in(select*
from(select`UserID`
from`usercustomerflags`
where`CustomerFlagID`in(10,27,17,1,2,3,4,5,6)
and`__Active`=1)`a`)
or not exists(select 1
from`usercustomerflags`
where`__Active`=1
and`users`.`UserID`=`UserID`))
and`Deleted`=0
order by`DateTimeAdded`desc
limit 50;
我已经添加了备用 table 来搜索电子邮件,但现在查询需要 3.016 秒,这非常非常慢。奇怪的是,在我构建此查询时,最后一部分似乎是此处性能的转折点,这是什么原因?
第一个和第二个分别解释
+----+--------------------+-------------------+--+----------------+---------------------------------------------------------------------------------------------+------------------------------+------+-----------------------+---+-------+---------------------------------+
| 1 | PRIMARY | <subquery6> | | ALL | | | | | | 0.00 | Using temporary; Using filesort |
| 1 | PRIMARY | users | | eq_ref | PRIMARY,UserID_UNIQUE,fk_users_1_idx,users_Customers | PRIMARY | 144 | <subquery6>.UserID | 1 | 50.00 | Using where |
| 6 | MATERIALIZED | userassociations | | ref | userassociations_UserID,userassociations__Email | userassociations__Email | 1026 | const | 3 | 10.00 | Using where |
| 10 | DEPENDENT SUBQUERY | usercustomerflags | | ref | usercustomerflags_UserID_idx | usercustomerflags_UserID_idx | 144 | sterling.users.UserID | 1 | 10.00 | Using where |
| 8 | DEPENDENT SUBQUERY | usercustomerflags | | index_subquery | usercustomerflags_CustomerFlagID_idx,usercustomerflags_UserID_idx | usercustomerflags_UserID_idx | 144 | func | 1 | 4.95 | Using where |
| 4 | DEPENDENT SUBQUERY | quoterequests | | index_subquery | quoterequests_CompanyID,quoterequests_UserID,quoterequests__Latest,quoterequests_UserQuotes | quoterequests__Latest | 145 | func | 2 | 5.00 | Using where |
| 2 | DEPENDENT SUBQUERY | invoices | | index_subquery | Invoice_UserID_idx,Invoice_CompanyID_idx,invoices_SampleRequests,invoices_LateOrdersBubble | Invoice_UserID_idx | 145 | func | 1 | 3.33 | Using where |
+----+--------------------+-------------------+--+----------------+---------------------------------------------------------------------------------------------+------------------------------+------+-----------------------+---+-------+---------------------------------+
+----+--------------------+-------------------+--+-----+---------------------------------------------------------------------------------------------+--------------------------------+------+-----------------------+--------+--------+-------------+
| 1 | PRIMARY | users | | ref | fk_users_1_idx,users_Customers | users_Customers | 4 | const | 227515 | 100.00 | Using where |
| 12 | DEPENDENT SUBQUERY | usercustomerflags | | ref | usercustomerflags_UserID_idx | usercustomerflags_UserID_idx | 144 | sterling.users.UserID | 1 | 10.00 | Using where |
| 10 | SUBQUERY | usercustomerflags | | ALL | usercustomerflags_CustomerFlagID_idx,usercustomerflags_UserID_idx | | | | 3509 | 4.94 | Using where |
| 8 | SUBQUERY | userquotesemails | | ref | userquotesemails_Email__Active,userquotesemails_UserID | userquotesemails_Email__Active | 1027 | const,const | 1 | 100.00 | |
| 6 | SUBQUERY | userassociations | | ref | userassociations_UserID,userassociations__Email | userassociations__Email | 1026 | const | 3 | 10.00 | Using where |
| 4 | SUBQUERY | quoterequests | | ref | quoterequests_CompanyID,quoterequests_UserID,quoterequests__Latest,quoterequests_UserQuotes | quoterequests_CompanyID | 144 | const | 16702 | 10.00 | Using where |
| 2 | SUBQUERY | invoices | | ref | Invoice_UserID_idx,Invoice_CompanyID_idx,invoices_SampleRequests,invoices_LateOrdersBubble | Invoice_CompanyID_idx | 144 | const | 17678 | 10.00 | Using where |
+----+--------------------+-------------------+--+-----+---------------------------------------------------------------------------------------------+--------------------------------+------+-----------------------+--------+--------+-------------+
另外,我尝试过使用连接,例如加入 invoices
table 等,但随后我遇到了每个 invoice
或 quoterequest
连接接收的重复用户行的问题,并且 grouping/distinct &在几分钟内,对生成的数据进行排序变得非常缓慢。
我也只尝试了第一个查询的 "exists" 版本,正如文档 https://dev.mysql.com/doc/refman/5.7/en/subquery-optimization-with-exists.html 所建议的那样
select`UserID`
from`users`
where(`CompanyID`in('3e55d1bb-d8b6-11e4-b38f-b8ca3a83b4c8')
or exists(select 1
from`invoices`
where`CompanyID`in('3e55d1bb-d8b6-11e4-b38f-b8ca3a83b4c8')
and`__Active`=1
and`users`.`UserID`=`UserID`)
or exists(select 1
from`quoterequests`
where`CompanyID`in('3e55d1bb-d8b6-11e4-b38f-b8ca3a83b4c8')
and`__Active`=1
and`users`.`UserID`=`UserID`))
and(exists(select 1
from`userassociations`
where`_Email`='brian@yeet.com'
and`__Active`=1
and`users`.`UserID`=`UserID`))
and(exists(select 1
from`usercustomerflags`
where`CustomerFlagID`in(10,27,17,1,2,3,4,5,6)
and`__Active`=1
and`users`.`UserID`=`UserID`)
or not exists(select 1
from`usercustomerflags`
where`__Active`=1
and`users`.`UserID`=`UserID`))
and`Deleted`=0
order by`DateTimeAdded`desc
limit 50;
但这让我达到了 5.516 秒,所以这似乎绝对不是正确的方向。
按照我尝试的方式 select 获取数据的最有效方法是什么?或者我是否需要重组我的一些 table 以获得我正在寻找的性能?
我已经找出了我认为存在的最小的子问题和瓶颈。这是我更简单的查询
select`users`.`UserID`,`users`.`_Customer`
from`users`
left join`userassociations`on`userassociations`.`UserID`=`users`.`UserID`
and`userassociations`.`__Active`=1
where(`users`.`Email`='brian@stumpyinc.com'
or`userassociations`.`_Email`='brian@stumpyinc.com')
and`users`.`Deleted`=0
order by`users`.`DateTimeAdded`desc
limit 50;
和解释
+---+--------+------------------+--+-----+--------------------------------------------------------+-------------------------+-----+-----------------------+--------+--------+-------------+
| 1 | SIMPLE | users | | ref | users_getemail_INDEX,unify_email_INDEX,users_Customers | users_Customers | 4 | const | 221463 | 100.00 | Using where |
| 1 | SIMPLE | userassociations | | ref | userassociations_UserID | userassociations_UserID | 144 | sterling.users.UserID | 1 | 100.00 | Using where |
+---+--------+------------------+--+-----+--------------------------------------------------------+-------------------------+-----+-----------------------+--------+--------+-------------+
执行此查询大约需要 1.5 秒
CREATE TABLE `users` (
`UserID` char(36) COLLATE utf8mb4_unicode_ci NOT NULL DEFAULT '',
...
`Email` varchar(255) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
...
`DateTimeAdded` datetime DEFAULT NULL,
...
`Deleted` int(1) NOT NULL DEFAULT '0',
...
`_LatestInvoiceDateTimeAdded` datetime DEFAULT NULL,
`_InvoiceCount` int(11) NOT NULL DEFAULT '0',
`_Customer` varchar(512) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
...
PRIMARY KEY (`UserID`),
UNIQUE KEY `UserID_UNIQUE` (`UserID`),
...
KEY `users_getemail_INDEX` (`Email`(191),`_InvoiceCount`,`_LatestInvoiceDateTimeAdded`,`DateTimeAdded`),
KEY `unify_email_INDEX` (`Email`(191),`UserID`),
...
KEY `users_Customers` (`Deleted`,`DateTimeAdded`),
...
KEY `users_DateTimeAdded` (`DateTimeAdded`,`UserID`),
FULLTEXT KEY `users_FULLTEXT__Customer` (`_Customer`),
...
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
CREATE TABLE `userassociations` (
`UserAssociationID` binary(16) NOT NULL,
`UserID` char(36) COLLATE utf8mb4_unicode_ci NOT NULL,
`AssociatedUserID` char(36) COLLATE utf8mb4_unicode_ci NOT NULL,
`_Email` varchar(256) COLLATE utf8mb4_unicode_ci NOT NULL,
`__UserID` char(36) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
`__Active` tinyint(1) NOT NULL DEFAULT '1',
`__Added` timestamp(6) NOT NULL DEFAULT CURRENT_TIMESTAMP(6),
`__Updated` timestamp(6) NULL DEFAULT NULL ON UPDATE CURRENT_TIMESTAMP(6),
PRIMARY KEY (`UserAssociationID`),
KEY `userassociations_UserID` (`UserID`),
KEY `userassociations_AssociatedUserID` (`AssociatedUserID`),
KEY `userassociations___UserID` (`__UserID`),
KEY `userassociations__Email` (`_Email`),
CONSTRAINT `userassociations_AssociatedUserID` FOREIGN KEY (`AssociatedUserID`) REFERENCES `users` (`UserID`) ON DELETE NO ACTION ON UPDATE NO ACTION,
CONSTRAINT `userassociations_UserID` FOREIGN KEY (`UserID`) REFERENCES `users` (`UserID`) ON DELETE NO ACTION ON UPDATE NO ACTION,
CONSTRAINT `userassociations___UserID` FOREIGN KEY (`__UserID`) REFERENCES `users` (`UserID`) ON DELETE NO ACTION ON UPDATE NO ACTION
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci
嗯...所以它似乎在工作,但我发现一对 table 似乎效率不高,这是我的 users
和 invoices
tables.
我有这些索引:
users: INDEX(`CompanyID`, `Deleted`, `DateTimeAdded`)
invoices: INDEX(`UserID`, `__Active`)
invoices: INDEX(`CompanyID`)
users: INDEX(`UserID`, `Deleted`)
和查询
select`users`.`UserID`,`users`.`DateTimeAdded`
from`users`
join`invoices`on`invoices`.`UserID`=`users`.`UserID`
and`invoices`.`__Active`=1
where`invoices`.`CompanyID`='3e55c8b4-d8b6-11e4-b38f-b8ca3a83b4c8'
and`users`.`Deleted`=0
order by`DateTimeAdded`desc
limit 200;
仅此查询就需要 0.3 秒,这对我来说感觉很慢,就像它没有充分利用索引一样,特别是因为 users
只有 430,997 行,而 invoices
只有有 194,180,这看起来应该是一个非常简单的查询。
编辑:实际上比这更糟糕,如果给定的 CompanyID 仅包含 ~4 行,则此查询需要 3.5 秒
+---+--------+----------+--+-----+------------------------------------------------+-----------------------+-----+--------------------------------+------+--------+----------------------------------------------+
| 1 | SIMPLE | invoices | | ref | Invoice_CompanyID_idx,invoices_UserID___Active | Invoice_CompanyID_idx | 144 | const | 7750 | 10.00 | Using where; Using temporary; Using filesort |
| 1 | SIMPLE | users | | ref | users_UserID_Deleted | users_UserID_Deleted | 148 | sterling.invoices.UserID,const | 1 | 100.00 | |
+---+--------+----------+--+-----+------------------------------------------------+-----------------------+-----+--------------------------------+------+--------+----------------------------------------------+
很确定您可以将 WHERE
子句中的所有子查询替换为 INNER
和 LEFT JOIN
的组合。试试这个:
SELECT u.UserID
FROM users u
INNER JOIN userassociations ua ON ua.userid = u.userid
LEFT JOIN usercustomerflags uc ON uc.userid = u.userid AND uc.__Active =1
LEFT JOIN invoices i ON i.userid = u.userid AND i.__Active =1
LEFT JOIN quoterequests q ON q.userid = u.userid AND q.__Active =1
WHERE ua._Email ='brian@yeet.com'
AND ua.__Active =1
AND (uc.userid IS NULL
OR (uc.userid IS NOT NULL AND uc.CustomerFlagID IN (10,27,17,1,2,3,4,5,6))
)
AND (u.CompanyID = '3e55d1bb-d8b6-11e4-b38f-b8ca3a83b4c8'
OR (i.CompanyID = '3e55d1bb-d8b6-11e4-b38f-b8ca3a83b4c8' AND i.userid IS NOT null)
OR (q.CompanyID = '3e55d1bb-d8b6-11e4-b38f-b8ca3a83b4c8' AND q.userid IS NOT null)
)
AND u.Deleted =0
ORDER BY u.DateTimeAdded desc
LIMIT 50;
对于那个较小的问题:
( select u.`UserID`, u.`_Customer`, u.DateTimeAdded
from `users` AS u
where u.`Email` = 'brian@stumpyinc.com'
and u.`Deleted` = 0
AND EXISTS ( SELECT * FROM `userassociations`
WHERE UserId = u.UserID
AND __Active = 1 )
order by u.`DateTimeAdded` desc
limit 50
)
UNION DISTINCT
( select u.`UserID`, u.`_Customer`, u.DateTimeAdded
from `users` AS u
JOIN `userassociations` AS ua
ON ua.`UserID` = u.`UserID`
and ua.`__Active` = 1
where ua.`_Email` = 'brian@stumpyinc.com'
and u.`Deleted`=0
order by u.`DateTimeAdded` desc
limit 50
)
order by `DateTimeAdded` desc
limit 50
需要这些:
u: INDEX(Email, Deleted, DateTimeAdded) -- date last
ua: INDEX(UserId, __Active) -- either order
ua: INDEX(_Email)
u: INDEX(UserID, Deleted)
(如果您遇到语法错误,请告诉我。如果太慢,请提供 EXPLAIN
。)
索引前缀(Email(191)
)通常没有用。摆脱它。这里有 5 种方法可以避免它:http://mysql.rjweb.org/doc.php/limits#767_limit_in_innodb_indexes
PK 是一个 UNIQUE 键,所以去掉第二个:
PRIMARY KEY (`UserID`),
UNIQUE KEY `UserID_UNIQUE` (`UserID`),
闻起来像 UUID;使用 ascii (ascii_general_ci),而不是 utf8mb4:
... char(36) COLLATE utf8mb4_unicode_ci
INT(1)
占用4个字节;使用 TINYINT
作为标志。