Why does the same SQL query give different output in DataGrip and when embedded in Python code?
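I get two different outputs depending on whether I run the query as plain SQL in DataGrip or as SQL embedded in Python code in Jupyter.
The query counts swipes in specific states (3, 6, or 9) for several groups of users.
These are the user groups for this case: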
        ids
grupos
0       [160, 161, 365, 386, 471]
1       [296, 306]
The tables in my database look like this:
Code:
From Jupyter:
for i, ids in enumerate(res['ids']):
    # One query per group; the driver renders the tuple of ids as a parenthesized list for IN.
    cur.execute("""SELECT COUNT(swipe.eclipse_id),
                          subscriber_hashtag.hashtag_id
                   FROM subscriber_hashtag
                   INNER JOIN eclipse_hashtag ON eclipse_hashtag.hashtag_id = subscriber_hashtag.hashtag_id
                   LEFT OUTER JOIN swipe ON subscriber_hashtag.subscriber_id = swipe.subscriber_id
                   WHERE (swipe.state = 3 OR swipe.state = 6 OR swipe.state = 9)
                     AND subscriber_hashtag.subscriber_id IN %s
                   GROUP BY subscriber_hashtag.hashtag_id
                   ORDER BY COUNT(swipe.eclipse_id) DESC;""",
                (tuple(ids),))
    n = cur.fetchall()
    # Note: elem[1] is hashtag_id, and listado is overwritten on every iteration.
    listado = [{"count": elem[0], "eclipse_id": elem[1]} for elem in n]
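A side note on the loop above: listado is reassigned on every iteration, so only the last group's rows survive, and the key "eclipse_id" actually holds subscriber_hashtag.hashtag_id. The sketch below keeps every group's rows in a dict keyed by group index. It is an illustration under assumptions: sqlite3 with an in-memory database stands in for the real database, the tables are reduced to the columns the query touches, the rows are invented, and counts_per_group is a hypothetical helper. Because sqlite3 does not expand a tuple parameter into an IN list, one ? placeholder is generated per id.

```python
import sqlite3

# In-memory stand-in for the real database; the rows are invented for illustration.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE subscriber_hashtag (subscriber_id INTEGER, hashtag_id INTEGER);
CREATE TABLE eclipse_hashtag (hashtag_id INTEGER);
CREATE TABLE swipe (subscriber_id INTEGER, eclipse_id INTEGER, state INTEGER);
INSERT INTO subscriber_hashtag VALUES (296, 10), (306, 10), (306, 20), (160, 20);
INSERT INTO eclipse_hashtag VALUES (10), (20);
INSERT INTO swipe VALUES (296, 1, 3), (306, 1, 6), (306, 2, 1), (160, 2, 9);
""")

QUERY_TEMPLATE = """
SELECT COUNT(swipe.eclipse_id), subscriber_hashtag.hashtag_id
FROM subscriber_hashtag
INNER JOIN eclipse_hashtag ON eclipse_hashtag.hashtag_id = subscriber_hashtag.hashtag_id
LEFT OUTER JOIN swipe ON subscriber_hashtag.subscriber_id = swipe.subscriber_id
WHERE (swipe.state = 3 OR swipe.state = 6 OR swipe.state = 9)
  AND subscriber_hashtag.subscriber_id IN ({placeholders})
GROUP BY subscriber_hashtag.hashtag_id
ORDER BY COUNT(swipe.eclipse_id) DESC
"""


def counts_per_group(cur, groups):
    """Hypothetical helper: run the query once per group, keeping every group's rows."""
    results = {}
    for i, ids in enumerate(groups):
        # sqlite3 has no tuple expansion, so emit one "?" placeholder per id.
        placeholders = ", ".join("?" * len(ids))
        cur.execute(QUERY_TEMPLATE.format(placeholders=placeholders), tuple(ids))
        results[i] = [{"count": count, "hashtag_id": hashtag_id}
                      for count, hashtag_id in cur.fetchall()]
    return results


groups = [[160, 161, 365, 386, 471], [296, 306]]
print(counts_per_group(cur, groups))
```

With psycopg2 the placeholder step is unnecessary, since a tuple passed for %s is already rendered as a parenthesized list, as the original loop relies on.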
From DataGrip:
I have to run the queries separately, otherwise the results get mixed up.
SELECT COUNT(swipe.eclipse_id), subscriber_hashtag.hashtag_id FROM subscriber_hashtag
INNER JOIN eclipse_hashtag ON eclipse_hashtag.hashtag_id = subscriber_hashtag.hashtag_id
LEFT OUTER JOIN swipe ON subscriber_hashtag.subscriber_id = swipe.subscriber_id
WHERE (swipe.state= 3 OR swipe.state = 6 or swipe.state=9) AND subscriber_hashtag.subscriber_id = 160 OR subscriber_hashtag.subscriber_id = 161 OR subscriber_hashtag.subscriber_id = 365 OR subscriber_hashtag.subscriber_id = 386 OR subscriber_hashtag.subscriber_id = 471 OR subscriber_hashtag.subscriber_id = 499
GROUP BY subscriber_hashtag.hashtag_id
ORDER BY COUNT(swipe.eclipse_id) DESC;
Then:
SELECT COUNT(swipe.eclipse_id), subscriber_hashtag.hashtag_id FROM subscriber_hashtag
INNER JOIN eclipse_hashtag ON eclipse_hashtag.hashtag_id = subscriber_hashtag.hashtag_id
LEFT OUTER JOIN swipe ON subscriber_hashtag.subscriber_id = swipe.subscriber_id
WHERE (swipe.state= 3 OR swipe.state = 6 or swipe.state=9) AND subscriber_hashtag.subscriber_id = 296 OR subscriber_hashtag.subscriber_id = 306
GROUP BY subscriber_hashtag.hashtag_id
ORDER BY COUNT(swipe.eclipse_id) DESC;
Output:
From Jupyter:
[(1500L, 996), (1185L, 592), (480L, 1214), (432L, 329), (375L, 398), (306L, 357), (300L, 473), (288L, 325), (225L, 322), (207L, 321), (207L, 1385), (195L, 1445), (180L, 1049), (108L, 334), (105L, 1183), (90L, 387), (81L, 324), (75L, 617), (72L, 379), (63L, 1331), (54L, 2546), (54L, 2545), (48L, 961), (48L, 962), (45L, 1382), (30L, 1432), (30L, 1429), (27L, 1334), (24L, 1128), (18L, 1376), (18L, 386), (18L, 1345), (18L, 1335), (9L, 1354), (9L, 1356), (9L, 1355), (9L, 1357), (9L, 1361), (9L, 1364), (9L, 1374), (9L, 1375), (9L, 1373)]
From DataGrip:
As you can see, the two outputs have nothing in common.
First query, on [160, 161, 365, 386, 471]:
3000 397
2967 321
2352 329
2233 960
2000 392
1975 685
1896 337
1536 529
637 328
553 704
240 2545
240 2546
237 652
196 758
196 573
147 483
98 584
98 450
98 448
79 2549
79 2554
79 2552
79 2553
79 2551
79 2550
58 1376
56 428
49 451
49 759
49 449
49 760
34 2580
32 325
29 2547
29 425
25 322
13 594
12 334
9 427
6 323
3 347
3 595
3 345
1 521
1 333
Second query, on [296, 306]:
6600 996
5214 592
2880 329
2112 1214
1920 325
1650 398
1500 322
1380 321
1380 1385
858 1445
792 1049
720 334
600 387
540 324
480 379
462 1183
420 1331
360 2546
360 2545
330 617
306 357
300 473
300 1382
180 1334
132 1432
132 1429
120 386
120 1335
120 1376
120 1345
60 1364
60 1374
60 1356
60 1357
60 1355
60 1361
60 1354
60 1375
60 1373
48 962
48 961
24 1128
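This is essentially an SQL logic problem caused by mixing AND and OR in the WHERE clause. AND has higher precedence than OR, so the database reads state_condition AND subscriber_id = 160 OR subscriber_id = 161 OR ... as (state_condition AND subscriber_id = 160) OR subscriber_id = 161 OR ...: the state filter applies only to subscriber 160, and rows for every other listed subscriber are counted regardless of swipe state. In DataGrip, you need to wrap all the OR conditions in parentheses to replicate the Python version, which uses a single IN() clause. The following two statements should produce the same results: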
Adjusted DataGrip SQL
SELECT COUNT(swipe.eclipse_id), subscriber_hashtag.hashtag_id
FROM subscriber_hashtag
INNER JOIN eclipse_hashtag
ON eclipse_hashtag.hashtag_id = subscriber_hashtag.hashtag_id
LEFT OUTER JOIN swipe
ON subscriber_hashtag.subscriber_id = swipe.subscriber_id
WHERE (swipe.state= 3 OR swipe.state = 6 OR swipe.state=9) AND
(subscriber_hashtag.subscriber_id = 160 OR
subscriber_hashtag.subscriber_id = 161 OR
subscriber_hashtag.subscriber_id = 365 OR
subscriber_hashtag.subscriber_id = 386 OR
subscriber_hashtag.subscriber_id = 471 OR
subscriber_hashtag.subscriber_id = 499)
GROUP BY subscriber_hashtag.hashtag_id
ORDER BY COUNT(swipe.eclipse_id) DESC;
SQL as rendered by Python
SELECT COUNT(swipe.eclipse_id), subscriber_hashtag.hashtag_id
FROM subscriber_hashtag
INNER JOIN eclipse_hashtag
ON eclipse_hashtag.hashtag_id = subscriber_hashtag.hashtag_id
LEFT OUTER JOIN swipe
ON subscriber_hashtag.subscriber_id = swipe.subscriber_id
WHERE (swipe.state= 3 OR swipe.state = 6 OR swipe.state=9) AND
subscriber_hashtag.subscriber_id IN (160,161,365,386,471,499)
GROUP BY subscriber_hashtag.hashtag_id
ORDER BY COUNT(swipe.eclipse_id) DESC;
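To see the precedence effect in isolation, the sketch below runs three WHERE variants against a tiny invented dataset: the unparenthesized form from the question's second DataGrip query, the parenthesized form, and the IN form. It assumes sqlite3 as a stand-in for the real database; SQLite, like other mainstream SQL databases, gives AND higher precedence than OR, so the behaviour should carry over. The table contents and the names STATE, SID, and WHERE_VARIANTS are hypothetical.

```python
import sqlite3

# Invented toy data; sqlite3 stands in for the real database.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE subscriber_hashtag (subscriber_id INTEGER, hashtag_id INTEGER);
CREATE TABLE eclipse_hashtag (hashtag_id INTEGER);
CREATE TABLE swipe (subscriber_id INTEGER, eclipse_id INTEGER, state INTEGER);
INSERT INTO subscriber_hashtag VALUES (296, 10), (306, 10), (306, 20), (160, 20);
INSERT INTO eclipse_hashtag VALUES (10), (20);
INSERT INTO swipe VALUES (296, 1, 3), (306, 1, 6), (306, 2, 1), (160, 2, 9);
""")

BASE_QUERY = """
SELECT COUNT(swipe.eclipse_id), subscriber_hashtag.hashtag_id
FROM subscriber_hashtag
INNER JOIN eclipse_hashtag ON eclipse_hashtag.hashtag_id = subscriber_hashtag.hashtag_id
LEFT OUTER JOIN swipe ON subscriber_hashtag.subscriber_id = swipe.subscriber_id
WHERE {where}
GROUP BY subscriber_hashtag.hashtag_id
ORDER BY COUNT(swipe.eclipse_id) DESC
"""

STATE = "(swipe.state = 3 OR swipe.state = 6 OR swipe.state = 9)"
SID = "subscriber_hashtag.subscriber_id"

WHERE_VARIANTS = {
    # Parsed as (STATE AND sid = 296) OR sid = 306: the state filter is lost for 306.
    "unparenthesized": f"{STATE} AND {SID} = 296 OR {SID} = 306",
    # Parentheses make the state filter apply to both subscribers.
    "parenthesized": f"{STATE} AND ({SID} = 296 OR {SID} = 306)",
    # Equivalent to the parenthesized form.
    "in_list": f"{STATE} AND {SID} IN (296, 306)",
}

results = {name: cur.execute(BASE_QUERY.format(where=where)).fetchall()
           for name, where in WHERE_VARIANTS.items()}
for name, rows in results.items():
    print(name, rows)
```

With this data the unparenthesized variant counts 3 and 2 rows for hashtags 10 and 20, because subscriber 306's state-1 swipe slips through, while the parenthesized and IN variants both give 2 and 1. That pattern of inflated counts is consistent with the larger numbers in the question's DataGrip output.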