Valid MySQL query breaks when used as boundary-query
Note: This is NOT a duplicate of
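To restrict the import to only the last 8 days of data, I'm using the following boundary-query with Sqoop (the day count is a parameter, {num_days}; it is 2 in the example below):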
SELECT min(`created_at`),
max(`created_at`)
FROM `billing_db`.`billing_ledger`
WHERE `created_at` >= timestamp(date(convert_tz(now(), IF(@@global.time_zone = 'SYSTEM', @@system_time_zone, @@global.time_zone),'Asia/Kolkata')) + interval -2 DAY)
I've split the query across multiple lines here for readability; in reality I pass it to Sqoop as a single line.
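Explanation of the different parts of the boundary-query:
IF(@@global.time_zone = 'SYSTEM', @@system_time_zone, @@global.time_zone)
- determines the server time zone
- works on both MySQL and TiDB
convert_tz(now(), <server-timezone>,'Asia/Kolkata')
- converts the current time from the server time zone to IST
timestamp(date(<ist-timestamp>) + interval -{num_days} DAY)
- returns the IST timestamp at 00:00 hours of the date {num_days} days before today (where "today" is taken from the current time in that time zone)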
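To sanity-check the date arithmetic outside the database, here is a rough Python mirror of the three parts above. It is a sketch under stated assumptions, not what MySQL or TiDB do internally: IST is modelled as a fixed UTC+05:30 offset (Asia/Kolkata has no DST), `now` must be a timezone-aware datetime standing in for now() on the server, and `resolve_server_tz` / `ist_lower_bound` are hypothetical helper names.

```python
from datetime import datetime, time, timedelta, timezone

# Assumption: IST as a fixed UTC+05:30 offset. Asia/Kolkata observes no DST,
# so this matches convert_tz(..., 'Asia/Kolkata') without needing tz data installed.
IST = timezone(timedelta(hours=5, minutes=30), "IST")


def resolve_server_tz(global_time_zone: str, system_time_zone: str) -> str:
    """Mirror of IF(@@global.time_zone = 'SYSTEM', @@system_time_zone, @@global.time_zone)."""
    return system_time_zone if global_time_zone == "SYSTEM" else global_time_zone


def ist_lower_bound(now: datetime, num_days: int) -> datetime:
    """Mirror of timestamp(date(convert_tz(now(), <server-tz>, 'Asia/Kolkata')) + interval -num_days DAY).

    `now` must be timezone-aware; its tzinfo plays the role of the server time zone.
    """
    ist_now = now.astimezone(IST)                            # convert_tz(now(), <server-tz>, 'Asia/Kolkata')
    start_date = ist_now.date() - timedelta(days=num_days)   # date(...) + interval -num_days DAY
    return datetime.combine(start_date, time.min)            # timestamp(...): 00:00:00, naive like a DATETIME


# Assuming the Airflow log timestamps are UTC: 12:45:34 UTC is 18:15:34 IST,
# the same instant Sqoop logged.
print(ist_lower_bound(datetime(2020, 5, 10, 12, 45, 34, tzinfo=timezone.utc), 2))
# 2020-05-08 00:00:00
```

The result matches the min(`created_at`) of 2020-05-08 00:00:00 in the MySQL output below. Note that the day boundary is IST midnight (18:30 UTC), not server midnight.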
While the query runs fine on MySQL:
mysql> SELECT min(`created_at`),
-> max(`created_at`)
-> FROM `billing_db`.`billing_ledger`
-> WHERE `created_at` >= timestamp(date(convert_tz(now(), IF(@@global.time_zone = 'SYSTEM', @@system_time_zone, @@global.time_zone),'Asia/Kolkata')) + interval -2 DAY);
+---------------------+---------------------+
| min(`created_at`) | max(`created_at`) |
+---------------------+---------------------+
| 2020-05-08 00:00:00 | 2020-05-10 20:12:32 |
+---------------------+---------------------+
1 row in set (0.02 sec)
it breaks in Sqoop with the following stack trace:
INFO db.DataDrivenDBInputFormat: BoundingValsQuery: SELECT min(), max() FROM . WHERE >= timestamp(date(convert_tz(now(), IF(@@global.time_zone = 'SYSTEM', @@system_time_zone, @@global.time_zone),'Asia/Kolkata')) + interval -2 DAY)
[2020-05-10 12:45:34,968] {ssh_utils.py:130} WARNING - 20/05/10 18:15:34 INFO mapreduce.JobSubmitter: Cleaning up the staging area /tmp/hadoop-yarn/staging/hadoop/.staging/job_1589114450995_0001
[2020-05-10 12:45:34,971] {ssh_utils.py:130} WARNING - 20/05/10 18:15:34 DEBUG util.ClassLoaderStack: Restoring classloader: sun.misc.Launcher$AppClassLoader@6ab7a896
[2020-05-10 12:45:34,973] {ssh_utils.py:130} WARNING - 20/05/10 18:15:34 ERROR tool.ImportTool: Import failed: java.io.IOException: java.sql.SQLSyntaxErrorException: (conn=313686) You have an error in your SQL syntax; check the manual that corresponds to your TiDB version for the right syntax to use line 1 column 12 near "), max() FROM . WHERE >= timestamp(date(convert_tz(now(), IF(@@global.time_zone = 'SYSTEM', @@system_time_zone, @@global.time_zone),'Asia/Kolkata')) + interval -2 DAY)"
at org.apache.sqoop.mapreduce.db.DataDrivenDBInputFormat.getSplits(DataDrivenDBInputFormat.java:207)
at org.apache.hadoop.mapreduce.JobSubmitter.writeNewSplits(JobSubmitter.java:303)
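The BoundingValsQuery line shows every backtick-quoted identifier already gone (min(), max(), FROM .) before the query reaches TiDB, so TiDB is only reporting the damage. One plausible mechanism, assuming the Sqoop command is run through a POSIX shell (the ssh_utils.py lines suggest it is sent over SSH) with the query inside double quotes: the shell treats each `...` span as command substitution, runs created_at and the other names as commands, and substitutes their empty output. The sketch below reproduces this with sh; the shortened query and the `run_in_shell` helper are illustrative only.

```python
import subprocess

QUERY = "SELECT min(`created_at`), max(`created_at`) FROM `billing_db`.`billing_ledger`"


def run_in_shell(command: str) -> str:
    """Run `command` with /bin/sh and return its stdout; stderr is captured and discarded."""
    result = subprocess.run(["sh", "-c", command], capture_output=True, text=True)
    return result.stdout.rstrip("\n")


# The query is interpolated inside double quotes, as in a naively built command line.
# Each `...` span is command substitution; created_at, billing_db and billing_ledger
# are not commands, so each one expands to an empty string ("not found" goes to stderr).
mangled = run_in_shell(f'echo "{QUERY}"')
print(mangled)  # SELECT min(), max() FROM .
```

The printed text is exactly the prefix of the BoundingValsQuery that Sqoop logged.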
For the record
WHERE $CONDITIONS is required with --query (free-form query import), but it is not mandatory for --boundary-query. Without it, Sqoop only emits this warning:
WARN db.DataDrivenDBInputFormat: Could not find $CONDITIONS token in query: SELECT min(), max() FROM . WHERE >= timestamp(date(convert_tz(now(), IF(@@global.time_zone = 'SYSTEM', @@system_time_zone, @@global.time_zone),'Asia/Kolkata')) + interval -2 DAY); splits may not partition data.
I have been using similarly complex boundary-queries elsewhere in the pipeline, but it breaks in this particular case.
What I've tried
I tried adding aliases in the SELECT clause of the query, like this:
SELECT min(`created_at`) AS min_created_at,...
It turns out the backticks (`) are the culprit: removing them from the boundary-query resolved the error.
- some comments in discussions point out that backticks cause weird things to happen with Sqoop
- but the docs make no mention of it, and some discussions even encourage using them
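Removing the backticks may not be the only way out. If the command goes through a shell (the ssh_utils.py log lines suggest it is run over SSH), quoting each argument so the shell passes it through verbatim should also work. A minimal sketch, assuming the command string is assembled in Python: shlex.quote wraps each argument in single quotes, inside which POSIX shells expand neither backticks nor $. That also covers $CONDITIONS in --query, which the Sqoop user guide says must be written as \$CONDITIONS when the query is wrapped in double quotes. The JDBC URL is a hypothetical placeholder, --split-by created_at is an assumption, `build_sqoop_import` is a hypothetical helper, and only standard sqoop import options are used.

```python
import shlex


def build_sqoop_import(connect: str, table: str, split_by: str, boundary_query: str) -> str:
    """Build a one-line `sqoop import` command that is safe to hand to a POSIX shell.

    Every argument goes through shlex.quote, so backticks, $, quotes and spaces in
    the boundary query reach Sqoop unchanged.
    """
    args = [
        "sqoop", "import",
        "--connect", connect,
        "--table", table,
        "--split-by", split_by,
        "--boundary-query", boundary_query,
    ]
    return " ".join(shlex.quote(arg) for arg in args)


# The boundary query keeps its backticks.
boundary_query = (
    "SELECT min(`created_at`), max(`created_at`) FROM `billing_db`.`billing_ledger` "
    "WHERE `created_at` >= timestamp(date(convert_tz(now(), "
    "IF(@@global.time_zone = 'SYSTEM', @@system_time_zone, @@global.time_zone),"
    "'Asia/Kolkata')) + interval -2 DAY)"
)
command = build_sqoop_import(
    "jdbc:mysql://tidb-host:4000/billing_db",  # hypothetical placeholder URL
    "billing_ledger",
    "created_at",                              # assumed split column
    boundary_query,
)
print(command)
```

Quoting at the point where the string is built keeps the SQL itself unchanged, so the same query can still be pasted into a mysql client for testing.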