sqoop import gives wrong result for a correct SQL query
I am running the following query in MySQL, and it returns the result I want.
select TABLE_NAME,count(column_name) as no_of_columns from information_schema.columns where TABLE_SCHEMA = 'testing' and TABLE_NAME NOT REGEXP 'temp|bkup|RemoveMe|test' group by TABLE_NAME
When I use the same query in a sqoop import statement, the result is different.
The sqoop import statement is as follows.
sqoop import --connect jdbc:mysql://xxxxxx:3306/information_schema --username xxxxx --password-file /user/xxxxx/passwds/mysql.file --query "select TABLE_NAME,count(column_name) as no_of_columns from information_schema.columns where TABLE_SCHEMA = 'testing' and TABLE_NAME NOT REGEXP 'temp|bkup|RemoveMe|test' group by TABLE_NAME and $CONDITIONS" -m 1 --target-dir /user/hive/warehouse/xxxx.db/testing_columns --outdir /home/xxxxx/logs/outdir
Why does this happen, and what should I do to get the result I want?
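The $CONDITIONS token must be in the WHERE clause. Sqoop replaces the token with a boolean condition, so appending it after GROUP BY makes it part of the grouping expression instead of a filter. Because the query is wrapped in double quotes, the token should also be written as \$CONDITIONS so that the shell does not expand it as a variable:
sqoop import --connect jdbc:mysql://xxxxxx:3306/information_schema \
--username xxxxx --password-file /user/xxxxx/passwds/mysql.file \
--query "select TABLE_NAME,count(column_name) as no_of_columns \
from information_schema.columns \
where TABLE_SCHEMA = 'testing' \
and TABLE_NAME NOT REGEXP 'temp|bkup|RemoveMe|test' \
and \$CONDITIONS \
group by TABLE_NAME" \
-m 1 --target-dir /user/hive/warehouse/xxxx.db/testing_columns \
--outdir /home/xxxxx/logs/outdir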
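To see why both the quoting and the position of the token matter, here is a small shell sketch. It does not run sqoop. The table name t, the column c, and the shortened queries are hypothetical, and the "(1 = 1)" replacement is only an assumption about what Sqoop substitutes for the token when a single mapper is used. The Sqoop User Guide does note that a query wrapped in double quotes needs \$CONDITIONS rather than $CONDITIONS.

```shell
#!/bin/sh
# Illustrative sketch only: table "t", column "c" and the shortened queries
# are hypothetical, and "(1 = 1)" is an assumed stand-in for the condition
# Sqoop substitutes for the token with a single mapper.
unset CONDITIONS  # make sure no shell variable of this name is set

# Unescaped inside double quotes: the shell expands $CONDITIONS to nothing.
unescaped="select c from t where $CONDITIONS"

# Escaped inside double quotes: the literal token survives.
escaped="select c from t where \$CONDITIONS"

# Single quotes also keep the token literal, with no escaping needed.
single='select c from t where $CONDITIONS'

printf '%s\n' "$unescaped" "$escaped" "$single"

# Simulate the token replacement for the two placements of the token.
in_where='select c, count(*) from t where $CONDITIONS group by c'
after_group='select c, count(*) from t group by c and $CONDITIONS'

substitute() {
  # Replace the literal token with the assumed condition.
  printf '%s\n' "$1" | sed 's/\$CONDITIONS/(1 = 1)/'
}

where_sql=$(substitute "$in_where")
group_sql=$(substitute "$after_group")
printf '%s\n' "$where_sql" "$group_sql"
```

In the second simulated query the condition lands inside the GROUP BY expression, so rows would be grouped by the value of "c and (1 = 1)" rather than by c, which would be consistent with the unexpected counts in the question.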
Also keep in mind this note from the Sqoop User Guide:
The facility of using free-form query in the current version of Sqoop
is limited to simple queries where there are no ambiguous projections
and no OR conditions in the WHERE clause. Use of complex queries such
as queries that have sub-queries or joins leading to ambiguous
projections can lead to unexpected results.