Spark Hive - UDFArgumentTypeException with window function?
I have the following df:
+------------+----------------------+-------------------+
|increment_id|base_subtotal_incl_tax|          eventdate|
+------------+----------------------+-------------------+
|        1086|            14470.0000|2016-06-14 09:54:12|
|        1086|            14470.0000|2016-06-14 09:54:12|
|        1086|            14470.0000|2015-07-14 09:54:12|
|        1086|            14470.0000|2015-07-14 09:54:12|
|        1086|            14470.0000|2015-07-14 09:54:12|
|        1086|            14470.0000|2015-07-14 09:54:12|
|        1086|             1570.0000|2015-07-14 09:54:12|
|        5555|            14470.0000|2014-07-14 09:54:12|
|        5555|            14470.0000|2014-07-14 09:54:12|
|        5555|            14470.0000|2014-07-14 09:54:12|
|        5555|            14470.0000|2014-07-14 09:54:12|
+------------+----------------------+-------------------+
I am trying to run a window function as follows:
WindowSpec window = Window.partitionBy(df.col("id")).orderBy(df.col("eventdate").desc());
df.select(df.col("*"),rank().over(window).alias("rank")) //error for this line
.filter("rank <= 2")
.show();
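What I want is the last two entries for each user ("last" meaning the most recent date, but since the ordering is descending, these are the first two rows):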
+------------+----------------------+-------------------+
|increment_id|base_subtotal_incl_tax|          eventdate|
+------------+----------------------+-------------------+
|        1086|            14470.0000|2016-06-14 09:54:12|
|        1086|            14470.0000|2016-06-14 09:54:12|
|        5555|            14470.0000|2014-07-14 09:54:12|
|        5555|            14470.0000|2014-07-14 09:54:12|
+------------+----------------------+-------------------+
But I get:
+------------+----------------------+-------------------+----+
|increment_id|base_subtotal_incl_tax|          eventdate|rank|
+------------+----------------------+-------------------+----+
|        5555|            14470.0000|2014-07-14 09:54:12|   1|
|        5555|            14470.0000|2014-07-14 09:54:12|   1|
|        5555|            14470.0000|2014-07-14 09:54:12|   1|
|        5555|            14470.0000|2014-07-14 09:54:12|   1|
|        1086|            14470.0000|2016-06-14 09:54:12|   1|
|        1086|            14470.0000|2016-06-14 09:54:12|   1|
+------------+----------------------+-------------------+----+
What am I missing?
[OLD] - My original problem, which is now solved:
WindowSpec window = Window.partitionBy(df.col("id"));
df.select(df.col("*"),rank().over(window).alias("rank")) //error for this line
.filter("rank <= 2")
.show();
This returns an error on the line marked with the comment above:
Exception in thread "main" org.apache.hadoop.hive.ql.exec.UDFArgumentTypeException: One or more arguments are expected.
What am I missing? What does this error mean? Thanks!
The rank window function requires a window with an orderBy clause, for example:
WindowSpec window = Window.partitionBy(df.col("id")).orderBy(df.col("payment"));
Without an ordering, rank is meaningless, hence the error.
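As for the updated question (every row comes back with rank 1), the likely cause is how rank treats ties: rows with equal ordering values share a rank, so four rows with the same eventdate all get rank 1 and all pass the `rank <= 2` filter. Here is a minimal plain-Java sketch of that behaviour, with no Spark involved. The class and method names are hypothetical and only mimic the semantics of a ranking function versus a row-numbering function over one already-sorted partition.

```java
import java.util.Arrays;

// Hypothetical helper, not part of Spark: mimics rank vs. row-number semantics
// over the ordering keys of ONE partition that is already sorted.
class RankVsRowNumber {

    // rank: tied keys share a rank; the next distinct key jumps to its position (1, 1, 3, ...).
    static int[] rank(String[] sortedKeys) {
        int[] out = new int[sortedKeys.length];
        for (int i = 0; i < sortedKeys.length; i++) {
            boolean tiedWithPrevious = i > 0 && sortedKeys[i].equals(sortedKeys[i - 1]);
            out[i] = tiedWithPrevious ? out[i - 1] : i + 1;
        }
        return out;
    }

    // row number: every row gets a distinct, consecutive number, ties or not.
    static int[] rowNumber(String[] sortedKeys) {
        int[] out = new int[sortedKeys.length];
        for (int i = 0; i < sortedKeys.length; i++) {
            out[i] = i + 1;
        }
        return out;
    }

    public static void main(String[] args) {
        // eventdate values of increment_id 5555 from the question, sorted descending
        String[] dates = {
            "2014-07-14 09:54:12", "2014-07-14 09:54:12",
            "2014-07-14 09:54:12", "2014-07-14 09:54:12"
        };
        System.out.println(Arrays.toString(rank(dates)));      // [1, 1, 1, 1]
        System.out.println(Arrays.toString(rowNumber(dates))); // [1, 2, 3, 4]
    }
}
```

If the goal is exactly two rows per partition regardless of ties, a row-numbering window function is probably the better fit. In Spark that would be `row_number()` from `org.apache.spark.sql.functions` (I believe it was called `rowNumber()` in older 1.x releases, so check the version you are on). Which of the tied rows end up being kept is then arbitrary unless the orderBy includes a tie-breaking column.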