Spark SQL (Java): splitting a pipe-delimited string

I am trying to use the split function inside selectExpr, but the resulting table looks like this:

+--------------------+--------------------+--------------------+
|              genres|              genres|             genres1|
+--------------------+--------------------+--------------------+
|Adventure|Animati...|[A, d, v, e, n, t...|[A, d, v, e, n, t...|
|Adventure|Childre...|[A, d, v, e, n, t...|[A, d, v, e, n, t...|
|      Comedy|Romance|[C, o, m, e, d, y...|[C, o, m, e, d, y...|
|Comedy|Drama|Romance|[C, o, m, e, d, y...|[C, o, m, e, d, y...|
|              Comedy|[C, o, m, e, d, y, ]|[C, o, m, e, d, y, ]|
|Action|Crime|Thri...|[A, c, t, i, o, n...|[A, c, t, i, o, n...|
|      Comedy|Romance|[C, o, m, e, d, y...|[C, o, m, e, d, y...|
|  Adventure|Children|[A, d, v, e, n, t...|[A, d, v, e, n, t...|
|              Action|[A, c, t, i, o, n, ]|[A, c, t, i, o, n, ]|
|Action|Adventure|...|[A, c, t, i, o, n...|[A, c, t, i, o, n...|
|Comedy|Drama|Romance|[C, o, m, e, d, y...|[C, o, m, e, d, y...|
|       Comedy|Horror|[C, o, m, e, d, y...|[C, o, m, e, d, y...|
|Adventure|Animati...|[A, d, v, e, n, t...|[A, d, v, e, n, t...|
|               Drama|   [D, r, a, m, a, ]|   [D, r, a, m, a, ]|
|Action|Adventure|...|[A, c, t, i, o, n...|[A, c, t, i, o, n...|
|         Crime|Drama|[C, r, i, m, e, |...|[C, r, i, m, e, |...|
|       Drama|Romance|[D, r, a, m, a, |...|[D, r, a, m, a, |...|
|              Comedy|[C, o, m, e, d, y, ]|[C, o, m, e, d, y, ]|
|              Comedy|[C, o, m, e, d, y, ]|[C, o, m, e, d, y, ]|
|Action|Comedy|Cri...|[A, c, t, i, o, n...|[A, c, t, i, o, n...|
+--------------------+--------------------+--------------------+

My code is:

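movies.selectExpr("genres", "split (genres, '\\|') as genres","split (genres, '\\|') as genres1").show();

The genres should come out as whole words, not split letter by letter with commas in between.

You are not escaping enough. The Java/Scala compiler consumes one level of backslashes, and with the default parser settings Spark's SQL parser consumes another when it reads the string literal, so the regex that actually reaches split is a bare |. Your code therefore splits on an alternation of two empty patterns, which matches between every character:

scala> spark.range(1).selectExpr("split('Action|Comedy|Drama', '\\|')").show(false)
+-----------------------------------------------------------+
|split(Action|Comedy|Drama, |)                              |
+-----------------------------------------------------------+
|[A, c, t, i, o, n, |, C, o, m, e, d, y, |, D, r, a, m, a, ]|
+-----------------------------------------------------------+

What you need is:

scala> spark.range(1).selectExpr("split('Action|Comedy|Drama', '\\\\|')").show(false)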
+------------------------------+
|split(Action|Comedy|Drama, \|)|
+------------------------------+
|[Action, Comedy, Drama]       |
+------------------------------+

A subtle but important difference.
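
The regex behaviour behind this can be reproduced in plain Java, without Spark. The following is only a minimal sketch: the class and method names are made up for illustration, and the limit of -1 is an assumption chosen so that trailing empty strings are kept, which appears to match the empty last element in the Spark output above. It assumes Java 8 or later.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical demo class; not part of the original question or answer.
public class PipeSplitDemo {

    // The regex "|" is an alternation of two empty patterns,
    // so it matches between every pair of characters.
    static String[] splitOnBarePipe(String s) {
        return s.split("|", -1);
    }

    // The regex \| (written "\\|" in Java source) matches a literal pipe.
    static String[] splitOnEscapedPipe(String s) {
        return s.split("\\|", -1);
    }

    // Pattern.quote builds a regex that matches its argument literally,
    // which avoids counting backslashes by hand.
    static String[] splitOnQuotedPipe(String s) {
        return s.split(Pattern.quote("|"), -1);
    }

    public static void main(String[] args) {
        String genres = "Action|Comedy|Drama";
        // Single characters followed by one empty string
        System.out.println(Arrays.toString(splitOnBarePipe(genres)));
        // [Action, Comedy, Drama]
        System.out.println(Arrays.toString(splitOnEscapedPipe(genres)));
        // [Action, Comedy, Drama]
        System.out.println(Arrays.toString(splitOnQuotedPipe(genres)));
    }
}
```

Inside selectExpr the pattern additionally passes through Spark's SQL string-literal parsing, which by default strips a further level of backslashes, so more escaping is needed there than in the plain Java calls shown here.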