SQL Spark Java: split a concatenated string
I am trying to use the split function with selectExpr, but my table looks like this:
+--------------------+--------------------+--------------------+
|              genres|              genres|             genres1|
+--------------------+--------------------+--------------------+
|Adventure|Animati...|[A, d, v, e, n, t...|[A, d, v, e, n, t...|
|Adventure|Childre...|[A, d, v, e, n, t...|[A, d, v, e, n, t...|
|      Comedy|Romance|[C, o, m, e, d, y...|[C, o, m, e, d, y...|
|Comedy|Drama|Romance|[C, o, m, e, d, y...|[C, o, m, e, d, y...|
|              Comedy|[C, o, m, e, d, y, ]|[C, o, m, e, d, y, ]|
|Action|Crime|Thri...|[A, c, t, i, o, n...|[A, c, t, i, o, n...|
|      Comedy|Romance|[C, o, m, e, d, y...|[C, o, m, e, d, y...|
|  Adventure|Children|[A, d, v, e, n, t...|[A, d, v, e, n, t...|
|              Action|[A, c, t, i, o, n, ]|[A, c, t, i, o, n, ]|
|Action|Adventure|...|[A, c, t, i, o, n...|[A, c, t, i, o, n...|
|Comedy|Drama|Romance|[C, o, m, e, d, y...|[C, o, m, e, d, y...|
|       Comedy|Horror|[C, o, m, e, d, y...|[C, o, m, e, d, y...|
|Adventure|Animati...|[A, d, v, e, n, t...|[A, d, v, e, n, t...|
|               Drama|   [D, r, a, m, a, ]|   [D, r, a, m, a, ]|
|Action|Adventure|...|[A, c, t, i, o, n...|[A, c, t, i, o, n...|
|         Crime|Drama|[C, r, i, m, e, |...|[C, r, i, m, e, |...|
|       Drama|Romance|[D, r, a, m, a, |...|[D, r, a, m, a, |...|
|              Comedy|[C, o, m, e, d, y, ]|[C, o, m, e, d, y, ]|
|              Comedy|[C, o, m, e, d, y, ]|[C, o, m, e, d, y, ]|
|Action|Comedy|Cri...|[A, c, t, i, o, n...|[A, c, t, i, o, n...|
+--------------------+--------------------+--------------------+
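My code is:
movies.selectExpr("genres", "split (genres, '\\|') as genres", "split (genres, '\\|') as genres1").show();
The words should be kept whole, not split letter by letter and separated by commas.
You are not escaping enough. The backslash is consumed twice, once by the Java/Scala string literal and once by the Spark SQL parser (with its default settings), so the pattern that reaches split is a bare |. Your code therefore splits on an alternation of empty patterns, which matches between every pair of characters:
scala> spark.range(1).selectExpr("split('Action|Comedy|Drama', '\\|')").show(false)
+-----------------------------------------------------------+
|split(Action|Comedy|Drama, |)                              |
+-----------------------------------------------------------+
|[A, c, t, i, o, n, |, C, o, m, e, d, y, |, D, r, a, m, a, ]|
+-----------------------------------------------------------+
whereas what you need is:
scala> spark.range(1).selectExpr("split('Action|Comedy|Drama', '\\\\|')").show(false)
+------------------------------+
|split(Action|Comedy|Drama, \|)|
+------------------------------+
|[Action, Comedy, Drama]       |
+------------------------------+
A subtle but important difference.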
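The same effect can be reproduced without Spark. The following is a minimal sketch in plain Java, assuming that Spark's split follows Java regular-expression rules, as the output above suggests. The class name SplitEscapeDemo and the helper splitKeepingEmpties are hypothetical names chosen for illustration, and the limit of -1 is an assumption made so that the trailing empty string seen in the Spark output is kept.

```java
import java.util.Arrays;
import java.util.List;

public class SplitEscapeDemo {

    // Hypothetical helper: split with limit -1 so trailing empty strings are kept.
    static List<String> splitKeepingEmpties(String input, String regex) {
        return Arrays.asList(input.split(regex, -1));
    }

    public static void main(String[] args) {
        String genres = "Action|Comedy|Drama";

        // The regex is a bare "|": an alternation of two empty patterns,
        // which matches between every pair of characters.
        System.out.println(splitKeepingEmpties(genres, "|"));

        // The regex is \| (written "\\|" in Java source): a literal pipe.
        System.out.println(splitKeepingEmpties(genres, "\\|"));

        // Printing a Java literal shows the text that would be handed on
        // to a SQL parser after the Java compiler has removed one level of escaping.
        System.out.println("split(genres, '\\|')");
        System.out.println("split(genres, '\\\\|')");
    }
}
```

The first call yields one element per character plus a trailing empty string, and the second yields the three genre names. The last two lines print split(genres, '\|') and split(genres, '\\|') respectively, which illustrates why the expression passed to selectExpr needs four backslashes in Java or Scala source.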