How to remove '\' from a string in sparklyr
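I am using sparklyr and have a Spark data frame whose column word contains words, some of which include special characters that I want to remove. Putting \\ before the special character in regexp_replace works, like this: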
words.sdf <- words.sdf %>%
mutate(word = regexp_replace(word, '\\(', '')) %>%
mutate(word = regexp_replace(word, '\\)', '')) %>%
mutate(word = regexp_replace(word, '\\+', '')) %>%
mutate(word = regexp_replace(word, '\\?', '')) %>%
mutate(word = regexp_replace(word, '\\:', '')) %>%
mutate(word = regexp_replace(word, '\\;', '')) %>%
mutate(word = regexp_replace(word, '\\!', ''))
Now I want to remove \. I have tried both:
words.sdf <- words.sdf %>%
mutate(word = regexp_replace(word, '\\\', ''))
and:
words.sdf <- words.sdf %>%
mutate(word = regexp_replace(word, '\', ''))
But neither works...
You have to escape the backslash for both the R side and the Java side, so what you actually need is "\\\\":
# "\\" in R source is one literal backslash; a bare "\z" is rejected by the R parser
df <- copy_to(sc, tibble(word = "(abc\\zyx: 1)"))
df %>% mutate(regexp_replace(word, "\\\\", ""))
# Source: lazy query [?? x 2]
# Database: spark_shell_connection
word `regexp_replace(word, "\\\\\\\\", "")`
<chr> <chr>
1 "(abc\zyx: 1)" (abczyx: 1)
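The two layers of escaping can be checked without a Spark connection. The following is a minimal sketch in base R, assuming that gsub() with perl = TRUE behaves like the Java regex engine for this simple pattern; the variable names pattern, x and cleaned are illustrative only. It also shows why the two attempts in the question fail: in '\\\' and '\', the final \' is read by R as an escaped quote, so the string never closes where intended.

```r
# Base R stand-in for the regex layer (assumption: perl = TRUE is close
# enough to Java's regex engine for a literal-backslash pattern).
pattern <- "\\\\"     # four backslashes typed in R source
nchar(pattern)        # 2: the R parser collapses each pair into one backslash
cat(pattern, "\n")    # shows the two characters the regex engine receives

x <- "(abc\\zyx: 1)"  # typed with two backslashes, holds exactly one
nchar(x)              # 12

# The regex engine reads the two remaining backslashes as one literal backslash.
cleaned <- gsub(pattern, "", x, perl = TRUE)
print(cleaned)        # "(abczyx: 1)"
```

Each layer consumes half of the backslashes: the R parser turns four into two, and the regex engine turns two into one literal character to match.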
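Depending on your exact requirements, it may be easier to match all unwanted characters at once. For example, you can keep only word characters (\w) and whitespace (\s):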
# No "+" inside the class: within [...] a "+" is a literal plus sign and would be kept
df %>% mutate(regexp_replace(word, "[^\\w\\s]", ""))
# Source: lazy query [?? x 2]
# Database: spark_shell_connection
word `regexp_replace(word, "[^\\\\w\\\\s]", "")`
<chr> <chr>
1 "(abc\zyx: 1)" abczyx 1
Or keep word characters only:
# Again without "+" in the class, so plus signs are removed as well
df %>% mutate(regexp_replace(word, "[^\\w]", ""))
# Source: lazy query [?? x 2]
# Database: spark_shell_connection
word `regexp_replace(word, "[^\\\\w]", "")`
<chr> <chr>
1 "(abc\zyx: 1)" abczyx1
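The character-class approach can also be tried locally before running it on Spark. This is a rough sketch in base R, again assuming gsub() with perl = TRUE as a stand-in for the Java regex engine; the sample string and the names keeps_plus, word_space and blacklist are made up for illustration. It compares a whitelist that contains a "+" inside the class, a whitelist without it, and a single blacklist class that would replace the chain of mutate() calls from the question.

```r
# Hypothetical sample: the answer's string plus a few extra special characters.
x <- "(abc\\zyx: 1) a+b?"

# "+" inside [...] is a literal plus sign, so this whitelist also keeps "+".
keeps_plus <- gsub("[^\\w+\\s+]", "", x, perl = TRUE)
print(keeps_plus)   # "abczyx 1 a+b"

# Without the "+", only word characters and whitespace survive.
word_space <- gsub("[^\\w\\s]", "", x, perl = TRUE)
print(word_space)   # "abczyx 1 ab"

# Blacklist alternative: one class listing the characters from the question
# plus the backslash (typed as four backslashes in R source).
blacklist <- gsub("[()+?:;!\\\\]", "", x, perl = TRUE)
print(blacklist)    # "abczyx 1 ab"
```

Under the same escaping rules, the blacklist pattern would presumably be passed to sparklyr as regexp_replace(word, "[()+?:;!\\\\]", ""); that call has not been run here against a Spark connection.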