Count the number of characters of the first, second and third word in a string
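I need to develop code that counts the number of characters in the second and third words of a string.
The code I have only works for the character count of the first word.
Right now I can only use Spark SQL or the dplyr package.
This is what I did to count the characters in the first word:
INSTR(NAME_NORM_LONG, ' ') - 1
The expected result is to count the characters and show the results in new columns.
word="hey I am Scott"
characters_word1 | characters_word2 | characters_word3
3                | 1                | 2
Right now I am running this test code (trying to locate the second word):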
```
test_query <- test_query %>%
  mutate(Total_char = nchar(NAME_NORM_LONG)) %>%
  mutate(Name_has_numbers = str_detect(NAME_NORM_LONG, "[[:digit:]]")) %>%
  mutate(number_words = LENGTH(NAME_NORM_LONG) - LENGTH(REPLACE(NAME_NORM_LONG, ' ', '')) + 1) %>%
  mutate(number_chars_w1 = INSTR(NAME_NORM_LONG, ' ') - 1) %>%
  mutate(number_chars_w2 = substr(NAME_NORM_LONG, number_chars_w1 + 1, LENGTH(NAME_NORM_LONG)))
```
and the result I am getting is this:
```
test_query
# Source: spark<?> [?? x 7]
PBIN0 NAME_NORM_LONG Total_char Name_has_numbers number_words number_chars_w1
<int> <chr> <int> <lgl> <dbl> <dbl>
1 4.01e8 GM BUILDERS 11 FALSE 2 2
# … with 1 more variable: number_chars_w2 <chr>
Warning messages:
1: In substr(NAME_NORM_LONG, number_chars_w1, 1) :
NAs introduced by coercion
2: In substr(NAME_NORM_LONG, number_chars_w1, 1) :
NAs introduced by coercion
3: In substr(NAME_NORM_LONG, number_chars_w1, 1) :
NAs introduced by coercion
4: In substr(NAME_NORM_LONG, number_chars_w1, 1) :
NAs introduced by coercion
5: In substr(NAME_NORM_LONG, number_chars_w1, 1) :
NAs introduced by coercion
```
How about using str_split()?
word = "hey I am Scott"
# str_split() returns a list with one character vector per input string
word_list = stringr::str_split(word, " ")
n = length(word_list[[1]])
# paste0() and nchar() are vectorized, so no loop is needed
first_row = paste0("characters_word", 1:n)
second_row = nchar(word_list[[1]])
df = data.frame(first_row, second_row)
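If the counts are wanted as new columns next to the original string, as in the expected output of the question, a rough dplyr sketch could look like the one below. It is only an illustration under some assumptions: the sample data frame `names_df` is made up, words are taken to be separated by single spaces, and `stringr::word()` is used here in place of `str_split()`. It runs on a local data frame; nothing in this thread confirms that `word()` translates when the table lives in Spark.

```r
library(dplyr)
library(stringr)

# Hypothetical sample data; only the column name NAME_NORM_LONG comes from the question.
names_df <- tibble(NAME_NORM_LONG = c("hey I am Scott", "GM BUILDERS"))

result <- names_df %>%
  mutate(
    # word() extracts the n-th space-separated word and gives NA when the
    # string has fewer than n words; str_length() of NA is also NA.
    characters_word1 = str_length(word(NAME_NORM_LONG, 1)),
    characters_word2 = str_length(word(NAME_NORM_LONG, 2)),
    characters_word3 = str_length(word(NAME_NORM_LONG, 3))
  )

print(result)
```

For "hey I am Scott" this gives 3, 1 and 2, and for "GM BUILDERS" the third column is NA because there is no third word.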