New column with value of last string with a sign in R
I have a dataset that looks like this:
example2
# A tibble: 11 x 2
municipality_name municipality_code
<chr> <chr>
1 - Zürich ZH
2 >> Bezirk Affoltern 000101
3 ......Aeugst am Albis 0001
4 ......Affoltern am Albis 0002
5 ......Bonstetten 0003
6 - Bern / Berne BE
7 >> Arrondissement administratif Jura bernois 000241
8 ......Corgémont 0431
9 ......Cormoret 0432
10 ......Cortébert 0433
11 ......Courtelary 0434
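What I want to do is create four new columns:
i. one (canton) that holds, for each row, the last row starting with "-",
ii. another (bezirk) that holds the last row starting with ">>", and
iii./iv. two more (canton_code and bezirk_code) that hold the municipality_code of those rows.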
So basically, this:
ideal
# A tibble: 11 x 6
municipality_name municipality_code canton bezirk canton_code bezirk_code
<chr> <chr> <chr> <chr> <chr> <chr>
1 - Zürich ZH "- Zürich" >> Bezirk Affoltern ZH 000101
2 >> Bezirk Affoltern 000101 "- Zürich" >> Bezirk Affoltern ZH 000101
3 ......Aeugst am Albis 0001 "- Zürich" >> Bezirk Affoltern ZH 000101
4 ......Affoltern am Albis 0002 "- Zürich" >> Bezirk Affoltern ZH 000101
5 ......Bonstetten 0003 "- Zürich" >> Bezirk Affoltern ZH 000101
6 - Bern / Berne BE "- Bern / Berne " >> Arrondissement administratif Jura bernois BE 000241
7 >> Arrondissement administratif Jura bernois 000241 "- Bern / Berne " >> Arrondissement administratif Jura bernois BE 000241
8 ......Corgémont 0431 "- Bern / Berne " >> Arrondissement administratif Jura bernois BE 000241
9 ......Cormoret 0432 "- Bern / Berne " >> Arrondissement administratif Jura bernois BE 000241
10 ......Cortébert 0433 "- Bern / Berne " >> Arrondissement administratif Jura bernois BE 000241
11 ......Courtelary 0434 "- Bern / Berne " >> Arrondissement administratif Jura bernois BE 000241
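I have really been struggling with this, and I'm sure there must be a simpler way to do it, so I hope someone here knows how.
Thanks a lot!
One approach is to detect the required marker, copy the matching rows into a new column (leaving NA elsewhere), and then fill each NA with the last non-missing value above it.
Below is an example of how to do this: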
# Loading required libraries
library(dplyr)
library(stringr)
library(tidyr)
# Creating sample data
example <- read.table(text ='
- Zürich|ZH
>> Bezirk Affoltern|000101
......Aeugst am Albis|0001
......Affoltern am Albis|0002
......Bonstetten|0003
- Bern / Berne|BE
>> Arrondissement administratif Jura bernois|000241
......Corgémont|0431
......Cormoret|0432
......Cortébert|0433
......Courtelary|0434', header = FALSE, stringsAsFactors = FALSE, sep = "|",
col.names = c("municipality_name", "municipality_code"),
colClasses = "character")  # keep leading zeros in codes such as "000101"
example %>%
# Where the name starts with the marker, copy the value; otherwise use NA
mutate(canton = ifelse(str_starts(municipality_name, "-"), municipality_name, NA_character_),
bezirk = ifelse(str_starts(municipality_name, ">>"), municipality_name, NA_character_),
canton_code = ifelse(str_starts(municipality_name, "-"), municipality_code, NA_character_),
bezirk_code = ifelse(str_starts(municipality_name, ">>"), municipality_code, NA_character_)) %>%
# tidyr::fill() replaces each NA with the last non-missing value above it
fill(all_of(c("canton", "bezirk", "canton_code", "bezirk_code")), .direction = "down")
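Note that filling straight down leaves bezirk and bezirk_code as NA on the first canton row, and carries the previous canton's bezirk onto the next canton row (row 6, "- Bern / Berne", would get ">> Bezirk Affoltern"). If the canton row should instead take the first bezirk listed under it, as in the ideal output in the question, one option is to fill the canton columns first and then fill the bezirk columns within each canton. This is a sketch, assuming a tidyr version that supports .direction = "downup" (1.0.0 or later); the helper columns is_canton and is_bezirk are hypothetical names introduced only for this example.

```r
library(dplyr)
library(stringr)
library(tidyr)

# Sample data from the question, built as a tibble so the codes stay character
example <- tibble(
  municipality_name = c("- Zürich", ">> Bezirk Affoltern", "......Aeugst am Albis",
                        "......Affoltern am Albis", "......Bonstetten", "- Bern / Berne",
                        ">> Arrondissement administratif Jura bernois", "......Corgémont",
                        "......Cormoret", "......Cortébert", "......Courtelary"),
  municipality_code = c("ZH", "000101", "0001", "0002", "0003", "BE",
                        "000241", "0431", "0432", "0433", "0434")
)

result <- example %>%
  mutate(
    # Hypothetical helper flags marking canton ("-") and bezirk (">>") rows
    is_canton   = str_starts(municipality_name, "-"),
    is_bezirk   = str_starts(municipality_name, ">>"),
    canton      = if_else(is_canton, municipality_name, NA_character_),
    bezirk      = if_else(is_bezirk, municipality_name, NA_character_),
    canton_code = if_else(is_canton, municipality_code, NA_character_),
    bezirk_code = if_else(is_bezirk, municipality_code, NA_character_)
  ) %>%
  # Cantons: carry each canton down to every row below it
  fill(canton, canton_code, .direction = "down") %>%
  # Bezirke: fill only within each canton, first down, then up, so the
  # canton row itself picks up the first bezirk listed under it
  group_by(canton) %>%
  fill(bezirk, bezirk_code, .direction = "downup") %>%
  ungroup() %>%
  select(-is_canton, -is_bezirk)
```

Grouping by canton is what stops a bezirk from leaking across canton boundaries; the resulting columns come out in the same order as in ideal (canton, bezirk, canton_code, bezirk_code).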