New column with value of last string with a sign in R

Question

I have a dataset that looks like this:

    example2
# A tibble: 11 x 2
   municipality_name                            municipality_code
   <chr>                                        <chr>            
 1 - Zürich                                     ZH               
 2 >> Bezirk Affoltern                          000101           
 3 ......Aeugst am Albis                        0001             
 4 ......Affoltern am Albis                     0002             
 5 ......Bonstetten                             0003             
 6 - Bern / Berne                               BE               
 7 >> Arrondissement administratif Jura bernois 000241           
 8 ......Corgémont                              0431             
 9 ......Cormoret                               0432             
10 ......Cortébert                              0433             
11 ......Courtelary                             0434

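What I want to do is create four new columns:
i. one (canton) holding the most recent row that starts with "-",
ii. another (bezirk) holding the most recent row that starts with ">>", and
iii., iv. two more (canton_code and bezirk_code) holding the municipality_code of those same rows.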

So basically, this:

ideal
# A tibble: 11 x 6
   municipality_name                            municipality_code canton            bezirk                                       canton_code bezirk_code
   <chr>                                        <chr>             <chr>             <chr>                                        <chr>       <chr>      
 1 - Zürich                                     ZH                "- Zürich"        >> Bezirk Affoltern                          ZH          000101     
 2 >> Bezirk Affoltern                          000101            "- Zürich"        >> Bezirk Affoltern                          ZH          000101     
 3 ......Aeugst am Albis                        0001              "- Zürich"        >> Bezirk Affoltern                          ZH          000101     
 4 ......Affoltern am Albis                     0002              "- Zürich"        >> Bezirk Affoltern                          ZH          000101     
 5 ......Bonstetten                             0003              "- Bern / Berne " >> Arrondissement administratif Jura bernois BE          000241     
 6 - Bern / Berne                               BE                "- Bern / Berne " >> Arrondissement administratif Jura bernois BE          000241     
 7 >> Arrondissement administratif Jura bernois 000241            "- Bern / Berne " >> Arrondissement administratif Jura bernois BE          000241     
 8 ......Corgémont                              0431              "- Bern / Berne " >> Arrondissement administratif Jura bernois BE          000241     
 9 ......Cormoret                               0432              "- Bern / Berne " >> Arrondissement administratif Jura bernois BE          000241     
10 ......Cortébert                              0433              "- Bern / Berne " >> Arrondissement administratif Jura bernois BE          000241     
11 ......Courtelary                             0434              "- Bern / Berne " >> Arrondissement administratif Jura bernois BE          000241

I'm really struggling with this, and I'm sure there must be a simpler way to do it, so I hope someone here knows how to get it.

Thanks a lot!

Answer 1

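One way is to detect the required prefix, create a column that holds the value only on the rows that start with it (NA everywhere else), and then fill the NA values with the last non-NA value.

Here is an example of how to do that: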

# Loading required libraries
library(dplyr)
library(stringr)
library(tidyr)

# Creating sample data ("|" separates the two columns)
example <- read.table(text ='
- Zürich|ZH
>> Bezirk Affoltern|000101
......Aeugst am Albis|0001
......Affoltern am Albis|0002
......Bonstetten|0003
- Bern / Berne|BE
>> Arrondissement administratif Jura bernois|000241
......Corgémont|0431
......Cormoret|0432
......Cortébert|0433
......Courtelary|0434',header = FALSE, stringsAsFactors = FALSE, sep = "|",
                      # strip.white drops stray leading/trailing blanks around each field
                      strip.white = TRUE,
                      col.names = c("municipality_name", "municipality_code"))

example %>%
  # Keep the name/code only on rows that start with the required prefix; NA elsewhere
  mutate(canton = ifelse(str_starts(municipality_name, "-"), municipality_name, NA_character_),
         bezirk = ifelse(str_starts(municipality_name, ">>"), municipality_name, NA_character_),
         canton_code = ifelse(str_starts(municipality_name, "-"), municipality_code, NA_character_),
         bezirk_code = ifelse(str_starts(municipality_name, ">>"), municipality_code, NA_character_)) %>%
  # Use tidyr::fill() to replace each NA with the last non-NA value above it
  fill(all_of(c("canton", "bezirk", "canton_code", "bezirk_code")), .direction = "down")
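
To make the "fill with the last non-NA value" step more concrete, here is a minimal base R sketch of the same idea. It is not part of the answer above: `fill_down` is a hypothetical helper written for illustration, and the data frame is a shortened copy of the sample data.

```r
# Hypothetical helper (an illustration, not a tidyr function):
# carry the last non-NA value forward, like fill(..., .direction = "down").
fill_down <- function(x) {
  # Running count of non-NA values seen so far; 0 means none seen yet
  idx <- cumsum(!is.na(x))
  # Prepend NA so that idx == 0 maps to NA and idx == k maps to the k-th non-NA value
  c(NA, x[!is.na(x)])[idx + 1]
}

# Shortened copy of the sample data
example <- data.frame(
  municipality_name = c("- Zürich", ">> Bezirk Affoltern", "......Aeugst am Albis",
                        "- Bern / Berne",
                        ">> Arrondissement administratif Jura bernois",
                        "......Corgémont"),
  municipality_code = c("ZH", "000101", "0001", "BE", "000241", "0431"),
  stringsAsFactors = FALSE
)

# Flag the rows that carry a canton ("-") or a bezirk (">>")
is_canton <- startsWith(example$municipality_name, "-")
is_bezirk <- startsWith(example$municipality_name, ">>")

# Keep the value on flagged rows only, then fill downward
example$canton      <- fill_down(ifelse(is_canton, example$municipality_name, NA_character_))
example$bezirk      <- fill_down(ifelse(is_bezirk, example$municipality_name, NA_character_))
example$canton_code <- fill_down(ifelse(is_canton, example$municipality_code, NA_character_))
example$bezirk_code <- fill_down(ifelse(is_bezirk, example$municipality_code, NA_character_))

print(example)
```

One thing to be aware of with a plain downward fill: a canton row comes before its own ">>" row, so the first canton row gets NA for bezirk, and each later canton row carries over the previous canton's bezirk. If the canton rows should show their own first bezirk, as in the ideal table, one option is to fill canton first, then group by canton and fill bezirk and bezirk_code with `.direction = "downup"` in tidyr.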
