Remove everything until the first occurrence of a bracket "(" in R

Question

I have a dataset containing council and shire names that looks a bit like this:

library(tidyverse)
example <- data.frame(LGA_formal = c("Moira (S)","Monash (C)","Moonee Valley (C)",             
                        "Moorabool (S)","Moreland (C)" ,"Mornington Peninsula (S)"), 
                        Median_age = c(34,34,56,78,88,99))

I want to create a new column containing only the name, while keeping the old column, so that it looks like this:

example_desired <- data.frame(LGA_formal =c("Moira (S)","Monash (C)","Moonee Valley (C)",             
                        "Moorabool (S)","Moreland (C)" ,"Mornington Peninsula (S)"), 
                        Median_age = c(34,34,56,78,88,99),
LGA = c("Moira","Monash","Moonee Valley",             
                        "Moorabool","Moreland","Mornington Peninsula"))

I have been trying to extract everything before the first bracket like this:

example_desired <- example %>%
  mutate(LGA = str_extract(LGA_formal, ".+?(?=))")) %>%
mutate(LGA = trimws(LGA))

However, this does not work, and I get the following error:

Error: Problem with `mutate()` input `LGA`.
x Incorrectly nested parentheses in regexp pattern. (U_REGEX_MISMATCHED_PAREN, context=`.+?(?=))`)
i Input `LGA` is `str_extract(LGA_formal, ".+?(?=))")`.

How can I specify everything within the brackets?

Answer 1

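You can use sub with the regular expression  *\(.* (written " *\\(.*" as an R string) to remove everything from the first ( onward, along with any spaces before it.

example$LGA <- sub(" *\\(.*", "", example$LGA_formal)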
identical(example, example_desired) #test if desired is reached
#[1] TRUE
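
As a minimal sketch of why the backslash is doubled: in a regular R string literal, "\(" is not a valid escape, so the backslash that the regex engine needs has to be written as "\\". Assuming R 4.0.0 or later, a raw string avoids the doubling. The vector lga_formal and the pattern_* names below are illustrative only.

```r
# Illustrative subset of the names from the question
lga_formal <- c("Moira (S)", "Monash (C)", "Moonee Valley (C)")

# Standard string literal: the backslash itself must be escaped
pattern_escaped <- " *\\(.*"

# Raw string literal (assumes R >= 4.0.0): backslashes are taken literally
pattern_raw <- r"( *\(.*)"

# Both literals describe the same regular expression
print(identical(pattern_escaped, pattern_raw))

# Remove the first "(" and everything after it, plus any spaces before it
lga <- sub(pattern_escaped, "", lga_formal)
print(lga)
```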

Answer 2

In stringr you can do something like this:

example %>% mutate(LGA = str_remove_all(LGA_formal, ' \\(.*'))
                LGA_formal Median_age                  LGA
1                Moira (S)         34                Moira
2               Monash (C)         34               Monash
3        Moonee Valley (C)         56        Moonee Valley
4            Moorabool (S)         78            Moorabool
5             Moreland (C)         88             Moreland
6 Mornington Peninsula (S)         99 Mornington Peninsula

Answer 3

Another approach is to extract what you need:

transform(example, LGA = sub('(.*)\\s\\(.*', '\\1', LGA_formal))

#                LGA_formal Median_age                  LGA
#1                Moira (S)         34                Moira
#2               Monash (C)         34               Monash
#3        Moonee Valley (C)         56        Moonee Valley
#4            Moorabool (S)         78            Moorabool
#5             Moreland (C)         88             Moreland
#6 Mornington Peninsula (S)         99 Mornington Peninsula
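
One caveat, sketched under an assumption: the capture group (.*) is greedy, so if a name ever contained more than one bracketed part, it would keep everything up to the last " (" rather than the first. The sample data has no such name; "Foo (A) (B)" below is a hypothetical value used only to show the difference, and the lazy variant is one possible adjustment rather than part of the original answer.

```r
# Hypothetical name with two bracketed parts (not present in the sample data)
x <- c("Moira (S)", "Foo (A) (B)")

# Greedy capture: backtracks only as far as the LAST " ("
greedy <- sub("(.*)\\s\\(.*", "\\1", x)

# Lazy capture (perl = TRUE for PCRE syntax): stops at the FIRST " ("
lazy <- sub("(.*?)\\s\\(.*", "\\1", x, perl = TRUE)

print(greedy)  # "Moira" "Foo (A)"
print(lazy)    # "Moira" "Foo"
```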

Or in the tidyverse:

library(dplyr)
library(stringr)

example %>% mutate(LGA = str_extract(LGA_formal, '.*(?=\\s\\()'))
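
For comparison, the attempt from the question can be repaired with a small change. The error there comes from the unescaped ")" inside the lookahead. A sketch of one possible correction, which looks ahead for optional whitespace followed by an escaped "(", is shown below. The name "Melbourne" is a hypothetical value without a bracket, added only to show that str_extract returns NA when nothing matches, whereas sub leaves such a string unchanged.

```r
library(stringr)

# "Melbourne" is a hypothetical entry with no bracket, for illustration only
lga_formal <- c("Moira (S)", "Moonee Valley (C)", "Melbourne")

# Lazy match, stopping just before optional whitespace and the first "("
extracted <- str_extract(lga_formal, ".+?(?=\\s*\\()")
print(extracted)  # "Moira" "Moonee Valley" NA

# sub() leaves strings without a bracket unchanged instead of returning NA
removed <- sub(" *\\(.*", "", lga_formal)
print(removed)    # "Moira" "Moonee Valley" "Melbourne"
```

Because the lookahead already excludes the whitespace, the extra trimws() step from the question is no longer needed.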

