Remove everything until the first occurrence of a bracket "(" in R
I have a dataset containing council and shire names that looks a bit like this:
library(tidyverse)
example <- data.frame(LGA_formal = c("Moira (S)","Monash (C)","Moonee Valley (C)",
"Moorabool (S)","Moreland (C)" ,"Mornington Peninsula (S)"),
Median_age = c(34,34,56,78,88,99))
I want to create a new column that holds only the name, while keeping the old column, so that the result looks like this:
example_desired <- data.frame(LGA_formal =c("Moira (S)","Monash (C)","Moonee Valley (C)",
"Moorabool (S)","Moreland (C)" ,"Mornington Peninsula (S)"),
Median_age = c(34,34,56,78,88,99),
LGA = c("Moira","Monash","Moonee Valley",
"Moorabool","Moreland","Mornington Peninsula"))
I have been trying to keep only the text before the first bracket, like this:
example_desired <- example %>%
mutate(LGA = str_extract(LGA_formal, ".+?(?=))")) %>%
mutate(LGA = trimws(LGA))
But this does not work; I get the following error:
Error: Problem with `mutate()` input `LGA`.
x Incorrectly nested parentheses in regexp pattern. (U_REGEX_MISMATCHED_PAREN, context=`.+?(?=))`)
i Input `LGA` is `str_extract(LGA_formal, ".+?(?=))")`.
How do I specify everything before the opening bracket?
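You can use sub with the pattern " *\\(.*" (written as an R string, so the backslash is doubled) to remove everything from the first ( onward, along with any spaces before it.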
example$LGA <- sub(" *\\(.*", "", example$LGA_formal)
identical(example, example_desired) #test if desired is reached
#[1] TRUE
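The error in the question comes from the bracket being treated as a regex metacharacter rather than a literal. Below is a minimal base R sketch of three ways to make it literal; the vector x is a hypothetical subset of the LGA_formal column, and the variable names are only for illustration.

```r
# Hypothetical sample, taken from the question's LGA_formal values
x <- c("Moira (S)", "Moonee Valley (C)", "Mornington Peninsula (S)")

# 1. Escape the bracket: "\\(" in R source is the regex \(
escaped <- sub(" *\\(.*", "", x)

# 2. Put the bracket in a character class, which needs no backslash
char_class <- sub(" *[(].*", "", x)

# 3. Avoid regex entirely: split on a fixed "(" and keep the first piece
pieces <- strsplit(x, "(", fixed = TRUE)
split_fixed <- trimws(vapply(pieces, `[`, character(1), 1))

print(escaped)
```

All three give the same names for this sample. The character-class form can be easier to read when a pattern would otherwise contain several doubled backslashes.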
In stringr you can do something like this:
example %>% mutate(LGA = str_remove_all(LGA_formal, ' \\(.*'))
LGA_formal Median_age LGA
1 Moira (S) 34 Moira
2 Monash (C) 34 Monash
3 Moonee Valley (C) 56 Moonee Valley
4 Moorabool (S) 78 Moorabool
5 Moreland (C) 88 Moreland
6 Mornington Peninsula (S) 99 Mornington Peninsula
Another approach is to extract what you need:
transform(example, LGA = sub('(.*)\\s\\(.*', '\\1', LGA_formal))
# LGA_formal Median_age LGA
#1 Moira (S) 34 Moira
#2 Monash (C) 34 Monash
#3 Moonee Valley (C) 56 Moonee Valley
#4 Moorabool (S) 78 Moorabool
#5 Moreland (C) 88 Moreland
#6 Mornington Peninsula (S) 99 Mornington Peninsula
Or in the tidyverse:
library(dplyr)
library(stringr)
example %>% mutate(LGA = str_extract(LGA_formal, '.*(?=\\s\\()'))
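The removal and extraction approaches agree on the sample data, but they can differ on inputs the question does not show. The base R sketch below compares them on a hypothetical vector that includes a name with no bracket and a name with two bracketed parts; these inputs and the variable names are assumptions made for illustration.

```r
# Hypothetical inputs: one ordinary name, one without a bracket,
# and one with two bracketed parts
x <- c("Moira (S)", "Unincorporated Vic", "Kingston (C) (Vic.)")

# Remove from the FIRST "(" onward; strings without a bracket are unchanged
first_cut <- sub("\\s*\\(.*", "", x)

# Greedy capture: (.*) runs up to the LAST " (", so an earlier bracket survives
last_cut <- sub("(.*)\\s\\(.*", "\\1", x)

# Lookahead extraction in base R; [^(] stops at the first bracket,
# and strings without a bracket produce no match, recorded here as NA
m <- regexpr("^[^(]*?(?=\\s*\\()", x, perl = TRUE)
hit <- as.integer(m) != -1L
extracted <- rep(NA_character_, length(x))
extracted[hit] <- regmatches(x, m)

print(data.frame(x, first_cut, last_cut, extracted))
```

If the data may contain more than one bracket per name, the removal pattern or a [^(] based extraction stops at the first one, while the greedy .* versions stop at the last. Whether an unmatched name should stay unchanged or become NA depends on how the column is used afterwards.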