R: Extracting numerical values from strings in a column

Question

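I'm interested in one particular column of a data frame, where each row contains the name of a neighbourhood and a specific number assigned to that neighbourhood.

TOR - HOOD - Banbury-Don Mills (42) ( 23.6%)

Please see this picture for a better understanding: neighborhoodnum

I only want to extract the number inside the first pair of parentheses, e.g. the 20 in: TOR - HOOD - Alderwood (20) ( 25.4%)

I have tried the stringr package, but the functions seem to take only 1 string at a time. There are 140 rows in this column and I want the values from all of them. I'm not sure how to loop through every string in the column.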

Here is what I have tried and the results

and some of the code I used, which gives this error (Error in UseMethod("type") : no applicable method for 'type' applied to an object of class "c('tbl_df', 'tbl', 'data.frame')")

hood_data<-tibble(hood=demo_edu_dataset$Geography)
head(hood_data)

hoodnum<-hood_data %>%
  #separate(hood, into= c("name", "number"), sep = "")
  stringr::str_extract_all(hood_data, "\\d")

Thanks

Answer 1

Maybe you can try gsub like below, e.g.,

df <- data.frame(X = c("TOR - HOOD - Alderwood (20) ( 25.4%)",
                       "TOR - HOOD - Annex (95) ( 27.9%)"))

df$Y <- as.numeric(gsub(".*?\\((\\w+)\\).*", "\\1", df$X))

which gives

> df
                                     X  Y
1 TOR - HOOD - Alderwood (20) ( 25.4%) 20
2     TOR - HOOD - Annex (95) ( 27.9%) 95
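
To make the pattern easier to read, here is a minimal base R sketch of the same idea. It uses a slightly stricter pattern, which is my own variation and not the one in the answer above: it anchors at the start of the string, skips everything before the first "(", and captures digits only. The second sample string is the one from the question.

```r
# Sample strings in the same shape as the question's column (illustrative data)
x <- c("TOR - HOOD - Alderwood (20) ( 25.4%)",
       "TOR - HOOD - Banbury-Don Mills (42) ( 23.6%)")

# The pattern covers the whole string, so replacing the match with group 1
# leaves only the captured number:
#   ^[^(]*   everything before the first "("
#   \\(      a literal "("
#   (\\d+)   one or more digits, captured as group 1
#   \\).*    a literal ")" and the rest of the string
hood_num <- as.numeric(sub("^[^(]*\\((\\d+)\\).*", "\\1", x))

print(hood_num)
```

sub() and gsub() work on a whole character vector at once, so the same call handles all 140 rows without an explicit loop.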

Answer 2

hoodnum<-hood_data %>%
  separate(Geography, into = c("name", "number"), sep = "\\(")

This worked.
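
Note that splitting on "(" leaves the closing ")" attached to the number piece, so it still needs cleaning before it can be used as a number. As an alternative sketch (not what was used above), tidyr::extract() can pull out the name and the number with capture groups in one step. The data frame below is made up for illustration, and the column name Geography is assumed from the code above.

```r
library(tidyr)

# Made-up stand-in for the real data, which is not shown in the question
df <- data.frame(
  Geography = c("TOR - HOOD - Alderwood (20) ( 25.4%)",
                "TOR - HOOD - Annex (95) ( 27.9%)"),
  stringsAsFactors = FALSE
)

# Group 1: the text before the first "(" (trailing spaces excluded)
# Group 2: the digits inside the first pair of parentheses
# convert = TRUE turns the digit strings into integers
res <- tidyr::extract(df, Geography,
                      into = c("name", "number"),
                      regex = "^(.*?)\\s*\\((\\d+)\\)",
                      convert = TRUE)

print(res)
```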

Answer 3

Or use str_extract from the stringr package with a positive lookbehind and lookahead:

str_extract(YOURDATA, "(?<=\\()\\d{1,}(?=\\))")

This regex says: "when you see ( on the left and ) on the right, match the number with at least 1 digit in the middle". If you wrap as.numeric around the whole expression, the result is converted from character to numeric:

as.numeric(str_extract(df$X, "(?<=\\()\\d{1,}(?=\\))"))
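
str_extract() is vectorised over its first argument, so no loop is needed. The error in the question comes from handing it the whole data frame instead of one column. A minimal sketch of the tibble version, assuming a column called hood as in the question's code (the two rows are stand-in data):

```r
library(dplyr)
library(stringr)

# Stand-in for the asker's hood_data; the real 140-row column is not shown
hood_data <- tibble(hood = c("TOR - HOOD - Alderwood (20) ( 25.4%)",
                             "TOR - HOOD - Annex (95) ( 27.9%)"))

# Inside mutate(), `hood` refers to the column (a character vector),
# so str_extract() returns the first match for every row.
hoodnum <- hood_data %>%
  mutate(number = as.numeric(str_extract(hood, "(?<=\\()\\d+(?=\\))")))

print(hoodnum)
```

Because str_extract() keeps only the first match per string, the percentage in the second pair of parentheses is ignored; it would not match anyway, since a space rather than a digit follows its opening parenthesis.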


regex

r

data-extraction

stringr