How to extract a word from a column into its own separate column in R

For example, I have a column whose values are character strings like this:

"Object=house colour=blue size=big", "Object=roof colour=red size=small", "Object=window colour=green size=medium"

I only want to extract the word that follows colour= and put it in a new column, so the result looks like this:

"blue", "red", "green"

I have started trying to do this with str_extract, but I am quite lost about how to specify the pattern. So far I have:

colour<- str_extract(string = df, pattern = "(?<=colour= ).*(?=\,)")

How can I fix this?

A possible solution:

library(tidyverse)

c("Object=house colour=blue size=big", "Object=roof colour=red size=small", "Object=window colour=green size=medium") %>% 
  str_extract("(?<=colour=)\\S+")

#> [1] "blue"  "red"   "green"

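There is no space after the =, so the lookbehind should not include one. You can also use \S+ to match one or more non-whitespace characters: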

library(stringr)
str_extract(string = df, pattern = "(?<=colour=)\\S+")
[1] "blue"  "red"   "green"

Data

df <- c("Object=house colour=blue size=big", "Object=roof colour=red size=small", 
"Object=window colour=green size=medium")

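Since the question asks for a new column rather than a bare vector, here is a minimal sketch of attaching the extracted values to a data frame. It assumes the strings live in a data frame column; the names dat and info are hypothetical and not from the original post.

```r
library(stringr)

# Hypothetical data frame: `dat` and `info` are assumed names for illustration
dat <- data.frame(
  info = c(
    "Object=house colour=blue size=big",
    "Object=roof colour=red size=small",
    "Object=window colour=green size=medium"
  ),
  stringsAsFactors = FALSE
)

# Take the run of non-whitespace characters that directly follows "colour="
dat$colour <- str_extract(dat$info, "(?<=colour=)\\S+")

dat$colour
#> [1] "blue"  "red"   "green"
```

str_extract is vectorised and returns NA for any row where the pattern does not match, so the new column always has one entry per row.
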
You can also convert the strings to DCF format and read them into a data.frame:

read.dcf(textConnection(paste(chartr("= ", ":\n", df), collapse = "\n\n")), all = TRUE)
  Object colour   size
1  house   blue    big
2   roof    red  small
3 window  green medium

Then you can select the column you want.
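
As a rough sketch of that last step, the DCF approach can be split into named steps and the colour column pulled out with $. The intermediate names dcf_text, con and parsed are my own and only there for readability.

```r
# The same three strings used in the other answers
df <- c(
  "Object=house colour=blue size=big",
  "Object=roof colour=red size=small",
  "Object=window colour=green size=medium"
)

# "=" becomes ":" and " " becomes a newline, giving one "key:value" per line;
# records are then joined with a blank line between them
dcf_text <- paste(chartr("= ", ":\n", df), collapse = "\n\n")

# Read the records; all = TRUE returns a data frame with one column per key
con <- textConnection(dcf_text)
parsed <- read.dcf(con, all = TRUE)
close(con)

# Select only the colour column
parsed$colour
```

This relies on every value being free of spaces and "=" characters, which holds for the sample data but is an assumption about real data.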

Base R solution:

df <- c(
  "Object=house colour=blue size=big", 
  "Object=roof colour=red size=small", 
  "Object=window colour=green size=medium"
)

res <- data.frame(
  object_info = df,
  object_colour = gsub(
    ".*\\s+colour=(\\S+).*",
    "\\1",
    df
  ),
    df
  ),
  row.names = NULL,
  stringsAsFactors = FALSE
)

res
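
One thing to watch with the replacement approach: when the pattern does not match, sub and gsub return the input string unchanged instead of NA. A minimal sketch of guarding against that, where the second string is a made-up example with no colour field:

```r
# Second element is a hypothetical row without a colour field
x <- c(
  "Object=house colour=blue size=big",
  "Object=door size=small"
)

pattern <- ".*\\s+colour=(\\S+).*"

# Keep the captured group where the pattern matches, otherwise use NA
colour <- ifelse(grepl(pattern, x), sub(pattern, "\\1", x), NA_character_)

colour
#> [1] "blue" NA
```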