Split character column value into 4 new value columns using gsub and drop values of original column

I have a column containing array values like this:

 [[["0.10", "35"], ["0.2", "36"]], [["5.1", "2"], ["90.2", "2"]]]

I need the last two pairs (in this example: [["5.1", "2"], ["90.2", "2"]]) in 4 separate columns, but only their values:

5.1, 2, 90.2, 2 (each in its own column)

I know I can do this with tidyr as described here: split character data into numbers and letters

    df %>%
      separate(mycol,
               into = c("text", "num"),
               sep = "(?<=[A-Za-z])(?=[0-9])")

But so far every attempt has failed. I cannot access only the last 2 pairs (that is, the last 4 items).

I would appreciate any ideas. Thanks.

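We could group by row (rowwise), convert each 'mycol' element to a nested list with fromJSON, take its second element, and unlist it to a vector. as.data.frame.list then turns that vector into a data.frame with 4 columns, which we wrap in a list. After that we ungroup, unnest the list column with unnest_wider (from tidyr), and finally change the column types based on their values with type.convert.
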
library(dplyr)
library(jsonlite)
library(tidyr)
d %>%
  rowwise %>%
  mutate(newcol = list(setNames(as.data.frame.list(unlist(fromJSON(mycol, 
             simplifyVector  = FALSE)[[2]] )), paste0("X", 1:4)))) %>%
  ungroup %>%
  unnest_wider(c(newcol))   %>%
  type.convert(as.is = TRUE)

Output:

# A tibble: 3 x 5
#  mycol                                                                                 X1    X2    X3    X4
#  <chr>                                                                              <dbl> <int> <dbl> <int>
#1 "[[[\"0.10\", \"35\"], [\"0.2\", \"36\"]], [[\"5.1\", \"2\"], [\"90.2\", \"2\"]]]"   5.1     2  90.2     2
#2 "[[[\"0.10\", \"35\"], [\"0.2\", \"36\"]], [[\"5.1\", \"2\"], [\"90.2\", \"2\"]]]"   5.1     2  90.2     2
#3 "[[[\"0.10\", \"35\"], [\"0.2\", \"36\"]], [[\"5.1\", \"2\"], [\"90.2\", \"2\"]]]"   5.1     2  90.2     2
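
To make the individual steps of the pipeline easier to follow, here is a minimal sketch that applies them to a single value outside of any data frame. The variable names (x, parsed, last_two, vals, nums) are only illustrative and do not appear in the answer above.

```r
library(jsonlite)

# One value of the column, copied from the question
x <- '[[["0.10", "35"], ["0.2", "36"]], [["5.1", "2"], ["90.2", "2"]]]'

# simplifyVector = FALSE keeps the JSON arrays as nested R lists
parsed <- fromJSON(x, simplifyVector = FALSE)

# [[2]] selects the second outer element, i.e. the last two pairs
last_two <- parsed[[2]]

# unlist() flattens the nested list into a character vector of length 4
vals <- unlist(last_two)
vals

# type.convert() chooses a numeric type based on the values
nums <- type.convert(vals, as.is = TRUE)
nums
```

In the full pipeline the same flattening happens once per row, which is why rowwise is needed before mutate.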

Data

d <- structure(list(mycol = c("[[[\"0.10\", \"35\"], [\"0.2\", \"36\"]], [[\"5.1\", \"2\"], [\"90.2\", \"2\"]]]", 
"[[[\"0.10\", \"35\"], [\"0.2\", \"36\"]], [[\"5.1\", \"2\"], [\"90.2\", \"2\"]]]", 
"[[[\"0.10\", \"35\"], [\"0.2\", \"36\"]], [[\"5.1\", \"2\"], [\"90.2\", \"2\"]]]"
)), class = "data.frame", row.names = c(NA, -3L))

Here is a base R solution based on a regular expression and @akrun's data:

d1 <- sapply(strsplit(d$mycol, ","), function(x) gsub("(?!\\.)\\D", "", x, perl = TRUE))

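We first split d$mycol at the commas and pass the result to gsub, which removes every non-digit character (\D) except the dot (.). sapply returns a character matrix d1 with one column per input row, so we transpose it with t to turn columns into rows and select the rows of interest (5 to 8, the last four values):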

d2 <- as.data.frame(t(d1[5:8,]))
d2
   V1 V2   V3 V4
1 5.1  2 90.2  2
2 5.1  2 90.2  2
3 5.1  2 90.2  2
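
To see what the split and the regular expression do, the sketch below runs both on a single string. The names x, pieces and cleaned are illustrative only.

```r
# One value of the column, copied from the question
x <- '[[["0.10", "35"], ["0.2", "36"]], [["5.1", "2"], ["90.2", "2"]]]'

# Split at commas: 8 pieces, each still carrying brackets, quotes and spaces
pieces <- strsplit(x, ",")[[1]]
pieces

# (?!\\.) is a negative lookahead meaning "the next character is not a dot";
# \\D then matches that character if it is a non-digit. Together they delete
# everything except digits and dots. Lookaheads require perl = TRUE.
cleaned <- gsub("(?!\\.)\\D", "", pieces, perl = TRUE)
cleaned

# Elements 5 to 8 are the values of the last two pairs
cleaned[5:8]
```

Note that this approach assumes every row contains exactly 8 values and no negative numbers or exponents, since the minus sign and the letter e would also be removed.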

If you want the result alongside the original data, cbind the two and rename the columns as needed:

d3 <- cbind(d, d2)
names(d3) <- c("mycol", "x1", "x2", "x3", "x4")

Result:

d3
                                                             mycol  x1 x2   x3 x4
1 [[["0.10", "35"], ["0.2", "36"]], [["5.1", "2"], ["90.2", "2"]]] 5.1  2 90.2  2
2 [[["0.10", "35"], ["0.2", "36"]], [["5.1", "2"], ["90.2", "2"]]] 5.1  2 90.2  2
3 [[["0.10", "35"], ["0.2", "36"]], [["5.1", "2"], ["90.2", "2"]]] 5.1  2 90.2  2
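
Because gsub returns character vectors, the four new columns hold strings rather than numbers. If numeric columns are needed, one possible follow-up (not part of the answer above) is to run type.convert on the result. The small data frame below is a stand-in with the same shape as d2 so that the sketch runs on its own.

```r
# Stand-in for d2: four character columns, as produced by the gsub approach
d2 <- data.frame(V1 = c("5.1", "5.1"), V2 = c("2", "2"),
                 V3 = c("90.2", "90.2"), V4 = c("2", "2"))

# type.convert() inspects each column and picks integer or double;
# as.is = TRUE prevents conversion of strings to factors
d2_num <- type.convert(d2, as.is = TRUE)
str(d2_num)
```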