R dataframe to "dictionary" avoiding list of factors

Question

I have a data frame df with two columns: the first holds names and the second holds values that can be either strings or doubles, for example

> df
       name   value
1  cat_name    Bart
2   cat_age       5
3  dog_name    Fred
4   dog_age       9
5 total_pet       2

I'd like to convert df into a list of named objects, so that I can call list$cat_name and get back the string "Bart", or list$dog_age and get back 9 as a numeric.

I tried

> list <- split(df[, 2], df[, 1])
> list
$cat_age
[1] 5
Levels: 2 5 9 Bart Fred

$cat_name
[1] Bart
Levels: 2 5 9 Bart Fred

$dog_age
[1] 9
Levels: 2 5 9 Bart Fred

$dog_name
[1] Fred
Levels: 2 5 9 Bart Fred

$total_pet
[1] 2
Levels: 2 5 9 Bart Fred

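which converts df into a list of factors. This is almost what I want, since the $ operator works fine. However, I am not really used to dealing with factors, and I wonder whether another dataframe-to-list conversion is available. The annoying part is that, in order to work with the strings and numbers, we have to convert the factors back to those types: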

> as.character(list$cat_name)
[1] "Bart"
> as.numeric(as.character(list$total_pet))
[1] 2
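
For context, the double conversion above matters because calling as.numeric() directly on a factor returns its internal integer codes rather than the labels. The following is a minimal sketch of that behaviour; the factor f is a hypothetical stand-in for the value column and is not part of the original post.

```r
# Hypothetical factor mimicking the value column from the question
f <- factor(c("Bart", "5", "Fred", "9", "2"))

levels(f)                        # the distinct labels, in sorted order
as.numeric(f[5])                 # integer code of "2" among the levels, not the number 2
as.numeric(as.character(f[5]))   # label first, then number: gives 2
```

Going through as.character() first is what recovers the printed value.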

After noticing that df[, 1] and df[, 2] are actually factors, I tried

> list <- split(as.character(df[, 2]), df[, 1])
> list
$cat_age
[1] "5"

$cat_name
[1] "Bart"

$dog_age
[1] "9"

$dog_name
[1] "Fred"

$total_pet
[1] "2"

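This almost solves the problem, except that the numbers are now character strings that still have to be converted later. I also tried using hash objects

> library(hash)
> h <- hash(as.vector(df[, 1]), as.vector(df[, 2]))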
> l = as.list(h)
> l
$dog_age
[1] "9"

$dog_name
[1] "Fred"

$cat_age
[1] "5"

$total_pet
[1] "2"

$cat_name
[1] "Bart"

but I got the same result.

Does anyone have a suggestion? Am I missing something obvious?

Thanks :)

Answer 1

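A base R approach...

df[] <- lapply(df, as.character)  # convert the factor columns to character
list <- split(df[, 2], df[, 1])   # split df just as you did

list2 <- lapply(list, function(x) {
  # y is non-empty only when the whole string is a number
  y <- regmatches(x, regexpr("^\\d+(\\.\\d+)?$", x))
  # if/else keeps the type of the chosen branch (numeric or character)
  if (length(y) != 0) as.numeric(y) else x
})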

$cat_age
[1] 5

$cat_name
[1] "Bart"

$dog_age
[1] 9

$dog_name
[1] "Fred"

$total_pet
[1] 2

Checking the classes:

> sapply(list2, class)
    cat_age    cat_name     dog_age    dog_name   total_pet 
  "numeric" "character"   "numeric" "character"   "numeric"

Your data is:

df <- read.table(text = "       name   value
1  cat_name    Bart
2   cat_age       5
3  dog_name    Fred
4   dog_age       9
5 total_pet       2", header = TRUE)
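
If the factors are not wanted in the first place, the conversion step can be avoided when the data is read. This is a minimal sketch, assuming the data comes from read.table as above; stringsAsFactors = FALSE is already the default from R 4.0.0 onward, and it is spelled out here only so the behaviour is the same on older versions.

```r
# Same data as above, read without creating factors
df <- read.table(text = "name value
cat_name Bart
cat_age 5
dog_name Fred
dog_age 9
total_pet 2", header = TRUE, stringsAsFactors = FALSE)

sapply(df, class)   # both columns come back as character
```

The values are still strings at this point, so the numeric conversion shown earlier is needed either way.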

Answer 2

We can use type.convert

library(purrr)
map(list, type.convert, as.is = TRUE)
#$cat_age
#[1] 5

#$cat_name
#[1] "Bart"

#$dog_age
#[1] 9

#$dog_name
#[1] "Fred"

#$total_pet
#[1] 2
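
The same idea should also work without purrr. The sketch below is an assumption-labelled variant that builds the list straight from the data frame with base R's lapply and setNames; the data frame is re-created so the block is self-contained, and the name pets is hypothetical.

```r
# Re-create the example data with character columns
df <- data.frame(
  name  = c("cat_name", "cat_age", "dog_name", "dog_age", "total_pet"),
  value = c("Bart", "5", "Fred", "9", "2"),
  stringsAsFactors = FALSE
)

# Convert each value on its own so strings and numbers keep separate types
pets <- lapply(setNames(df$value, df$name), type.convert, as.is = TRUE)

pets$cat_name   # a character string
pets$dog_age    # a number (type.convert picks integer for whole numbers)
```

Note that type.convert returns integers for values such as "9", whereas the as.numeric route in Answer 1 returns doubles; both satisfy is.numeric().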

As a parallel implementation may be more efficient, one option is future_map from furrr

library(furrr)
plan(multisession)  # plan(multiprocess) is deprecated in recent versions of future
future_map(list, type.convert, as.is = TRUE)
