Creating a tibble or data frame of tibbles or data frames and other classes

Question

Is it possible to create a tibble or data.frame in which some columns are integers and other columns are themselves tibbles or data.frames?

For example:

library(tibble)
set.seed(1)
df.1 <- tibble(name = sample(LETTERS, 20, replace = FALSE), score = sample(1:100, 20, replace = FALSE))
df.2 <- tibble(name = sample(LETTERS, 20, replace = FALSE), score = sample(1:100, 20, replace = FALSE))

Then:

df <- tibble(id=1,rank=2,data=df.1)

This gives the following error:

Error: Column `data` must be a 1d atomic vector or a list

I guess df.1 has to be a list for this to work?

Answer 1

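Is this what you are looking for? I think the key is that every column must have the same length, so we need to use list() to create a list-column that stores df.1 and df.2.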

df <- tibble(id = 1:2,
             rank = 2,
             data = list(df.1, df.2))
df
# # A tibble: 2 x 3
#      id  rank              data
#   <int> <dbl>            <list>
# 1     1     2 <tibble [20 x 2]>
# 2     2     2 <tibble [20 x 2]>

head(df$data[[1]])
# # A tibble: 6 x 2
#    name score
#   <chr> <int>
# 1     G    94
# 2     J    22
# 3     N    64
# 4     U    13
# 5     E    26
# 6     S    37

head(df$data[[2]])
# # A tibble: 6 x 2
#    name score
#   <chr> <int>
# 1     V    92
# 2     Q    30
# 3     S    45
# 4     M    33
# 5     L    63
# 6     Y    25
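
The question also asks about plain data frames. As a rough sketch using base R only, wrapping the list in I() seems to be the usual way to get the same kind of list-column out of data.frame(). Here df_base is a name made up for this example, and df.1 and df.2 are rebuilt as plain data frames so the block runs on its own.

```r
# Stand-ins for df.1 and df.2 from the question, built with base R only
set.seed(1)
df.1 <- data.frame(name = sample(LETTERS, 20), score = sample(1:100, 20))
df.2 <- data.frame(name = sample(LETTERS, 20), score = sample(1:100, 20))

# I() tells data.frame() to keep the list as one column with two elements,
# rather than expanding its contents into separate columns
df_base <- data.frame(id = 1:2, rank = 2, data = I(list(df.1, df.2)))

dim(df_base)              # 2 rows, 3 columns
class(df_base$data[[1]])  # each element is still a data.frame
```

Printing df_base is less tidy than printing the tibble version, which is one reason the tibble approach above may be the more comfortable choice.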

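And since every tibble in the data column has the same structure, we can use tidyr::unnest to expand the tibbles.

library(tidyr)
df_un <- unnest(df, cols = data)  # tidyr >= 1.0 expects the columns to unnest to be named via cols
df_un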
# # A tibble: 40 x 4
#       id  rank  name score
#    <int> <dbl> <chr> <int>
#  1     1     2     G    94
#  2     1     2     J    22
#  3     1     2     N    64
#  4     1     2     U    13
#  5     1     2     E    26
#  6     1     2     S    37
#  7     1     2     W     2
#  8     1     2     M    36
#  9     1     2     L    81
# 10     1     2     B    31
# # ... with 30 more rows

We can also nest the tibble again to get back to the original format with a list-column.

library(dplyr)
df_n <- df_un %>%
  group_by(id, rank) %>%
  nest() %>%
  ungroup()
df_n
# # A tibble: 2 x 3
#        id  rank              data
#     <int> <dbl>            <list>
#   1     1     2 <tibble [20 x 2]>
#   2     2     2 <tibble [20 x 2]>

# Check if df and df_n are the same
identical(df_n, df)
# [1] TRUE
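
Once the data sits in a list-column, each nested tibble can be processed row by row. The following is only a sketch of one way to do that with base vapply(); the column names mean_score and n_rows are invented for this example and are not part of the original answer.

```r
library(tibble)

set.seed(1)
df.1 <- tibble(name = sample(LETTERS, 20), score = sample(1:100, 20))
df.2 <- tibble(name = sample(LETTERS, 20), score = sample(1:100, 20))
df <- tibble(id = 1:2, rank = 2, data = list(df.1, df.2))

# Apply a function to every nested tibble; vapply() checks that each
# call returns exactly one value of the declared type
df$mean_score <- vapply(df$data, function(d) mean(d$score), numeric(1))
df$n_rows <- vapply(df$data, nrow, integer(1))
df
```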

Answer 2

Using tidyr's nest:

set.seed(1)
df.1 <- data.frame(name=sample(LETTERS,20,replace = F),score=sample(1:100,20,replace = F))
df.2 <- data.frame(name=sample(LETTERS,20,replace = F),score=sample(1:100,20,replace = F))

I can create a tibble in which df.1 is nested under id and rank:

library(dplyr)
library(tidyr)

# tidyr >= 1.0 syntax; older versions used nest(-id, -rank)
data.frame(id = 1, rank = 2, data = df.1) %>% nest(data = -c(id, rank))

# A tibble: 1 × 3
     id  rank              data
  <dbl> <dbl>            <list>
1     1     2 <tibble [20 × 2]>

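To have both df.1 and df.2 in the tibble, I would simply do the following. Note, though, that c(df.1, df.2) concatenates the columns of the two data frames side by side, and id and rank are recycled over the 20 rows, so each nested tibble in the output is 10 × 4 rather than one 20 × 2 data frame per row:

# tidyr >= 1.0 syntax; older versions used nest(-id, -rank)
data.frame(id = c(1, 2), rank = c(2, 1), data = c(df.1, df.2)) %>% nest(data = -c(id, rank))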


# A tibble: 2 × 3
     id  rank              data
  <dbl> <dbl>            <list>
1     1     2 <tibble [10 × 4]>
2     2     1 <tibble [10 × 4]>
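
If the goal is one complete 20 × 2 data frame per row, one possible route is to stack the frames first and then nest. This is a sketch rather than part of the original answer: it assumes tidyr >= 1.0 and dplyr, and the name nested is made up for the example.

```r
library(dplyr)
library(tidyr)

set.seed(1)
df.1 <- data.frame(name = sample(LETTERS, 20), score = sample(1:100, 20))
df.2 <- data.frame(name = sample(LETTERS, 20), score = sample(1:100, 20))

nested <- bind_rows(df.1, df.2, .id = "id") %>%  # id records the source frame
  mutate(id = as.integer(id),
         rank = c(2, 1)[id]) %>%                 # rank 2 for df.1, rank 1 for df.2
  nest(data = c(name, score))                    # one nested frame per id/rank pair
nested
```

Here each element of nested$data holds the 20 rows that came from a single source data frame.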
