Expand Rows and Add Columns in Data Frame Based On Another Data Frame

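Overview

Each row in team.df holds the information for one NBA team (its name and location). Each data frame in list.of.all.stars contains one row per all-star player on the corresponding team.

Using the apply() family of functions, how do I expand the rows in team.df by the number of all-star players each team has, and add the columns from list.of.all.stars to the final output?

I'm also completely open to non-apply() approaches; that was just an example, since I would like to avoid writing a loop.

Below is my desired output: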

#   Team_Name Team_Location         Player Captain
# 1 Cavaliers Cleveland, OH   LeBron James    TRUE
# 2 Cavaliers Cleveland, OH     Kevin Love   FALSE
# 3  Warriors   Oakland, CA  Stephen Curry    TRUE
# 4  Warriors   Oakland, CA   Kevin Durant   FALSE
# 5  Warriors   Oakland, CA  Klay Thompson   FALSE
# 6  Warriors   Oakland, CA Draymond Green   FALSE

Reproducible Example

# create data frame 
# about team information
team.df <-
  data.frame(
    Team_Name       = c( "Cavaliers", "Warriors" )
    , Team_Location = c( "Cleveland, OH", "Oakland, CA")
    , stringsAsFactors = FALSE
  )

# create list about
# all stars on each team
list.of.all.stars <-
  list( 
    data.frame(
      Player = c( "LeBron James", "Kevin Love" )
      , Captain = c( TRUE, FALSE )
      , stringsAsFactors = FALSE
    )
    , data.frame( 
      Player = c( "Stephen Curry", "Kevin Durant"
                  , "Klay Thompson", "Draymond Green"
      )
      , Captain = c( TRUE, FALSE, FALSE, FALSE )
      , stringsAsFactors = FALSE
    )
  )

Non-apply() Family Approach

# cbind each data frame within the list.of.all.stars
# to its corresponding row in team.df
team.and.all.stars.list.of.df <-
  list(
    cbind(
      team.df[ 1, ]
      , list.of.all.stars[[1]]
    )
    ,   cbind(
      team.df[ 2, ]
      , list.of.all.stars[[2]]
    )
  )
# Warning messages:
#   1: In data.frame(..., check.names = FALSE) :
#   row names were found from a short variable and have been discarded
# 2: In data.frame(..., check.names = FALSE) :
#   row names were found from a short variable and have been discarded

# collapse each list
# into data frame
final.df <-
  data.frame(
    do.call(
      what = "rbind"
      , args = team.and.all.stars.list.of.df
    )
    , stringsAsFactors = FALSE
  )
# view final output
final.df
# Team_Name Team_Location         Player Captain
# 1 Cavaliers Cleveland, OH   LeBron James    TRUE
# 2 Cavaliers Cleveland, OH     Kevin Love   FALSE
# 3  Warriors   Oakland, CA  Stephen Curry    TRUE
# 4  Warriors   Oakland, CA   Kevin Durant   FALSE
# 5  Warriors   Oakland, CA  Klay Thompson   FALSE
# 6  Warriors   Oakland, CA Draymond Green   FALSE

# end of script #

Failed mapply() Attempt

# Hoping to Apply A Function
# using a data frame and
# a list of data frames
mapply.method <-
  mapply(
    FUN = function( x, y )
      cbind.data.frame(
        x
        , y
        , stringsAsFactors = FALSE
      )
    , team.df
    , list.of.all.stars
  )

# view results
mapply.method
#         Team_Name   Team_Location
# x       Character,2 Character,4  
# Player  Character,2 Character,4  
# Captain Logical,2   Logical,4 

# end of script #

Given the edit to the question and the desired output, I would go purely with data.table

library(data.table)

## combine the list of all stars into one data.table
## creating an 'id' column 
dt_players <- rbindlist(list.of.all.stars, idcol = TRUE)

## we can keep/use the row names as the order of the data 
## is consistent with the list elements 
dt_teams <- as.data.table(team.df, keep.rownames = TRUE)
dt_teams[, rn := as.integer(rn)]

## use a join to combine the data to get the desired result. 
dt_teams[
  dt_players
  , on = c(rn = ".id")
]

#    rn Team_Name Team_Location         Player Captain
# 1:  1 Cavaliers Cleveland, OH   LeBron James    TRUE
# 2:  1 Cavaliers Cleveland, OH     Kevin Love   FALSE
# 3:  2  Warriors   Oakland, CA  Stephen Curry    TRUE
# 4:  2  Warriors   Oakland, CA   Kevin Durant   FALSE
# 5:  2  Warriors   Oakland, CA  Klay Thompson   FALSE
# 6:  2  Warriors   Oakland, CA Draymond Green   FALSE
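
For readers without data.table, the same "add an id, then join" idea can be sketched in base R with merge(). This is only an illustration under the same ordering assumption as above: the `id` column and the names `teams` and `players` are made up for this example, and merge() does not promise to keep the players of a team in their original order.

```r
team.df <- data.frame(
  Team_Name     = c("Cavaliers", "Warriors"),
  Team_Location = c("Cleveland, OH", "Oakland, CA"),
  stringsAsFactors = FALSE
)
list.of.all.stars <- list(
  data.frame(Player = c("LeBron James", "Kevin Love"),
             Captain = c(TRUE, FALSE), stringsAsFactors = FALSE),
  data.frame(Player = c("Stephen Curry", "Kevin Durant",
                        "Klay Thompson", "Draymond Green"),
             Captain = c(TRUE, FALSE, FALSE, FALSE), stringsAsFactors = FALSE)
)

## tag every player with the position of its list element (hypothetical 'id' column)
players <- do.call(
  rbind,
  Map(function(d, i) cbind(id = i, d),
      list.of.all.stars, seq_along(list.of.all.stars))
)

## tag every team with its row number, assuming rows line up with list elements
teams <- cbind(id = seq_len(nrow(team.df)), team.df)

## join on the helper column, then drop it
res <- merge(teams, players, by = "id")
res$id <- NULL
res
```

Dropping the helper column at the end is what makes the result match the four columns of the desired output; the data.table result above still carries `rn`.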

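Old answer

This approach uses data.table to do the actual work, but I've given you an sapply method to get the number of rows by which to expand the team.df data frame.

It also assumes that the order of teams in team.df matches the order of players in list.of.all.stars (i.e., the rows of the data.frame correspond to the list elements)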

library(data.table)

## grab the rows of each data.frame
reps <- sapply(list.of.all.stars, nrow)

## replicate the rows of the data.frame
setDT(team.df)[rep(1:.N, reps), ]

#    Team_Name Team_Location
# 1: Cavaliers Cleveland, OH
# 2: Cavaliers Cleveland, OH
# 3:  Warriors   Oakland, CA
# 4:  Warriors   Oakland, CA
# 5:  Warriors   Oakland, CA
# 6:  Warriors   Oakland, CA

If you'd rather not use data.table, the same approach can be applied to a data.frame

team.df[rep(row.names(team.df), reps), ]
#     Team_Name Team_Location
# 1   Cavaliers Cleveland, OH
# 1.1 Cavaliers Cleveland, OH
# 2    Warriors   Oakland, CA
# 2.1  Warriors   Oakland, CA
# 2.2  Warriors   Oakland, CA
# 2.3  Warriors   Oakland, CA

Or use a similar concept, but all within lapply

lst <- lapply(seq_along(list.of.all.stars), function(x) {
  df <- team.df[x, ]
  df[rep(row.names(df), nrow(list.of.all.stars[[x]])), ]
})

do.call(rbind, lst)
#     Team_Name Team_Location
# 1   Cavaliers Cleveland, OH
# 1.1 Cavaliers Cleveland, OH
# 2    Warriors   Oakland, CA
# 2.1  Warriors   Oakland, CA
# 2.2  Warriors   Oakland, CA
# 2.3  Warriors   Oakland, CA
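
The snippets above only expand the team rows. A minimal base-R sketch of how the player columns could then be attached, assuming the rows of team.df line up with the list elements (the names `expanded`, `players` and `res` are made up for this example):

```r
team.df <- data.frame(
  Team_Name     = c("Cavaliers", "Warriors"),
  Team_Location = c("Cleveland, OH", "Oakland, CA"),
  stringsAsFactors = FALSE
)
list.of.all.stars <- list(
  data.frame(Player = c("LeBron James", "Kevin Love"),
             Captain = c(TRUE, FALSE), stringsAsFactors = FALSE),
  data.frame(Player = c("Stephen Curry", "Kevin Durant",
                        "Klay Thompson", "Draymond Green"),
             Captain = c(TRUE, FALSE, FALSE, FALSE), stringsAsFactors = FALSE)
)

## number of all stars per team
reps <- sapply(list.of.all.stars, nrow)

## repeat each team row once per player
expanded <- team.df[rep(seq_len(nrow(team.df)), reps), ]

## stack the player data frames in list order
players <- do.call(rbind, list.of.all.stars)

## both pieces now have the same number of rows, so cbind lines them up
res <- cbind(expanded, players)
rownames(res) <- NULL
res
```

Because both pieces are built in list order, a single cbind is enough and no per-team loop is needed.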

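Regarding the OP's approach of using 'team.df' as an input to Map/mapply: 'team.df' is a data.frame, which is a list of columns. So the basic unit of input is a column, which is a vector. Map/mapply therefore loops over the vectors (columns) rather than over the whole dataset or its rows (which is what the desired output needs). To prevent this, we can wrap it in a list so that it becomes a single unit, which is then recycled across each list element of 'list.of.all.stars'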
do.call(rbind, Map(cbind, list(team.df), list.of.all.stars))
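
A small sketch of what Map actually iterates over may make this clearer. It only inspects the inputs; `cols_seen` and `wrapped` are names made up for this example.

```r
team.df <- data.frame(
  Team_Name     = c("Cavaliers", "Warriors"),
  Team_Location = c("Cleveland, OH", "Oakland, CA"),
  stringsAsFactors = FALSE
)

## Map visits one column at a time, so the function receives a character vector
cols_seen <- Map(function(x) class(x), team.df)
names(cols_seen)
# "Team_Name"     "Team_Location"

## wrapped in a list, the whole data.frame is a single element
wrapped <- list(team.df)
length(wrapped)
# 1
```

That single element is what gets recycled against every element of 'list.of.all.stars' in the call above.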

Based on the expected output, each row of 'team.df' should be paired with the corresponding list element of 'list.of.all.stars'. In that case, split 'team.df' by row and do the cbind

res <- do.call(rbind, Map(cbind,  split(team.df, seq_len(nrow(team.df))), list.of.all.stars))
row.names(res) <- NULL
res
#   Team_Name Team_Location         Player Captain
#1 Cavaliers Cleveland, OH   LeBron James    TRUE
#2 Cavaliers Cleveland, OH     Kevin Love   FALSE
#3  Warriors   Oakland, CA  Stephen Curry    TRUE
#4  Warriors   Oakland, CA   Kevin Durant   FALSE
#5  Warriors   Oakland, CA  Klay Thompson   FALSE
#6  Warriors   Oakland, CA Draymond Green   FALSE
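
A possible variation, for anyone who wants to stay with mapply as in the question: iterate over row indices instead of splitting, and pass SIMPLIFY = FALSE so the result stays a list of data frames rather than being collapsed into a matrix. This is a sketch under the same ordering assumption; `pieces` is a name made up for this example. Repeating the team row explicitly also means cbind never has to recycle a one-row data frame.

```r
team.df <- data.frame(
  Team_Name     = c("Cavaliers", "Warriors"),
  Team_Location = c("Cleveland, OH", "Oakland, CA"),
  stringsAsFactors = FALSE
)
list.of.all.stars <- list(
  data.frame(Player = c("LeBron James", "Kevin Love"),
             Captain = c(TRUE, FALSE), stringsAsFactors = FALSE),
  data.frame(Player = c("Stephen Curry", "Kevin Durant",
                        "Klay Thompson", "Draymond Green"),
             Captain = c(TRUE, FALSE, FALSE, FALSE), stringsAsFactors = FALSE)
)

## pair row i of team.df with list element i
pieces <- mapply(
  FUN = function(i, stars) {
    ## repeat the team row once per player, then bind the player columns
    cbind(team.df[rep(i, nrow(stars)), ], stars)
  },
  seq_len(nrow(team.df)),
  list.of.all.stars,
  SIMPLIFY = FALSE
)

res <- do.call(rbind, pieces)
row.names(res) <- NULL
res
```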

We can also do this in tidyverse. After grouping by all the columns in 'team.df', nest to create a list column 'data' (of length 2), assign 'list.of.all.stars' to 'data' in mutate, and then unnest the list column

library(tidyverse)
team.df %>% 
      group_by_all() %>%
      nest %>% 
      mutate(data = list.of.all.stars) %>% 
      unnest
# A tibble: 6 x 4
#  Team_Name Team_Location Player         Captain
#  <chr>     <chr>         <chr>          <lgl>  
# 1 Cavaliers Cleveland, OH LeBron James   T      
# 2 Cavaliers Cleveland, OH Kevin Love     F      
# 3 Warriors  Oakland, CA   Stephen Curry  T      
# 4 Warriors  Oakland, CA   Kevin Durant   F      
# 5 Warriors  Oakland, CA   Klay Thompson  F      
# 6 Warriors  Oakland, CA   Draymond Green F