Expand Rows and Add Columns in Data Frame Based On Another Data Frame
Overview
Each row in team.df contains an NBA team. Each data frame in list.of.all.stars contains multiple rows, one for each all-star player on the corresponding NBA team.
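Using the apply() family of functions, how do I expand the rows in team.df to account for the number of all-stars on each team, and merge the columns from list.of.all.stars into the final output?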
I am also completely open to non-apply() methods; apply() is just an example of how I want to avoid writing a loop.
Below is my desired output:
# Team_Name Team_Location Player Captain
# 1 Cavaliers Cleveland, OH LeBron James TRUE
# 2 Cavaliers Cleveland, OH Kevin Love FALSE
# 3 Warriors Oakland, CA Stephen Curry TRUE
# 4 Warriors Oakland, CA Kevin Durant FALSE
# 5 Warriors Oakland, CA Klay Thompson FALSE
# 6 Warriors Oakland, CA Draymond Green FALSE
Reproducible Example
# create data frame
# about team information
team.df <-
data.frame(
Team_Name = c( "Cavaliers", "Warriors" )
, Team_Location = c( "Cleveland, OH", "Oakland, CA")
, stringsAsFactors = FALSE
)
# create list about
# all stars on each team
list.of.all.stars <-
list(
data.frame(
Player = c( "LeBron James", "Kevin Love" )
, Captain = c( TRUE, FALSE )
, stringsAsFactors = FALSE
)
, data.frame(
Player = c( "Stephen Curry", "Kevin Durant"
, "Klay Thompson", "Draymond Green"
)
, Captain = c( TRUE, FALSE, FALSE, FALSE )
, stringsAsFactors = FALSE
)
)
Non-apply() Family Method
# cbind each data frame within the list.of.all.stars
# to its corresponding row in team.df
team.and.all.stars.list.of.df <-
list(
cbind(
team.df[ 1, ]
, list.of.all.stars[[1]]
)
, cbind(
team.df[ 2, ]
, list.of.all.stars[[2]]
)
)
# Warning messages:
# 1: In data.frame(..., check.names = FALSE) :
# row names were found from a short variable and have been discarded
# 2: In data.frame(..., check.names = FALSE) :
# row names were found from a short variable and have been discarded
# collapse each list
# into data frame
final.df <-
data.frame(
do.call(
what = "rbind"
, args = team.and.all.stars.list.of.df
)
, stringsAsFactors = FALSE
)
# view final output
final.df
# Team_Name Team_Location Player Captain
# 1 Cavaliers Cleveland, OH LeBron James TRUE
# 2 Cavaliers Cleveland, OH Kevin Love FALSE
# 3 Warriors Oakland, CA Stephen Curry TRUE
# 4 Warriors Oakland, CA Kevin Durant FALSE
# 5 Warriors Oakland, CA Klay Thompson FALSE
# 6 Warriors Oakland, CA Draymond Green FALSE
# end of script #
Failed mapply() Attempt
# Hoping to Apply A Function
# using a data frame and
# a list of data frames
mapply.method <-
mapply(
FUN = function( x, y )
cbind.data.frame(
x
, y
, stringsAsFactors = FALSE
)
, team.df
, list.of.all.stars
)
# view results
mapply.method
# Team_Name Team_Location
# x Character,2 Character,4
# Player Character,2 Character,4
# Captain Logical,2 Logical,4
# end of script #
Given the edit to the question and the desired output, I would do this purely in data.table:
library(data.table)
## combine the list of all stars into one data.table
## creating an 'id' column
dt_players <- rbindlist(list.of.all.stars, idcol = T)
## we can keep/use the row names as the order of the data
## is consistent with the list elements
dt_teams <- as.data.table(team.df, keep.rownames = T)
dt_teams[, rn := as.integer(rn)]
## use a join to combine the data to get the desired result.
dt_teams[
dt_players
, on = c(rn = ".id")
]
# rn Team_Name Team_Location Player Captain
# 1: 1 Cavaliers Cleveland, OH LeBron James TRUE
# 2: 1 Cavaliers Cleveland, OH Kevin Love FALSE
# 3: 2 Warriors Oakland, CA Stephen Curry TRUE
# 4: 2 Warriors Oakland, CA Kevin Durant FALSE
# 5: 2 Warriors Oakland, CA Klay Thompson FALSE
# 6: 2 Warriors Oakland, CA Draymond Green FALSE
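For readers without data.table, the same join idea can be sketched in base R with merge(). This is a minimal sketch, not part of the original answer: team_id is a hypothetical column name standing in for rn and .id, and it again assumes that list element i belongs to row i of team.df. merge() orders its result by the key, so check the row order if it matters to you.

```r
# Reproducible data from the question
team.df <- data.frame(
  Team_Name     = c("Cavaliers", "Warriors"),
  Team_Location = c("Cleveland, OH", "Oakland, CA"),
  stringsAsFactors = FALSE
)
list.of.all.stars <- list(
  data.frame(Player  = c("LeBron James", "Kevin Love"),
             Captain = c(TRUE, FALSE),
             stringsAsFactors = FALSE),
  data.frame(Player  = c("Stephen Curry", "Kevin Durant",
                         "Klay Thompson", "Draymond Green"),
             Captain = c(TRUE, FALSE, FALSE, FALSE),
             stringsAsFactors = FALSE)
)

# Hypothetical key column: the position of each team (plays the role of 'rn')
teams <- team.df
teams$team_id <- seq_len(nrow(teams))

# Stack the player tables, tagging each row with its list position
# (plays the role of the '.id' column from rbindlist(..., idcol = TRUE))
players <- do.call(rbind, Map(function(d, i) {
  data.frame(team_id = i, d, stringsAsFactors = FALSE)
}, list.of.all.stars, seq_along(list.of.all.stars)))

# Inner join on the position key
res <- merge(teams, players, by = "team_id")
res
```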
Old Answer
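This method uses data.table to do the actual work, but I've given you an sapply method to get the number of rows by which to expand the team.df data frame.
It also assumes that the order of the teams in team.df matches the order of the players in list.of.all.stars (i.e., the rows of the data.frame correspond to the list elements).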
library(data.table)
## grab the rows of each data.frame
reps <- sapply(list.of.all.stars, nrow)
## replicate each row reps times (note: setDT() converts team.df to a data.table by reference)
setDT(team.df)[rep(1:.N, reps), ]
# Team_Name Team_Location
# 1: Cavaliers Cleveland, OH
# 2: Cavaliers Cleveland, OH
# 3: Warriors Oakland, CA
# 4: Warriors Oakland, CA
# 5: Warriors Oakland, CA
# 6: Warriors Oakland, CA
If you don't want to use data.table, the same approach can be applied to a plain data.frame:
team.df[rep(row.names(team.df), reps), ]
# Team_Name Team_Location
# 1 Cavaliers Cleveland, OH
# 1.1 Cavaliers Cleveland, OH
# 2 Warriors Oakland, CA
# 2.1 Warriors Oakland, CA
# 2.2 Warriors Oakland, CA
# 2.3 Warriors Oakland, CA
Or use a similar concept, but all within lapply:
lst <- lapply(seq_along(list.of.all.stars), function(x) {
df <- team.df[x, ]
df[rep(row.names(df), nrow(list.of.all.stars[[x]])), ]
})
do.call(rbind, lst)
# Team_Name Team_Location
# 1 Cavaliers Cleveland, OH
# 1.1 Cavaliers Cleveland, OH
# 2 Warriors Oakland, CA
# 2.1 Warriors Oakland, CA
# 2.2 Warriors Oakland, CA
# 2.3 Warriors Oakland, CA
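The snippets above only repeat the team rows; they do not yet add the player columns. A minimal base R sketch that finishes the job, under the same assumption that list element i belongs to row i of team.df, binds the expanded rows to the stacked player tables (expanded is an illustrative name):

```r
# Reproducible data from the question
team.df <- data.frame(
  Team_Name     = c("Cavaliers", "Warriors"),
  Team_Location = c("Cleveland, OH", "Oakland, CA"),
  stringsAsFactors = FALSE
)
list.of.all.stars <- list(
  data.frame(Player  = c("LeBron James", "Kevin Love"),
             Captain = c(TRUE, FALSE),
             stringsAsFactors = FALSE),
  data.frame(Player  = c("Stephen Curry", "Kevin Durant",
                         "Klay Thompson", "Draymond Green"),
             Captain = c(TRUE, FALSE, FALSE, FALSE),
             stringsAsFactors = FALSE)
)

# Number of players per team (element i assumed to belong to row i)
reps <- sapply(list.of.all.stars, nrow)

# Repeat each team row once per player: row indices 1, 1, 2, 2, 2, 2
expanded <- team.df[rep(seq_len(nrow(team.df)), reps), , drop = FALSE]

# Stack the player tables in the same team order and bind them alongside
final.df <- cbind(expanded, do.call(rbind, list.of.all.stars))
row.names(final.df) <- NULL  # drop the "1", "1.1", ... labels
final.df
```

Because both pieces are built in the same team order, cbind() lines them up row for row without needing a join key.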
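Regarding the OP's approach of using 'team.df' as an input to Map/mapply: 'team.df' is a data.frame, which is a list of columns. So the basic unit of input is a single column, i.e. a vector. Map therefore loops over the vectors (columns) rather than over the whole dataset or its rows (which is what the desired output needs). To prevent this, we can wrap it in a list so that it becomes a single unit, which is recycled against each list element of 'list.of.all.stars':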
do.call(rbind, Map(cbind, list(team.df), list.of.all.stars))
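To see the difference in iteration units described above, the sketch below (illustrative only; by_column, wrapped and by_row are hypothetical names) records what each Map() call actually receives in the three cases:

```r
# Reproducible data from the question
team.df <- data.frame(
  Team_Name     = c("Cavaliers", "Warriors"),
  Team_Location = c("Cleveland, OH", "Oakland, CA"),
  stringsAsFactors = FALSE
)
list.of.all.stars <- list(
  data.frame(Player  = c("LeBron James", "Kevin Love"),
             Captain = c(TRUE, FALSE),
             stringsAsFactors = FALSE),
  data.frame(Player  = c("Stephen Curry", "Kevin Durant",
                         "Klay Thompson", "Draymond Green"),
             Captain = c(TRUE, FALSE, FALSE, FALSE),
             stringsAsFactors = FALSE)
)

# Unwrapped: a data.frame is a list of columns, so each call gets one
# column (a character vector) paired with one player table
by_column <- Map(function(x, y) class(x), team.df, list.of.all.stars)
names(by_column)  # "Team_Name" "Team_Location"

# Wrapped in list(): team.df is one unit recycled against every element,
# so each call sees both team rows next to one player table
wrapped <- Map(function(x, y) c(team_rows = nrow(x), player_rows = nrow(y)),
               list(team.df), list.of.all.stars)

# Split by row: each call gets a single-row data frame for one team
by_row <- Map(function(x, y) nrow(x),
              split(team.df, seq_len(nrow(team.df))), list.of.all.stars)
```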
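The Map(cbind, list(team.df), list.of.all.stars) call recycles the whole of 'team.df' against every element, so in the second element the Cavaliers and Warriors rows alternate next to the Warriors players. Based on the expected output, each row of 'team.df' should instead be paired with the corresponding list element of 'list.of.all.stars'. In that case, split 'team.df' by row and then cbind: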
res <- do.call(rbind, Map(cbind, split(team.df, seq_len(nrow(team.df))), list.of.all.stars))
row.names(res) <- NULL
res
# Team_Name Team_Location Player Captain
#1 Cavaliers Cleveland, OH LeBron James TRUE
#2 Cavaliers Cleveland, OH Kevin Love FALSE
#3 Warriors Oakland, CA Stephen Curry TRUE
#4 Warriors Oakland, CA Kevin Durant FALSE
#5 Warriors Oakland, CA Klay Thompson FALSE
#6 Warriors Oakland, CA Draymond Green FALSE
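A variant of the split() approach, sketched here under the same positional assumption, iterates over row indices and lets data.frame() do the recycling. Passing row.names = NULL explicitly stops data.frame() from taking row names from the one-row team slice, which is the source of the "row names were found from a short variable" warnings in the question's cbind() version. The names pieces and players are illustrative.

```r
# Reproducible data from the question
team.df <- data.frame(
  Team_Name     = c("Cavaliers", "Warriors"),
  Team_Location = c("Cleveland, OH", "Oakland, CA"),
  stringsAsFactors = FALSE
)
list.of.all.stars <- list(
  data.frame(Player  = c("LeBron James", "Kevin Love"),
             Captain = c(TRUE, FALSE),
             stringsAsFactors = FALSE),
  data.frame(Player  = c("Stephen Curry", "Kevin Durant",
                         "Klay Thompson", "Draymond Green"),
             Captain = c(TRUE, FALSE, FALSE, FALSE),
             stringsAsFactors = FALSE)
)

# Pair row i of team.df with element i of list.of.all.stars.
# data.frame() recycles the one-row team slice to the number of players;
# row.names = NULL gives plain 1..n row names for each piece.
pieces <- Map(function(i, players) {
  data.frame(team.df[i, , drop = FALSE], players,
             row.names = NULL, stringsAsFactors = FALSE)
}, seq_len(nrow(team.df)), list.of.all.stars)

# Stack the per-team pieces into one data frame
res <- do.call(rbind, pieces)
res
```

Iterating over indices rather than split() pieces also makes it easy to check up front that nrow(team.df) equals length(list.of.all.stars), since Map() would otherwise silently recycle the shorter input.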
We can also do this in the tidyverse. After grouping by all the columns in 'team.df', nest creates a list column 'data' (of length 2); in mutate, assign 'list.of.all.stars' to 'data', and then unnest the list column:
library(tidyverse)
team.df %>%
group_by_all() %>%
nest %>%
mutate(data = list.of.all.stars) %>%
unnest
# A tibble: 6 x 4
# Team_Name Team_Location Player Captain
# <chr> <chr> <chr> <lgl>
# 1 Cavaliers Cleveland, OH LeBron James T
# 2 Cavaliers Cleveland, OH Kevin Love F
# 3 Warriors Oakland, CA Stephen Curry T
# 4 Warriors Oakland, CA Kevin Durant F
# 5 Warriors Oakland, CA Klay Thompson F
# 6 Warriors Oakland, CA Draymond Green F