Obtain nested values from a data frame
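I am trying to obtain the maximum value in the event column up to the point where an agreement (a dummy) is reached. Events are nested in agreements, and agreements are nested in dyads running over years. Note that the years are not always continuous, i.e. there are gaps between years (1986, 1987, 2001, 2002).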
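I was able to get the maximum within a dyad using ddply and max(event), but I am struggling to "assign" the different events to the correct agreement (until/after). Basically, I am missing an 'identifier' that assigns each observation to an agreement.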
The result I'm looking for is already in the "result" column:
dyad year event agreement agreement.name result
   1 1985     9
   1 1986     4         1     agreement1      9
   1 1987
   1 2001     3
   1 2002               1     agreement2      3
   2 1999     1
   2 2000     5
   2 2001               1     agreement3      5
   2 2002     2
   2 2003
   2 2004               1    agreement 4      2
Here is the data in a (hopefully) easier-to-use format:
df<-structure(list(dyad = c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L,
2L), year = c(1985L, 1986L, 1987L, 2001L, 2002L, 1999L, 2000L,
2001L, 2002L, 2003L, 2004L), event = c(9L, 4L, NA, 3L, NA, 1L,
5L, NA, 2L, NA, NA), agreement = c(NA, 1L, NA, NA, 1L, NA, NA,
1L, NA, NA, 1L), agreement.name = c("", "agreement1", "", "",
"agreement2", "", "", "agreement3", "", "", "agreement 4"), result = c(NA,
9L, NA, NA, 3L, NA, NA, 5L, NA, NA, 2L)), .Names = c("dyad",
"year", "event", "agreement", "agreement.name", "result"), class = "data.frame", row.names = c(NA,
-11L))
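The missing 'identifier' can also be sketched in base R with ave(), without plyr or data.table. This is a minimal sketch, not the data.table answer below: it assumes that an agreement row closes its own period (so the agreement year counts toward the maximum, as the expected result implies), and the period column and the has_agreement and period_max names are hypothetical.

```r
# Example data from the question, without the expected result column
df <- data.frame(
  dyad  = c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L),
  year  = c(1985L, 1986L, 1987L, 2001L, 2002L, 1999L, 2000L, 2001L, 2002L, 2003L, 2004L),
  event = c(9L, 4L, NA, 3L, NA, 1L, 5L, NA, 2L, NA, NA),
  agreement.name = c("", "agreement1", "", "", "agreement2", "", "", "agreement3",
                     "", "", "agreement 4"),
  stringsAsFactors = FALSE
)

has_agreement <- df$agreement.name != ""

# Hypothetical 'period' id: the number of agreements strictly before each row,
# counted separately within each dyad (gaps between years do not matter)
df$period <- ave(as.integer(has_agreement), df$dyad,
                 FUN = function(x) c(0L, cumsum(x)[-length(x)]))

# Max event per dyad/period; NA instead of -Inf if a period has no events
period_max <- ave(df$event, df$dyad, df$period,
                  FUN = function(x) if (all(is.na(x))) NA_integer_ else max(x, na.rm = TRUE))

# Keep the maximum only on the agreement rows
df$result <- ifelse(has_agreement, period_max, NA)
df
```

Because the identifier counts agreements rather than years, the jump from 1987 to 2001 needs no special handling.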
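Here is an option using data.table. Convert the 'data.frame' to a 'data.table' (setDT(df)) and create another grouping variable ('ind') based on the non-empty elements in 'agreement.name', so that a new group starts on the row right after each agreement. Grouping by the 'dyad' and 'ind' columns, we create a new column 'result' with ifelse, filling the rows where 'agreement.name' is non-empty with the max of 'event'.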
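library(data.table)
# 'ind' numbers the agreement periods within each dyad: a new period starts
# on the row right after a row with a non-empty agreement.name
setDT(df)[, ind:=cumsum(c(TRUE,diff(agreement.name=='')>0)),dyad][,
  # within each dyad/period, put the max event on the agreement row, NA elsewhere
  result:=ifelse(agreement.name!='', max(event, na.rm=TRUE), NA) ,
  list(dyad, ind)][, ind:=NULL][]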
#     dyad year event agreement agreement.name result
#  1:    1 1985     9        NA                    NA
#  2:    1 1986     4         1     agreement1      9
#  3:    1 1987    NA        NA                    NA
#  4:    1 2001     3        NA                    NA
#  5:    1 2002    NA         1     agreement2      3
#  6:    2 1999     1        NA                    NA
#  7:    2 2000     5        NA                    NA
#  8:    2 2001    NA         1     agreement3      5
#  9:    2 2002     2        NA                    NA
# 10:    2 2003    NA        NA                    NA
# 11:    2 2004    NA         1    agreement 4      2
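To see how the 'ind' grouping variable is built, here is the same expression evaluated step by step in base R on the agreement.name values of dyad 1. The second part is an assumption of mine, not a case in the question's data: if two agreements fell in consecutive years, diff() would not start a new group between them, so both would share one maximum; counting the agreements before each row keeps them apart. The names nm, nm2, ind2 and alt2 are hypothetical.

```r
# agreement.name values of dyad 1 (1985, 1986, 1987, 2001, 2002)
nm <- c("", "agreement1", "", "", "agreement2")

is_empty <- nm == ""                   # TRUE FALSE TRUE TRUE FALSE
new_group <- diff(is_empty) > 0        # TRUE where an empty row follows an agreement row
ind <- cumsum(c(TRUE, new_group))      # 1 1 2 2 2
print(ind)

# Hypothetical edge case: agreements in two consecutive years
nm2 <- c("", "agreementA", "agreementB", "")
ind2 <- cumsum(c(TRUE, diff(nm2 == "") > 0))
print(ind2)                            # 1 1 1 2: both agreements share group 1

# Alternative id: number of agreements strictly before each row
alt2 <- c(0L, cumsum(nm2 != "")[-length(nm2)])
print(alt2)                            # 0 0 1 2: each agreement closes its own group
```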
Or we can use numeric indexing instead of ifelse:
# index c(NA, max) with 1 for non-agreement rows and 2 for agreement rows
setDT(df)[, result:=c(NA, max(event, na.rm=TRUE))[(agreement.name!='')+1L] ,
list(ind= cumsum(c(TRUE,diff(agreement.name=='')>0)),dyad)][]
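The numeric-indexing trick works because a logical comparison plus 1L gives 1 for FALSE and 2 for TRUE, which then picks either NA or the maximum from a two-element vector. Below is a minimal base-R illustration on one made-up group. One caveat that applies to both versions: if every event in a period is NA, max(event, na.rm = TRUE) returns -Inf with a warning; the helper safe_max is hypothetical and returns NA instead.

```r
# One hypothetical dyad/period group: two rows, the second is the agreement row
event <- c(4L, NA)
agreement.name <- c("", "agreement1")

# FALSE + 1L = 1 -> NA, TRUE + 1L = 2 -> max
result <- c(NA, max(event, na.rm = TRUE))[(agreement.name != "") + 1L]
print(result)   # NA 4

# Hypothetical helper: NA instead of -Inf (and a warning) for an all-NA period
safe_max <- function(x) if (all(is.na(x))) NA_integer_ else max(x, na.rm = TRUE)
print(safe_max(c(NA_integer_, NA_integer_)))   # NA
```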
Data
df <- structure(list(dyad = c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L,
2L), year = c(1985L, 1986L, 1987L, 2001L, 2002L, 1999L, 2000L,
2001L, 2002L, 2003L, 2004L), event = c(9L, 4L, NA, 3L, NA, 1L,
5L, NA, 2L, NA, NA), agreement = c(NA, 1L, NA, NA, 1L, NA, NA,
1L, NA, NA, 1L), agreement.name = c("", "agreement1", "", "",
"agreement2", "", "", "agreement3", "", "", "agreement 4")),
.Names = c("dyad",
"year", "event", "agreement", "agreement.name"), row.names = c(NA,
-11L), class = "data.frame")