Create edgelist from dataframe depending on groups and time period
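I am trying to create an edge list from a data frame that depends on two things: (1) belonging to the same group and (2) being there in the same time period. A person may belong to several groups at the same time.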
# read example vectors
ppl <- c("pers1", "pers2","pers3","pers4","pers5","pers2","pers6","pers1")
grp <- c(1,1,1,2,2,2,3,3)
timeST <- c(2005,2005,2010,2012,2014,2007,2008,2008)
timeTER <- c(2010,2007,2018,2014,2015,2010,2020,2020)
# construct example data frame
example.df <- data.frame(ppl, grp, timeST, timeTER)
This is what I have so far:
library(dplyr)
# create edge list by groups:
target.df <- example.df %>% select(ppl, grp, timeST, timeTER) %>%
  inner_join(., select(., grp, ppl), by = "grp") %>%
  rename(ppl1 = ppl.x, ppl2 = ppl.y) %>%
  filter(ppl1 != ppl2) %>%
  unique %>%
  arrange(grp)
target.df <- target.df[, c("ppl1","ppl2","grp","timeST","timeTER")]
# display results:
target.df
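However, I don't know how to split this up by year, i.e. there should only be an edge between persX and persY when both were in the same group at the same time. I assume the data needs to be put into long format, so I tried reshape and reshape2, but couldn't get them to reflect the time periods.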
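The long-format idea can be sketched in base R as below. This is an editor's sketch, not the asker's or answerer's code; it assumes start and end years are both inclusive, and the names long.df, joined and yearly.edges are illustrative. Each membership row is expanded into one row per active year, and the long table is then joined with itself on group and year, so a pair only gets an edge in years they actually share a group.

```r
ppl     <- c("pers1", "pers2", "pers3", "pers4", "pers5", "pers2", "pers6", "pers1")
grp     <- c(1, 1, 1, 2, 2, 2, 3, 3)
timeST  <- c(2005, 2005, 2010, 2012, 2014, 2007, 2008, 2008)
timeTER <- c(2010, 2007, 2018, 2014, 2015, 2010, 2020, 2020)
example.df <- data.frame(ppl, grp, timeST, timeTER, stringsAsFactors = FALSE)

# Long format: one row per person, group and active year (both end points inclusive)
years   <- mapply(seq, example.df$timeST, example.df$timeTER, SIMPLIFY = FALSE)
long.df <- data.frame(
  ppl  = rep(example.df$ppl, lengths(years)),
  grp  = rep(example.df$grp, lengths(years)),
  year = unlist(years),
  stringsAsFactors = FALSE
)

# Self-join on group AND year, then drop self-pairs
joined <- merge(long.df, long.df, by = c("grp", "year"))
yearly.edges <- joined[joined$ppl.x != joined$ppl.y, c("ppl.x", "ppl.y", "grp", "year")]
names(yearly.edges) <- c("ppl1", "ppl2", "grp", "year")

# Edge list for a single year
yearly.edges[yearly.edges$year == 2014, ]
```

Each row is one year of co-membership, so subsetting on year gives a per-year edge list; a pair that shares several groups in the same year appears once per group.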
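Ideally, I would create one edge list per year and then convert each into an adjacency matrix (that part is not a problem in itself). This is complicated by the fact that every person needs to appear in every adjacency matrix, so if pers4 does not exist until after 2011, it still needs to appear in the matrix, but with the value 10 in its whole row/column instead of 0 or 1. But I want to take it one step at a time.
It would look something like this: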
ex.M <- matrix(c(0,1,0,10,1,0,0,10,0,0,0,10,10,10,10,10), nrow=4, ncol=4)
ex.M
Any help would be greatly appreciated.
Thanks in advance.
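Interesting question, although the solution is more about data management than about igraph itself. Within each group, you want to list the people whose time periods overlap.
We need a good way of evaluating that overlap.
This is not the best solution (hacking the result format of the overlap function means that year 0 cannot be used), but at least it is instructive.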
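The 0 sentinel can be avoided with a closed-form intersection: two inclusive year ranges overlap exactly when the later start is no later than the earlier end. A minimal sketch follows; overlap.years is a hypothetical name, not part of the answer below.

```r
# Years shared by the inclusive ranges [x_start, x_end] and [y_start, y_end].
# Returns integer(0) when they do not overlap, so no 0 sentinel is needed.
overlap.years <- function(x_start, x_end, y_start, y_end) {
  lo <- max(x_start, y_start)  # later of the two starts
  hi <- min(x_end, y_end)      # earlier of the two ends
  if (lo > hi) integer(0) else seq(lo, hi)
}

overlap.years(2005, 2010, 2010, 2018)  # 2010
overlap.years(2007, 2010, 2012, 2014)  # integer(0)
```

If you swap this into the data.frame() call below, skip pairs with an empty result first, because data.frame() cannot combine a length-one column with a zero-length one.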
library('dplyr')
ppl <- c("pers1", "pers2","pers3","pers4","pers5","pers2","pers6","pers1")
grp <- c(1,1,1,2,2,2,3,3)
timeST <- c(2005,2005,2010,2012,2014,2007,2008,2008)
timeTER <- c(2010,2007,2018,2014,2015,2010,2020,2020)
# construct example data frame
example.df <- data.frame(ppl, grp, timeST, timeTER)
# Just like in your logic, pair every person with every other person in the same group.
df <- example.df %>% inner_join(example.df, by="grp")
# We get a structure with the start and end times for both i and j like:
names(df)
# This function computes the years in which the intervals
# timeST.x-timeTER.x and timeST.y-timeTER.y overlap
time.period.overlap <- function(x_start, x_end, y_start, y_end)
{
  # Return intersections of time-periods (x_start - x_end) and (y_start - y_end)
  x <- seq(x_start, x_end)
  y <- seq(y_start, y_end)
  # Each result contains at least a row of 0 to stay true to the data-format and avoid NULLs
  c(unique( c(x[x %in% y], y[y %in% x]) ), 0)
}
# Make an edge-list of person-to-person WITHIN grp and WITH overlapping years
# as defined by time.period.overlap(). Choose only rows that DO HAVE an overlap
all.edges <-
  do.call('rbind',
    lapply(1:nrow(df), function(x)
      data.frame(
        i = df[x, 'ppl.x'],
        j = df[x, 'ppl.y'],
        grp = df[x, 'grp'],
        yr_overlap = time.period.overlap(df[x, 'timeST.x'], df[x, 'timeTER.x'],
                                         df[x, 'timeST.y'], df[x, 'timeTER.y'])
      )
    )
  ) %>% filter(yr_overlap != 0)
# Note that edges such as pers4->pers2 in group 2 are not in this df
# since the two never appeared in that group during the same year!
all.edges[all.edges$i == 'pers4',]
# For each pair i->j within each group, one row exists for each overlapping year
# Group by i, j and group to find the number of years of each pair's overlap to use in the network
el <- all.edges %>% group_by(i, j, grp) %>% summarise(n_yr_overlap = n(), first_overlap = min(yr_overlap))
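The edge list el can now be passed to igraph for the aggregated network, or you can use all.edges to subsample the network for any given year.
Remember to filter on whether people have edges to themselves (they do in this output).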
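For the per-year adjacency matrix described in the question, a base R sketch could look like this. Assumptions, all the editor's: a person counts as present in a year if any of their memberships covers that year (inclusive), absent persons get 10 in their whole row and column (diagonal included, as in ex.M), present persons get 0 on the diagonal (self-loops dropped), and year.adjacency is a hypothetical helper name.

```r
ppl     <- c("pers1", "pers2", "pers3", "pers4", "pers5", "pers2", "pers6", "pers1")
grp     <- c(1, 1, 1, 2, 2, 2, 3, 3)
timeST  <- c(2005, 2005, 2010, 2012, 2014, 2007, 2008, 2008)
timeTER <- c(2010, 2007, 2018, 2014, 2015, 2010, 2020, 2020)
example.df <- data.frame(ppl, grp, timeST, timeTER, stringsAsFactors = FALSE)

# Hypothetical helper: adjacency matrix for one year over ALL persons.
# 1 = shared a group that year, 0 = present but not linked, 10 = absent that year.
year.adjacency <- function(df, year) {
  everyone <- sort(unique(df$ppl))
  active   <- df[df$timeST <= year & df$timeTER >= year, ]  # memberships covering `year`
  m <- matrix(0, length(everyone), length(everyone),
              dimnames = list(everyone, everyone))
  # Link all people who are active in the same group that year
  for (g in unique(active$grp)) {
    members <- unique(active$ppl[active$grp == g])
    m[members, members] <- 1
  }
  diag(m) <- 0                          # drop self-loops
  absent <- setdiff(everyone, active$ppl)
  m[absent, ] <- 10                     # absent persons: whole row ...
  m[, absent] <- 10                     # ... and whole column set to 10
  m
}

year.adjacency(example.df, 2011)
```

Calling lapply(2005:2020, function(y) year.adjacency(example.df, y)) would then give one matrix per year, each with the same rows and columns.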