Create edgelist from dataframe depending on groups and time period

Question

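I am trying to create an edge list from a data frame that depends on two things: (1) belonging to the same group and (2) being there in the same time period. A person can belong to several groups at the same time.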

# read example vectors
ppl <- c("pers1", "pers2","pers3","pers4","pers5","pers2","pers6","pers1")
grp <- c(1,1,1,2,2,2,3,3)
timeST <- c(2005,2005,2010,2012,2014,2007,2008,2008)
timeTER <- c(2010,2007,2018,2014,2015,2010,2020,2020)
# construct example data frame
example.df <- data.frame(ppl, grp, timeST, timeTER)

This is what I have so far:

library(dplyr)

# create edge list by groups:
target.df <- example.df %>%
    select(ppl, grp, timeST, timeTER) %>%
    inner_join(., select(., grp, ppl), by = "grp") %>%
    rename(ppl1 = ppl.x, ppl2 = ppl.y) %>%
    filter(ppl1 != ppl2) %>%
    unique %>%
    arrange(grp)
target.df <- target.df[, c("ppl1", "ppl2", "grp", "timeST", "timeTER")]
# display results:
target.df

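However, I can't figure out how to break this down by year, i.e. there should only be an edge between two people when persX and persY were in the same group at the same time. I assume the data needs to be put into long format, so I tried reshape and reshape2, but couldn't get either to reflect the time periods.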

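Ideally, I would create one edge list per year and then convert each into an adjacency matrix (which is not a problem in itself). This is complicated further by the fact that every person needs to appear in every adjacency matrix, so if pers4 doesn't exist after 2011, it still needs to be in the matrix, but with the value 10 throughout its row/column instead of 0 or 1. But I want to take this one step at a time.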

That would look something like this:

ex.M <- matrix(c(0,1,0,10,1,0,0,10,0,0,0,10,10,10,10,10), nrow=4, ncol=4)
ex.M

Any help would be greatly appreciated.

Thanks in advance.

Answer 1

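Interesting question, although the solution is more about managing the data than about igraph itself. Within each group, you want to list the people whose time periods overlap.

We need a good way to evaluate overlap.

This is not the best solution (the hack on the overlap function's result format makes it impossible to use year 0), but at least it is instructive.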

library('dplyr')

ppl <- c("pers1", "pers2","pers3","pers4","pers5","pers2","pers6","pers1")
grp <- c(1,1,1,2,2,2,3,3)
timeST <- c(2005,2005,2010,2012,2014,2007,2008,2008)
timeTER <- c(2010,2007,2018,2014,2015,2010,2020,2020)
# construct example data frame
example.df <- data.frame(ppl, grp, timeST, timeTER)

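# Just like in your logic, these are people that appear within the same group.
# Self-pairs (a person joined with themselves) are dropped, as in the question's code.
df <- example.df %>% inner_join(example.df, by="grp") %>% filter(ppl.x != ppl.y)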


# We get a structure with the start and end times for both i and j like:
names(df)



# This function is used to compute the overlapping years between the intervals
# timeST.x-timeTER.x and timeST.y-timeTER.y

time.period.overlap <- function(x_start, x_end, y_start, y_end)
{
    # Return intersections of time-periods (x_start - x_end) and (y_start - y_end)
    x <- seq(x_start, x_end)
    y <- seq(y_start, y_end)
    
    # Each result contains at least one 0 entry to keep the data format and avoid NULLs;
    # the 0 rows are filtered out again below
    c(intersect(x, y), 0)
}



# Make an edge-list of person-to-person WITHIN grp and WITH overlapping years
# as defined by time.period.overlap(). Choose only rows that DO HAVE an overlap
all.edges <-
    do.call('rbind',
            lapply(1:nrow(df), function(x)
                data.frame(
                    i = df[x, 'ppl.x'],
                    j = df[x, 'ppl.y'],
                    grp = df[x, 'grp'],
                    yr_overlap = time.period.overlap(df[x, 'timeST.x'], df[x, 'timeTER.x'], df[x, 'timeST.y'], df[x, 'timeTER.y'])
                )
            )
    ) %>% filter(yr_overlap != 0)

# Note that pairs like pers4->pers2 in group 2 are not in this df
# since they never appeared in that group during the same year!
all.edges[all.edges$i == 'pers4',]
# For each pair i->j within each group, one row exists for each overlapping year

# Group by i, j and group to find the number of years of each pair's overlap to use in the network
el <- all.edges %>% group_by(i, j, grp) %>% summarise(n_yr_overlap = n(), first_overlap = min(yr_overlap))
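The 0 sentinel in time.period.overlap() can be avoided by comparing the interval endpoints directly: two closed intervals overlap when the later start is not after the earlier end. The following is only a sketch under the assumption that all periods are whole, inclusive years; the names overlap.df, ov_start and ov_end are made up for this illustration and are not part of the code above. Recent dplyr versions may warn that the self-join is many-to-many, which is expected here.

```r
library(dplyr)

# Same example data as above
example.df <- data.frame(
    ppl     = c("pers1", "pers2", "pers3", "pers4", "pers5", "pers2", "pers6", "pers1"),
    grp     = c(1, 1, 1, 2, 2, 2, 3, 3),
    timeST  = c(2005, 2005, 2010, 2012, 2014, 2007, 2008, 2008),
    timeTER = c(2010, 2007, 2018, 2014, 2015, 2010, 2020, 2020),
    stringsAsFactors = FALSE
)

overlap.df <- example.df %>%
    inner_join(example.df, by = "grp") %>%
    filter(ppl.x != ppl.y) %>%                       # no self-pairs
    mutate(ov_start = pmax(timeST.x, timeST.y),      # later of the two starts
           ov_end   = pmin(timeTER.x, timeTER.y)) %>% # earlier of the two ends
    filter(ov_start <= ov_end) %>%                   # keep only real overlaps
    transmute(i = ppl.x, j = ppl.y, grp, ov_start, ov_end,
              n_yr_overlap = ov_end - ov_start + 1)

overlap.df
```

This gives one row per directed pair and group, comparable to el, without expanding every interval into a sequence of years first.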

The edge list el can now be passed to igraph for the aggregated network, or you can use all.edges to subsample the network for each given year.
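One possible way to get a separate edge list per year is to take a snapshot of who is active in that year and pair people within each group. This is a sketch rather than the method used above; edges.for.year and edges.by.year are hypothetical names introduced for the illustration, and periods are assumed to include both their start and end year.

```r
library(dplyr)

# Same example data as above
example.df <- data.frame(
    ppl     = c("pers1", "pers2", "pers3", "pers4", "pers5", "pers2", "pers6", "pers1"),
    grp     = c(1, 1, 1, 2, 2, 2, 3, 3),
    timeST  = c(2005, 2005, 2010, 2012, 2014, 2007, 2008, 2008),
    timeTER = c(2010, 2007, 2018, 2014, 2015, 2010, 2020, 2020),
    stringsAsFactors = FALSE
)

# Edge list for a single year: keep the memberships active in `yr`,
# then pair everyone who shares a group (self-pairs removed)
edges.for.year <- function(df, yr) {
    active <- df %>%
        filter(timeST <= yr, timeTER >= yr) %>%
        select(ppl, grp)
    active %>%
        inner_join(active, by = "grp") %>%
        filter(ppl.x != ppl.y) %>%
        transmute(i = ppl.x, j = ppl.y, grp)
}

# Named list with one edge list per year covered by the data
years <- seq(min(example.df$timeST), max(example.df$timeTER))
edges.by.year <- lapply(years, function(yr) edges.for.year(example.df, yr))
names(edges.by.year) <- years

edges.by.year[["2010"]]
```

Each list element can then be turned into its own network; years in which nobody shares a group simply yield an empty edge list.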

Remember to handle people according to whether they have any edges of their own (in this output, they all do).
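For the requirement in the question that every person appears in every yearly matrix, with 10 marking people who are not present that year, a base R sketch could look like the following. The function name adjacency.for.year and its absent argument are assumptions made for this example, and "present" is taken to mean having at least one membership active in that year.

```r
# Same example data as above
example.df <- data.frame(
    ppl     = c("pers1", "pers2", "pers3", "pers4", "pers5", "pers2", "pers6", "pers1"),
    grp     = c(1, 1, 1, 2, 2, 2, 3, 3),
    timeST  = c(2005, 2005, 2010, 2012, 2014, 2007, 2008, 2008),
    timeTER = c(2010, 2007, 2018, 2014, 2015, 2010, 2020, 2020),
    stringsAsFactors = FALSE
)

# Adjacency matrix for one year over ALL people in the data
adjacency.for.year <- function(df, yr, absent = 10) {
    people <- sort(unique(df$ppl))
    m <- matrix(0, length(people), length(people),
                dimnames = list(people, people))
    active <- df[df$timeST <= yr & df$timeTER >= yr, ]
    # People who share a group in this year are connected
    for (g in unique(active$grp)) {
        members <- unique(active$ppl[active$grp == g])
        m[members, members] <- 1
    }
    diag(m) <- 0                          # no self-loops
    # People with no active membership get `absent` in their row and column
    gone <- setdiff(people, active$ppl)
    m[gone, ] <- absent
    m[, gone] <- absent
    m
}

adjacency.for.year(example.df, 2011)
```

Because the row and column order is fixed by the full list of people, the matrices for different years line up and can be compared or stacked directly.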

Tags: r, igraph, reshape, network-analysis