如何使用 ndtv 在 R 中确定动态网络渲染的稳健切片参数?

How can I determine robust slice parameters for dynamic network renderings in R using ndtv?

我努力制作带有时间戳的交易的视觉动态动画,其中每笔交易代表一个人对 artifact/file 的贡献。为此,我使用了 R 包 networkDynamicnetworkndtv

交易具有(与 networkDynamic 包小插图中的示例相反)"real" 时间戳。我想将渲染过程包装在一个

的函数中

我想我已经设法使用 lubridates floor_date 在第一个事件的一周开始时开始第一个片段。我还没有研究最后一个问题(标签),因为不幸的是,我无法为我的数据集确定合适的切片参数。

请在下方找到 RStudio 的可重现示例。该示例包括三个名为 slice.par 的列表,一个有效,两个无效。简单地硬编码(仅)适用于具体示例的参数配置不是我的目标,首先是因为我的真实数据集要大得多(因此 'playing around' 使用参数会花费很多时间),其次是因为我想具有适用于许多不同数据集的功能。

if (!require("pacman")) install.packages("pacman")
library("pacman")
pacman::p_load(network, networkDynamic, ndtv, lubridate)

UtilNumericAsDate <- function(nuUnixTimestamp) {
    return(as.POSIXct(nuUnixTimestamp, origin = "1970-01-01 00:00.00 UTC", tz = "UTC"))
}

UtilDateAsNumeric <- function(oTimestamp) {
    return(as.numeric(as.POSIXct(oTimestamp)))
}

stTransac <- "
'contributorId', 'artifactId', 'weight', 'instantId'
'A', 'a1', '1', '2003-06-01 23:09:40'
'A', 'a2', '1', '2004-02-27 11:48:41'
'A', 'a1', '2', '2006-06-25 20:36:49'
'A', 'a3', '1', '2007-01-28 00:35:31'
'A', 'a3', '2', '2007-04-25 16:03:57'
'A', 'a3', '3', '2007-07-19 19:43:49'
'B', 'a1', '1', '2008-02-06 12:37:56'
'C', 'a3', '1', '2008-04-07 02:27:36'
'C', 'a2', '1', '2008-06-01 02:15:35'
'C', 'a2', '2', '2008-10-05 02:32:45'
'B', 'a1', '2', '2009-06-22 01:57:45'
'C', 'a4', '1', '2009-09-15 02:56:33'
'C', 'a5', '1', '2010-06-30 19:42:25'
'C', 'a6', '1', '2011-06-12 23:58:17'
'B', 'a3', '1', '2013-08-30 19:34:28'
'C', 'a1', '1', '2014-10-23 20:49:54'
'C', 'a1', '2', '2014-10-24 16:46:07'
'A', 'a2', '2', '2015-09-26 16:58:17'
'A', 'a7', '1', '2015-10-04 17:40:12'
'A', 'a8', '1', '2015-12-02 10:55:47'
"

dfTransac <- read.csv(text = stTransac, sep = "," , quote = '\'' , strip.white = TRUE, stringsAsFactors = FALSE)

dfEdges <- unique(dfTransac[,1:2])
veUniqueContributors   <- unique(dfEdges[[1]])
veUniqueArtifacts      <- unique(dfEdges[[2]])
nuNrUniqueContributors <- length(veUniqueContributors)
nuNrUniqueArtifacts    <- length(veUniqueArtifacts)

net <- network.initialize(0, directed = TRUE, bipartite = length(veUniqueContributors))

add.vertices.networkDynamic(net, nuNrUniqueContributors, vertex.pid = veUniqueContributors)
add.vertices.networkDynamic(net, nuNrUniqueArtifacts, vertex.pid = veUniqueArtifacts)

net %v% "vertex.names" <- c(veUniqueContributors, veUniqueArtifacts)
net %v% "vertex.type"  <- c(rep("p", length(veUniqueContributors)), rep("a", length(veUniqueArtifacts)))
net %v% "vertex.col"   <- c(rep("blue", length(veUniqueContributors)), rep("gray", length(veUniqueArtifacts)))
net %v% "vertex.sides" <- c(rep(8, length(veUniqueContributors)), rep(4, length(veUniqueArtifacts)))
net %v% "vertex.rot"   <- c(rep(0, length(veUniqueContributors)), rep(45, length(veUniqueArtifacts)))
net %v% "vertex.lwd"   <- c(rep(1, length(veUniqueContributors)), rep(0, length(veUniqueArtifacts)))
net %v% "vertex.cex"   <- c(rep(2, length(veUniqueContributors)), rep(1, length(veUniqueArtifacts)))
set.network.attribute(net,'vertex.pid','vertex.names')
set.network.attribute(net,'edge.pid','edge.names')

add.edges.networkDynamic(net,
                         tail = get.vertex.id(net, dfEdges[[1]]),
                         head = get.vertex.id(net, dfEdges[[2]]),
                         edge.pid = paste0(dfEdges[[1]], "->", dfEdges[[2]]))

activate.edges(net,
               e = get.edge.id(net, paste0(dfTransac[["contributorId"]], "->", dfTransac[["artifactId"]])),
               at = UtilDateAsNumeric(dfTransac$instantId))

activate.edge.attribute(net,
                        prefix = "weight",
                        value = dfTransac$weight,
                        e = get.edge.id(net, paste0(dfTransac[["contributorId"]], "->", dfTransac[["artifactId"]])),
                        at = UtilDateAsNumeric(dfTransac$instantId))

reconcile.vertex.activity(net = net, mode = "encompass.edges", edge.active.default = FALSE)

nuStart      <- range(get.change.times(net, ignore.inf = FALSE))[1]
nuEnd        <- range(get.change.times(net, ignore.inf = FALSE))[2]

nuWeekStart  <- UtilDateAsNumeric(floor_date(UtilNumericAsDate(nuStart), "week"))
nuWeekEnd    <- UtilDateAsNumeric(ceiling_date(UtilNumericAsDate(nuEnd), "week"))

# This doesn't work: "Monthly" slices, 5 year aggregation
# Error: Attribute 'vertex.sides' had illegal missing values for vertex.sides or was not present in plot.network.default.
slice.par <- list(start = nuWeekStart,
                          end = nuWeekEnd,
                          interval = 1*60*60*24*7*4.5,
                          aggregate.dur = 1*60*60*24*7*52*5,
                          rule = "any")

# This doesn't work either: "Bimonthly" slices, "Bimonthly" aggregation
# Error: Attribute 'weight' had illegal missing values for edge.lwd or was not present in plot.network.default.
slice.par <- list(start = nuWeekStart,
                          end = nuWeekEnd,
                          interval = 1*60*60*24*7*4.5*2,
                          aggregate.dur = 1*60*60*24*7*4.5*2,
                          rule = "any")

# This works: "Bimonthly" slices, 5 year aggregation
slice.par <- list(start = nuWeekStart,
                          end = nuWeekEnd,
                          interval = 1*60*60*24*7*4.5*2,
                          aggregate.dur = 1*60*60*24*7*52*5,
                          rule = "any")

compute.animation(net, animation.mode = "kamadakawai", slice.par = slice.par, default.dist = 10)

render.d3movie(net,
               slice.par = slice.par,
               displaylabels = TRUE,
               output.mode = "htmlWidget",
               usearrows = TRUE,
               vertex.col = 'vertex.col',
               vertex.sides = 'vertex.sides',
               vertex.cex = 'vertex.cex',
               vertex.rot = 'vertex.rot',
               edge.lwd = 'weight',
               render.par = list(tween.frames = 10, show.time = TRUE))

如何从数据集中导出适当的切片参数,以便在不简单地增加聚合持续时间的情况下,渲染过程不会在缺少属性或边缘的单个切片上阻塞?

正如您已经确定的那样,render.d3movie 函数中存在错误。它正在尝试查找 'empty' 切片(不包含活动顶点的时间范围)的值,请参阅 bug report at Github。 (我实际上无法用你上面的代码重现错误,但这绝对是一个错误,感谢报告)

How can I derive proper slicing parameters from the data set so that the rendering process does not choke on individual slices that miss attributes or edges without simply increasing the aggregation duration?

在错误修复之前(希望很快),您可以

a) 使用 render.animation 代替

b) 选择您的切片参数以确保在网络没有活动顶点的地方没有切片。您可以使用 timeline 函数来查看切片将落在何处。例如,要显示节点(蓝色)和边缘(紫色)的 activity 拼写以及切片箱(垂直灰色条):

timeline(net,slice.par=slice.par,main='timeline plot of activity spells')

c) 最好的解决方案可能是调整顶点 activity 以确保在每个时间片中始终有一个顶点处于活动状态。在这种情况下,reconcile.vertex.activity 为一些顶点分配了非常短的持续时间,因为它们只包含持续时间非常短的边。使用不同的规则可能会避免这种情况,或者将顶点设置为在它们出现后始终处于活动状态(如果这对您的数据有意义)。

一些其他注意事项:

您可能还需要将 slice.par$rule 值设置为 earliest 而不是 any 以便在对动态 weight 属性进行装箱时遇到多个可能的值时边缘它会知道选择哪一个。

顺便说一句,我认为使用 networkDynamic 效用函数并传入 stTransac 构建网络可能有更紧凑的方法,加载大型数据集时可能会更快.