R matrix values recycling?

I am following a YouTube tutorial on agent-based modelling in R: https://www.youtube.com/watch?v=uAeSykgXnhg

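I have replicated the code using my own variable names (this helps me understand the tutor's code better). The aim is to track how people become infected with covid-19 when they come into contact with others. Not every contact leads to infection. Over successive model runs, the number of infected agents should reach the population size and the number of uninfected agents should drop to 0. Here is my replicated code: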

# define first agent
agents <- data.frame(agent_no = 1,
                     state = "e",
                     mixing = runif(1,0,1))

# specify agent population
pop_size <- 100

# fill agent data
for(i in 2:pop_size){
  agent <- data.frame(agent_no = i,
                      state = "s",
                      mixing = runif(1,0,1))
  agents <- rbind(agents, agent)
}

# specify number of model runs
n_times <- 10

# initialise output matrix 
out <- matrix(0, ncol = 2, nrow = n_times)

# run simple agent-based model
for(k in 1:n_times){
  for(i in 1:pop_size){
    # likelihood to meet others
    likelihood <- agents$mixing[i]
    # how many agents will they meet (integer). Add 1 to make sure everybody meets somebody
    connect_with <- round(likelihood * 3, 0) + 1 
    # which agents will they probably meet (list of agents)
    which_others <- sample(1:pop_size, 
                           connect_with, 
                           replace = T, 
                           prob = agents$mixing)
    for(j in 1:length(which_others)){
      contacts <- agents[which_others[j],]
      # if exposed, change state
      if(contacts$state == "e"){
        urand <- runif(1,0,1)
        # control probability of state change
        if(urand < 0.5){
          agents$state[i] <- "e"
        }
      }
    }
  }
  out[k,] <- table(agents$state)
}

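Looking at the output, once everyone is infected (first column), the number of uninfected agents (second column) should be 0, but I get 100. I suspect this is due to recycling.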

     [,1] [,2]
 [1,]   12   88
 [2,]   33   67
 [3,]   69   31
 [4,]   86   14
 [5,]   92    8
 [6,]   95    5
 [7,]   97    3
 [8,]   98    2
 [9,]   99    1
[10,]  100  100

I ran some diagnostics to see what is going on:

table(agents$state)
      e 
    100 

agents[agents$state == "s",]

    [1] agent_no state    mixing  
    <0 rows> (or 0-length row.names)

I think the 0-length row.names is my problem. The result should look like this:

     [,1] [,2]
 [1,]   12   88
 [2,]   33   67
 [3,]   69   31
 [4,]   86   14
 [5,]   92    8
 [6,]   95    5
 [7,]   97    3
 [8,]   98    2
 [9,]   99    1
[10,]  100    0

Can someone explain what I am doing wrong? Many thanks.

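I increased n_times to 10000 but could not find evidence of recycling. That does not mean it is not happening, but without a clear, reproducible setup we cannot reproduce the problem, so my suggestions here are untested.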
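For reference, the mechanism suspected in the question can be sketched in isolation. The snippet below does not run the simulation; it uses a hypothetical state vector in which every agent is already "e". In that case table() returns a single count, and assigning that length-1 result to a two-column matrix row recycles it.

```r
# Hypothetical state vector: all 100 agents already exposed
state <- rep("e", 100)

counts <- table(state)
length(counts)   # 1, because there is no "s" entry at all

# Assigning a length-1 value to a 2-column row recycles it silently
out <- matrix(0, ncol = 2, nrow = 1)
out[1, ] <- counts
out
#      [,1] [,2]
# [1,]  100  100
```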

Option 1

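Given that you found a scenario that ends with all agents$state == "e", I'll suggest a trick that guarantees table() always finds at least one "s" (in fact, at least one of each value):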

  out[k,] <- table(c("e", "s", agents$state)) - 1

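I'm assuming the only possible values are "e" and "s". If there are others, add them as well: this technique relies entirely on making sure every possible value is seen at least once and then decrementing all of the counts. Since we are "adding one observation" for each possible value, it is safe to subtract one from the table. With this trick, your check should look like this: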

table(agents$state)
#       e 
#     100 
table(c("e", "s", agents$state))
#       e       s 
#     101       1
table(c("e", "s", agents$state)) - 1
#       e       s 
#     100       0

So recycling should no longer be a factor.

Option 2

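Another technique, more robust in that it does not require listing every possible value, is to force the length, assuming we know for certain what it should be (which I think we do here):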

z <- table(agents$state)
z
#   s 
# 100 
length(z) <- 2
z
#   s     
# 100  NA 

Since you "know" the length should always be 2, you can hard-code the 2 there.
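Note that length<- pads with NA rather than 0, and the existing count stays in the first position whichever state it belongs to. If the set of states is known in advance, a common alternative is to convert the states to a factor with fixed levels, so that table() always reports every level in the same order. This is not part of the suggestions above; the sketch assumes that only "e" and "s" occur and uses a hypothetical state vector.

```r
# Hypothetical state vector: all agents exposed
state <- rep("e", 100)

# Fixed levels make table() report absent states with a count of 0
counts <- table(factor(state, levels = c("e", "s")))
counts   # e = 100, s = 0
```

In the simulation this would correspond to out[k, ] <- table(factor(agents$state, levels = c("e", "s"))).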

Option 3

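This approach is a bit more robust because you do not need to know the absolute length: every result is extended to the longest length returned.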

First, reproducible sample data:

set.seed(2021)
agents <- data.frame(agent_no = 1,
                     state = "e",
                     mixing = runif(1,0,1))
# specify agent population
pop_size <- 100
# fill agent data
for(i in 2:pop_size){
  agent <- data.frame(agent_no = i,
                      state = "s",
                      mixing = runif(1,0,1))
  agents <- rbind(agents, agent)
}
head(agents)
#   agent_no state    mixing
# 1        1     e 0.4512674
# 2        2     s 0.7837798
# 3        3     s 0.7096822
# 4        4     s 0.3817443
# 5        5     s 0.6363238
# 6        6     s 0.7013460

Replace your for loop, for (k in 1:n_times) { ... }, with an lapply() call:

out <- lapply(seq_len(n_times), function(k) {
  for(i in 1:pop_size){
    # likelihood to meet others
    likelihood <- agents$mixing[i]
    # how many agents will they meet (integer). Add 1 to make sure everybody meets somebody
    connect_with <- round(likelihood * 3, 0) + 1 
    # which agents will they probably meet (list of agents)
    which_others <- sample(1:pop_size, 
                           connect_with, 
                           replace = T, 
                           prob = agents$mixing)
    for(j in 1:length(which_others)){
      contacts <- agents[which_others[j],]
      # if exposed, change state
      if(contacts$state == "e"){
        urand <- runif(1,0,1)
        # control probability of state change
        if(urand < 0.5){
          agents$state[i] <- "e"
        }
      }
    }
  }
  table(agents$state)
})

At this point you have a list of vectors, each probably of length 2:

out[1:3]
# [[1]]
#  e  s 
#  1 99 
# [[2]]
#  e  s 
#  2 98 
# [[3]]
#  e  s 
#  3 97 

Note that we can determine the length of every element with

lengths(out)
#  [1] 2 2 2 2 2 2 2 2 2 2

Similar to option 2, where we forced the vector length, we can do the same thing here:

maxlen <- max(lengths(out))
out <- lapply(out, `length<-`, maxlen)
## or more verbosely
out <- lapply(out, function(vec) { length(vec) <- maxlen; vec; })

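You can confirm that they all have the same length with table(lengths(out)), which should show a single length, 2, with a count equal to n_times (10 here).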

From here, we can combine all of these vectors into a matrix:

out <- do.call(rbind, out)
out
#        e  s
#  [1,]  1 99
#  [2,]  2 98
#  [3,]  3 97
#  [4,]  2 98
#  [5,]  1 99
#  [6,] 20 80
#  [7,] 12 88
#  [8,]  1 99
#  [9,]  2 98
# [10,]  1 99
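One caveat worth checking: assignments to agents inside the anonymous function passed to lapply() modify a copy local to that call, so each run starts again from the original population and infections do not accumulate. That may be why the counts above do not grow from one run to the next. A minimal sketch of one way to keep the state between runs is a plain for loop that stores each table in a list. The list name results and the one-call construction of agents are illustrative choices, not part of the original code.

```r
set.seed(2021)
pop_size <- 100
n_times <- 10

# One exposed agent, everyone else susceptible
agents <- data.frame(agent_no = seq_len(pop_size),
                     state = c("e", rep("s", pop_size - 1)),
                     mixing = runif(pop_size, 0, 1))

# Pre-allocate a list; the for loop updates the global agents in place
results <- vector("list", n_times)
for (k in seq_len(n_times)) {
  for (i in seq_len(pop_size)) {
    connect_with <- round(agents$mixing[i] * 3, 0) + 1
    which_others <- sample(seq_len(pop_size), connect_with,
                           replace = TRUE, prob = agents$mixing)
    for (j in which_others) {
      # each contact with an exposed agent infects with probability 0.5
      if (agents$state[j] == "e" && runif(1, 0, 1) < 0.5) {
        agents$state[i] <- "e"
      }
    }
  }
  results[[k]] <- table(agents$state)
}

# Pad every table to the longest length (with NA), then bind into a matrix
maxlen <- max(lengths(results))
results <- lapply(results, `length<-`, maxlen)
out <- do.call(rbind, results)
```

Because agents can only move from "s" to "e", the first column of out should never decrease from one row to the next.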