Create a sparse matrix from a list

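I would like to create a sparse matrix from a list. I can already create an ordinary (dense) matrix using an approach described elsewhere.

Here is a small reproducible example that builds the ordinary matrix: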
set.seed(1234)

# determine number of observations in each sample
n.samples <- 20
max.obs   <- 10
obs.per.sample <- sample(0:max.obs, size = n.samples, prob = c(0.70,0.15,0.05,0.03,rep(0.01,7)), replace = TRUE)

# determine size of each observation in a sample
# here obs.size is a list
my.sizes   <- seq(10, 32, 2)
size.probs <- c(0.02,0.04,0.06,0.08,0.10,0.12,0.14,0.16,0.14,0.08,0.04,0.02)
# lapply always returns a list, even if every sample has the same number of observations
obs.size   <- lapply(obs.per.sample, function(x) sample(my.sizes, size = x, prob = size.probs, replace = TRUE))

# create matrix of observation sizes in all samples
max.samples <- max(lengths(obs.size))
mat <- matrix(c(sapply(obs.size, `[`, 1:max.samples)), nrow = n.samples, byrow = TRUE)
mat[is.na(mat)] <- 0
mat
#      [,1] [,2] [,3]
# [1,]    0    0    0
# [2,]    0    0    0
# [3,]    0    0    0
# [4,]    0    0    0
# [5,]   22   22    0
# [6,]    0    0    0
# [7,]    0    0    0
# [8,]    0    0    0
# [9,]    0    0    0
#[10,]    0    0    0
#[11,]    0    0    0
#[12,]    0    0    0
#[13,]    0    0    0
#[14,]   24   24   26
#[15,]    0    0    0
#[16,]   16    0    0
#[17,]    0    0    0
#[18,]    0    0    0
#[19,]    0    0    0
#[20,]    0    0    0

Perhaps you can try the following base R option:

l <- lengths(obs.size)
mat <- matrix(0,length(obs.size),max(l))
mat[cbind(rep(which(l>0),l[l>0]),sequence(l[l>0]))] <- unlist(obs.size)

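Here the positions of the non-zero values are given by cbind(rep(which(l>0),l[l>0]),sequence(l[l>0])): the first column holds the row index of each value and the second its column index. You then simply assign the non-zero values, i.e. unlist(obs.size), to those positions.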
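
To make the index construction easier to follow, here is a minimal sketch on a small made-up list. The names toy, nz, idx and toy.mat are hypothetical and are not part of the original question or answer.

```r
# Hypothetical toy list standing in for obs.size: elements 1 and 3 are empty
toy <- list(numeric(0), c(22, 22), numeric(0), c(24, 24, 26), 16)

l   <- lengths(toy)                    # number of values per element
nz  <- which(l > 0)                    # positions of the non-empty elements
idx <- cbind(row = rep(nz, l[nz]),     # row index, repeated once per value
             col = sequence(l[nz]))    # 1..k within each non-empty element

# Assigning through a two-column index matrix fills one cell per (row, col) pair
toy.mat <- matrix(0, nrow = length(toy), ncol = max(l))
toy.mat[idx] <- unlist(toy)
toy.mat
```

Each row of idx names exactly one cell, so the assignment touches only the cells that should be non-zero and leaves the rest at 0.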

  • Output
> mat
      [,1] [,2] [,3]
 [1,]    0    0    0
 [2,]    0    0    0
 [3,]    0    0    0
 [4,]    0    0    0
 [5,]   22   22    0
 [6,]    0    0    0
 [7,]    0    0    0
 [8,]    0    0    0
 [9,]    0    0    0
[10,]    0    0    0
[11,]    0    0    0
[12,]    0    0    0
[13,]    0    0    0
[14,]   24   24   26
[15,]    0    0    0
[16,]   16    0    0
[17,]    0    0    0
[18,]    0    0    0
[19,]    0    0    0
[20,]    0    0    0

If you need a sparse matrix, the Matrix package can help, for example:

library(Matrix)

l <- lengths(obs.size)
mat <- sparseMatrix(
  i = rep(which(l > 0), l[l > 0]),
  j = sequence(l[l > 0]),
  x = unlist(obs.size)
)

which gives

> mat
16 x 3 sparse Matrix of class "dgCMatrix"

 [1,]  .  .  .
 [2,]  .  .  .
 [3,]  .  .  .
 [4,]  .  .  .
 [5,] 22 22  .
 [6,]  .  .  .
 [7,]  .  .  .
 [8,]  .  .  .
 [9,]  .  .  .
[10,]  .  .  .
[11,]  .  .  .
[12,]  .  .  .
[13,]  .  .  .
[14,] 24 24 26
[15,]  .  .  .
[16,] 16  .  .
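
Note that the printed result has 16 rows rather than 20. When dims is not supplied, sparseMatrix takes the dimensions from the largest row and column index, so empty samples at the end of the list get no row. The sketch below shows the difference on a small made-up list; toy, nz, sm.inferred and sm.full are hypothetical names used only for illustration.

```r
library(Matrix)

# Hypothetical toy list: the last two elements are empty
toy <- list(numeric(0), c(22, 22), c(24, 24, 26), numeric(0), numeric(0))
l   <- lengths(toy)
nz  <- which(l > 0)

# Without dims, the row count is inferred from the largest row index (3 here)
sm.inferred <- sparseMatrix(i = rep(nz, l[nz]), j = sequence(l[nz]), x = unlist(toy))

# With dims, every list element gets a row, including trailing empty ones
sm.full <- sparseMatrix(i = rep(nz, l[nz]), j = sequence(l[nz]), x = unlist(toy),
                        dims = c(length(toy), max(l)))

dim(sm.inferred)
dim(sm.full)
```

For the data in the question, the analogous call would presumably pass dims = c(length(obs.size), max(l)) to keep all 20 samples.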

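I assume obs.size is your list, and that by sparse matrix you mean sparseMatrix from the Matrix package. You need to supply the i and j indices along with the values of the non-zero entries.

For i, the row indices: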

nonzero = sapply(obs.size,length)
i = rep(1:length(obs.size),nonzero)
i
[1]  5  5 14 14 14 16

j is the column index. My brain isn't working properly right now, so the code below may be clumsy:

j = unlist(tapply(i,i,seq_along))
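
Because i is already sorted by row, the same column indices can probably be obtained more directly with sequence(), which also avoids the names that tapply attaches. A small sketch comparing the two on a made-up list; toy, j.tapply and j.sequence are hypothetical names, not from the original answer.

```r
# Hypothetical toy list standing in for obs.size
toy <- list(numeric(0), c(22, 22), numeric(0), c(24, 24, 26), 16)

nonzero <- lengths(toy)                # number of values per element
i <- rep(seq_along(toy), nonzero)      # row index of every value

j.tapply   <- unlist(tapply(i, i, seq_along))  # the approach from the text
j.sequence <- sequence(nonzero)                # counts 1..k per element; zero counts add nothing

# Same column indices either way, apart from the names tapply adds
all(unname(j.tapply) == j.sequence)
```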

Then build the matrix:

library(Matrix)
sparseMatrix(i=i,j=j,x=unlist(obs.size))
16 x 3 sparse Matrix of class "dgCMatrix"
              
 [1,]  .  .  .
 [2,]  .  .  .
 [3,]  .  .  .
 [4,]  .  .  .
 [5,] 22 22  .
 [6,]  .  .  .
 [7,]  .  .  .
 [8,]  .  .  .
 [9,]  .  .  .
[10,]  .  .  .
[11,]  .  .  .
[12,]  .  .  .
[13,]  .  .  .
[14,] 24 24 26
[15,]  .  .  .
[16,] 16  .  .