Proper use of seq_along() versus unique() functions within a looping function?

Question

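I am learning how to use R's lapply() function and benchmarking it against other options for generating a transition matrix.

lapply() stops working when I use seq_along() on a data frame whose ID values are long numbers. Or perhaps the problem lies with seq_along() rather than lapply(). For example, if the dataTest data frame is set up as shown below, where every value in the ID column is only 1 digit long, the reproducible code at the bottom works fine: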

dataTest <- 
    data.frame(
      ID = c(1,1,1,2,2,2,3,3,3),
      Period = c(1, 2, 3, 1, 2, 3, 1, 2, 3),
      Balance = c(5, 10, 15, 0, 2, 4, 3, 6, 9),
      Flags = c("X00","X01","X00","X01","X02","X02","X02","X01","X01")
    )

Correct result:

> numTransit(dataTest, 1,3)
    X00 X01 X02
X00   1   0   0
X01   0   0   1
X02   0   1   0

But if I replace the ID column above with the 7-digit values below, it no longer works! The transition matrix then contains only 0 values.

ID = c(1930145,1930145,1930145,1930146,1930146,1930146,1930147,1930147,1930147)

Here is the reproducible code using lapply()/seq_along() to test the above:

# Function to set up base transition matrix with all 0 values:
transMat <- function(x){
  df <- data.frame(matrix(0, ncol=length(unique(x$Flags)), nrow=length(unique(x$Flags))))
  row.names(df) <- unique(x$Flags)
  names(df) <- unique(x$Flags)
  return(df)
}

# Function to populate transition matrix with number of transition events:
numTransit <- function(x, from=1, to=3){
  df <- transMat(x)
  lapply(seq_along(unique(x$ID)), function(i){
    id_from <- as.character(x$Flags[(x$ID == i & x$Period == from)])
    id_to <- as.character(x$Flags[x$ID == i & x$Period == to])
    column <- which(names(df) == id_from)
    row <- which(row.names(df) == id_to)
    df[row, column] <<- df[row, column] + 1
  })
  return(df)
}

# Now to run the functions:
numTransit(dataTest,1,3)

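If I replace the lapply()/seq_along() above with a for loop, the code runs fine regardless of the length of the ID values. I can post the loop code if anyone would like; just let me know.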

Answer 1

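The problem is neither lapply() nor seq_along() as such, but the X argument you pass to lapply().

seq_along(x) returns an integer vector running from 1 to the number of elements in x.

For example, if we have a vector with three elements: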

seq_along(c(534624, 56235, 62))

Returns:

[1] 1 2 3

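So when you use x$ID == i, the ID column of x is compared against 1, 2, or 3. That happens to work when the IDs themselves are 1, 2, and 3, but it is definitely not the case with your 7-digit IDs: nothing matches, so the matrix is never updated.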
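The mismatch can be seen in isolation with a short sketch. The vector `ids` below is a made-up stand-in for the ID column, not the original data:

```r
# Hypothetical stand-in for the ID column
ids <- c(1930145, 1930145, 1930146, 1930147)

# Positions of the unique IDs, not the IDs themselves
idx <- seq_along(unique(ids))

# The unique ID values
vals <- unique(ids)

# Comparing the column against a position matches no rows
no_match <- which(ids == idx[1])

# Comparing against an actual ID value matches the expected rows
match_rows <- which(ids == vals[1])

print(idx)
print(vals)
print(length(no_match))
print(match_rows)
```

Here `idx` holds positions and `vals` holds the IDs, so only the comparison against `vals` selects any rows.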

So you need to use lapply(unique(x$ID), function(i) ...).

Here is the complete code (I basically only changed your lapply() part):

Input
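If you would rather keep seq_along(), for instance because the position is needed as well, one option is to store the unique values first and index into them. This is a minimal sketch with a made-up `ids` vector; the identity functions only show which value each iteration receives:

```r
# Hypothetical stand-in for unique(x$ID)
ids <- unique(c(1930145, 1930145, 1930146, 1930147))

# Option A: iterate over the ID values directly
by_value <- lapply(ids, function(id) id)

# Option B: iterate over positions, then look up the ID at each position
by_index <- lapply(seq_along(ids), function(i) ids[i])

# Both approaches hand the same IDs to the function body
same <- identical(by_value, by_index)
print(same)
```

Both options visit the same IDs; option B additionally gives access to the position `i` inside the function.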

dataTest <- 
  data.frame(
    ID = c(1,1,1,2,2,2,3,3,3),
    Period = c(1, 2, 3, 1, 2, 3, 1, 2, 3),
    Balance = c(5, 10, 15, 0, 2, 4, 3, 6, 9),
    Flags = c("X00","X01","X00","X01","X02","X02","X02","X01","X01")
  )

ID = c(1930145,1930145,1930145,1930146,1930146,1930146,1930147,1930147,1930147)

dataTest[, 1] <- ID

dataTest
       ID Period Balance Flags
1 1930145      1       5   X00
2 1930145      2      10   X01
3 1930145      3      15   X00
4 1930146      1       0   X01
5 1930146      2       2   X02
6 1930146      3       4   X02
7 1930147      1       3   X02
8 1930147      2       6   X01
9 1930147      3       9   X01

Output

transMat <- function(x){
  df <- data.frame(matrix(0, ncol=length(unique(x$Flags)), nrow=length(unique(x$Flags))))
  row.names(df) <- unique(x$Flags)
  names(df) <- unique(x$Flags)
  return(df)
}

# Function to populate transition matrix with number of transition events:
numTransit <- function(x, from=1, to=3){
  df <- transMat(x)
  lapply(unique(x$ID), function(i){
    id_from <- as.character(x$Flags[(x$ID == i & x$Period == from)])
    id_to <- as.character(x$Flags[x$ID == i & x$Period == to])
    column <- which(names(df) == id_from)
    row <- which(row.names(df) == id_to)
    df[row, column] <<- df[row, column] + 1
  })
  return(df)
}

numTransit(dataTest,1,3)

    X00 X01 X02
X00   1   0   0
X01   0   0   1
X02   0   1   0
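As an aside, the same counts can probably be obtained without `<<-` by pairing each ID's flags from the two periods and tabulating them. This is a different technique from the code above, offered only as a sketch. The function name `numTransitTable` is made up for illustration, and the sketch assumes every ID has exactly one row in each of the two periods:

```r
dataTest <- data.frame(
  ID = c(1930145, 1930145, 1930145, 1930146, 1930146, 1930146,
         1930147, 1930147, 1930147),
  Period = c(1, 2, 3, 1, 2, 3, 1, 2, 3),
  Balance = c(5, 10, 15, 0, 2, 4, 3, 6, 9),
  Flags = c("X00", "X01", "X00", "X01", "X02", "X02", "X02", "X01", "X01")
)

# Hypothetical alternative: count transitions with merge() and table()
numTransitTable <- function(x, from = 1, to = 3) {
  lv <- unique(as.character(x$Flags))
  # Flags at the start and end period, each paired with its ID
  start <- x[x$Period == from, c("ID", "Flags")]
  end <- x[x$Period == to, c("ID", "Flags")]
  # Align the two periods by ID (assumes one row per ID per period)
  both <- merge(start, end, by = "ID", suffixes = c("_from", "_to"))
  # Rows are the "to" flag and columns the "from" flag, as in numTransit()
  table(
    to = factor(both$Flags_to, levels = lv),
    from = factor(both$Flags_from, levels = lv)
  )
}

m <- numTransitTable(dataTest, 1, 3)
print(m)
```

Because the IDs are matched by value inside merge(), this version does not depend on what the ID numbers look like.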
