Subsetting cases with at least 3 observations in longitudinal data?

Question

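I have a longitudinal dataset in which measurements were collected repeatedly across several waves (see the example setup below). As usually happens with this kind of data, there was attrition, and some participants stopped contributing waves before the end of the study. However, my analysis assumes that every participant has at least 3 observations.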

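How can I subset only those IDs (subjects) that have at least 3 observations? I have looked at similar questions on Stack Overflow, but they don't seem to fit this particular problem.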

Answer 1

Method 1

library(data.table)

# convert df to a data.table in place
setDT(df)

# count the rows (observed waves) per ID
df[, no_of_waves := .N, by = ID]

# keep IDs with at least 3 observations
df[no_of_waves >= 3]

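Method 2

# take the highest wave number reached per ID
# (this equals the number of observations only if waves are numbered from 1 with no gaps;
# if Wave starts at 0, compare against 2 instead)
df[, no_of_waves := max(Wave), by = ID]

# keep IDs whose highest wave number is at least 3
df[no_of_waves >= 3]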
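
If the helper column is not needed afterwards, the count-based filter can also be done in a single step. The following is a minimal sketch, not part of the original answer: the object names `dt`, `out_dt`, `idx` and `out_idx` are made up for illustration, and the sample values are assumed to match the small example data frame used elsewhere in this thread.

```r
library(data.table)

# Illustrative sample data (assumed values: two waves for ID 1000, three for ID 1001)
dt <- data.table(
  ID = c(1000, 1000, 1001, 1001, 1001),
  Wave = c(0, 1, 0, 1, 2),
  Score = c(5, 4, 6, 6, 7)
)

# return the rows of a group only when that group has at least 3 rows;
# groups that fail the test return NULL and are dropped
out_dt <- dt[, if (.N >= 3) .SD, by = ID]

# same filter via row indices: .I holds the row numbers of the current group
idx <- dt[, .I[.N >= 3], by = ID]$V1
out_idx <- dt[idx]
```

Both variants leave `dt` itself unchanged, which may be preferable when the full data set is still needed for attrition analysis.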

Answer 2

Using base R, you can try this one-liner.

out <- with(df, df[ID %in% names(which(sapply(split(df, ID), nrow) > 2)), ])

Output

> out
    ID Wave Score
3 1001    0     6
4 1001    1     6
5 1001    2     7

Data

df <- data.frame(
  ID = unlist(mapply(rep, 1000:1001, 2:3)),
  Wave = c(0,1,0,1,2),
  Score = c(5,4,6,6,7)
)
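
As a possible alternative to the `split`/`sapply` one-liner, the per-ID count can be computed with `ave()` or `table()`, which may be easier to read. This is a sketch under the same sample data; the names `n_obs`, `counts`, `out2` and `out3` are introduced here for illustration only.

```r
# Sample data, as defined above
df <- data.frame(
  ID = unlist(mapply(rep, 1000:1001, 2:3)),
  Wave = c(0, 1, 0, 1, 2),
  Score = c(5, 4, 6, 6, 7)
)

# number of rows per ID, repeated on every row belonging to that ID
n_obs <- ave(seq_along(df$ID), df$ID, FUN = length)

# keep rows whose ID has at least 3 observations
out2 <- df[n_obs >= 3, ]

# equivalent lookup via a frequency table of IDs
counts <- table(df$ID)
out3 <- df[df$ID %in% names(counts[counts >= 3]), ]
```

The `ave()` version yields a vector aligned with the rows of `df`, so it can also be stored as a column if the count is needed later.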