Create a subset of a seqdef state object with a reduced alphabet

Suppose our sequences consist of 5 different events/states (A-E), as follows:

library(TraMineR)
data(actcal)
actcal.seq <- seqdef(actcal, 13:24, alphabet=c("A","B","C","D","E"))

Is it possible to create a subset of actcal.seq that contains only the events A, C, and E? If so, how?

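Clarification: I want to extract every sequence that contains A, C, or E. If any of them also contains B or D, those events should be removed from the returned sequence. For example, the sequence A-A-B-C-C-D-E-E should be returned as A-A-C-C-E-E.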

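Clarification 2: The input sequences should use alphabet=c("A","B","C","D","E"), whereas the modified sequence object I am looking for should use alphabet=c("A","C","E"). More examples follow, as requested: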

"A-B-C-D-E" => "A-C-E"
"A-C-A-E" => "A-C-A-E"
"B-D" => NA or ""
"B-D-B-A-D" => "A"

I would appreciate any solution that does not require re-reading a subset of the data from the database.

You can recode states B and D as missing with the seqrecode function. The default symbol for missing states is *. I illustrate using only the first 10 sequences of actcal.

library(TraMineR)
data(actcal)
actcal.seq <- seqdef(actcal[1:10,13:24], alphabet=c("A","B","C","D","E"))

## Recode B and D as *, the default missing symbol
actcal.rec.seq <- seqrecode(actcal.seq, 
                     recodes = list("*"=c("B","D")), otherwise=NULL)

actcal.seq
#      Sequence               
# 2848 B-B-B-B-B-B-B-B-B-B-B-B
# 1230 D-D-D-D-A-A-A-A-A-A-A-D
# 2468 B-B-B-B-B-B-B-B-B-B-B-B
# 654  C-C-C-C-C-C-C-C-C-B-B-B
# 6946 A-A-A-A-A-A-A-A-A-A-A-A
# 1872 D-B-B-B-B-B-B-B-B-B-B-B
# 2905 D-D-D-D-D-D-D-D-D-D-D-D
# 106  A-A-A-A-A-A-A-A-A-A-A-A
# 5113 A-A-A-A-A-A-A-A-A-A-A-A
# 4503 A-A-A-A-A-A-A-A-A-A-A-A

actcal.rec.seq
#      Sequence               
# 2848 *-*-*-*-*-*-*-*-*-*-*-*
# 1230 *-*-*-*-A-A-A-A-A-A-A-*
# 2468 *-*-*-*-*-*-*-*-*-*-*-*
# 654  C-C-C-C-C-C-C-C-C-*-*-*
# 6946 A-A-A-A-A-A-A-A-A-A-A-A
# 1872 *-*-*-*-*-*-*-*-*-*-*-*
# 2905 *-*-*-*-*-*-*-*-*-*-*-*
# 106  A-A-A-A-A-A-A-A-A-A-A-A
# 5113 A-A-A-A-A-A-A-A-A-A-A-A
# 4503 A-A-A-A-A-A-A-A-A-A-A-A

Next, remove the missing states by redefining the sequence object with the reduced alphabet:

actcal.rec.comp.seq <- seqdef(actcal.rec.seq, 
                          left="DEL", gap="DEL", right="DEL", 
                          missing="*", alphabet=c("A","C","E"))

Then drop the sequences that contained only missing states:

(rec.seq <- actcal.rec.comp.seq[!is.na(seqdur(actcal.rec.comp.seq)[,1]),])
#      Sequence               
# 2103 A-A-A-A-A-A-A-A-A-A-A-A
# 3972 C-C-C-C-C-C-C-C-C      
# 5238 C                      
# 4977 C-C-C-C-C-C-C-C-C-C-C-C
# 528  A-A-A-A-A-A-A-A-A-A-A-A

If you only want the sequences of distinct successive states:

seqdss(rec.seq)
#      Sequence
# 2103 A       
# 3972 C       
# 5238 C       
# 4977 C       
# 528  A
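
As a sanity check on what the steps above are meant to produce, the transformation can be sketched on plain strings in base R, independently of TraMineR. The helper name `drop_states` is hypothetical and is not a TraMineR function; it assumes "-"-separated strings like the examples in the question, not seqdef objects.

```r
# Hypothetical helper (not part of TraMineR): keep only the states listed
# in `keep` from "-"-separated sequence strings.
drop_states <- function(x, keep, sep = "-") {
  vapply(strsplit(x, sep, fixed = TRUE), function(states) {
    kept <- states[states %in% keep]
    # A sequence with no remaining state becomes NA
    if (length(kept) == 0) NA_character_ else paste(kept, collapse = sep)
  }, character(1))
}

# The examples given in the question
examples <- c("A-B-C-D-E", "A-C-A-E", "B-D", "B-D-B-A-D")
reduced <- drop_states(examples, keep = c("A", "C", "E"))
print(reduced)
# reduced holds "A-C-E", "A-C-A-E", NA, "A"
```

This only mirrors the expected results on strings; for actual state sequence objects, the seqrecode and seqdef steps above remain the way to get a result with the reduced alphabet.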