Find specific patterns in sequences

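I am using the R package TraMineR for some academic research on sequence analysis.

I want to find a pattern defined as someone being at the target company, then leaving, then coming back to the target company.

(Simplified) I have defined state A as the target company, B as an outside-industry company, and C as an inside-industry company.

So what I want to do is find the sequences that have the specific pattern A-B-A or A-C-A.

After looking at this question (Strange number of subsequences?) and reading the user guide, especially the following passages: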

4.3.3 Subsequences. A sequence u is a subsequence of x if all successive elements ui of u appear in x in the same order, which we simply denote by u ⊂ x. According to this definition, unshared states can appear between those common to both sequences u and x. For example, u = S; M is a subsequence of x = S; U; M; MC.

7.3.2 Finding sequences with a given subsequence. The seqpm() function counts the number of sequences that contain a given subsequence and collects their row index numbers. The function returns a list with two elements. The first element, MTab, is just a table with the number of occurrences of the given subsequence in the data. Note that only one occurrence is counted per sequence, even when the subsequence appears more than one time in the sequence. The second element of the list, MIndex, gives the row index numbers of the sequences containing the subsequence. These index numbers may be useful for accessing the concerned sequences (example below). Since it is easier to search a pattern in a character string, the function first translates the sequence data in this format when using the seqconc function with the TRUE option.

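I concluded that seqpm() is the function I need to get the job done.

So I have sequences like this: A-A-A-A-A-B-B-B-B-B-A-A-A-A-A

Based on the definition of a subsequence that I found in the sources mentioned above, I thought I could find that kind of sequence by using: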

seqpm(sequence,"ABA")

But that does not happen. To find the example sequence I have to type

seqpm(sequence,"ABBBBBA")

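which is not very useful for what I need.

  1. Do you see what I might be missing?
  2. How can I retrieve all the sequences that go from A to B and back to A?
  3. Is there a way to find the sequences that go from A to anywhere else and then back to A?

Thanks a lot!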

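The title of the seqpm help page is "Find substring patterns in sequences", and that is what the function actually does. It searches for sequences that contain a given substring (not a subsequence). There seems to be a formulation error in the user guide.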
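To make the substring versus subsequence distinction concrete, here is a minimal base R sketch. It does not use TraMineR at all: the sequences are hypothetical, written by hand as concatenated state strings, and the object names are made up for illustration. It only shows why a literal "ABA" fails on the example from the question and how a regular expression could express "A, then a spell elsewhere, then A".

```r
# Hypothetical sequences written as concatenated state strings
# (A = target company, B and C = other companies).
seqs <- c(s1 = "AAAAABBBBBAAAAA",
          s2 = "AAAAACCCAAAAAAA",
          s3 = "AAAAAAAAABBBBBB",
          s4 = "BBBBBAAAAACCCCC")

# Literal substring search: "ABA" never occurs as three adjacent
# characters, because each spell lasts more than one position.
has_literal_aba <- grepl("ABA", seqs, fixed = TRUE)

# Regular expression: an A, one or more B, then an A again.
has_a_b_a <- grepl("AB+A", seqs)

# Regular expression: an A, one or more non-A states, then an A again.
has_a_out_a <- grepl("A[^A]+A", seqs)

print(data.frame(has_literal_aba, has_a_b_a, has_a_out_a, row.names = names(seqs)))
```

Note that the last pattern also accepts a detour through several different states (for example A-B-C-A), which may or may not be what is wanted.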

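A solution for finding the sequences that contain a given subsequence is to convert the state sequences into event sequences with seqecreate, and then use the seqefsub and seqeapplysub functions. I illustrate this with the actcal data that ships with TraMineR.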

library(TraMineR)
data(actcal)
actcal.seq <- seqdef(actcal[,13:24])

## displaying the first state sequences
head(actcal.seq)

## transforming into event sequences
actcal.seqe <- seqecreate(actcal.seq, tevent = "state", use.labels=FALSE)

## displaying the first event sequences
head(actcal.seqe)

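## now searching for the subsequences
## (recent TraMineR releases appear to name this argument str.subseq;
##  strsubseq is the older spelling)
subs <- seqefsub(actcal.seqe, strsubseq=c("(A)-(D)","(D)-(B)"))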
## and identifying the sequences that contain the subsequences
subs.pres <- seqeapplysub(subs, method="presence")
head(subs.pres)

## we can now, for example, count the sequences that contain (A)-(D)
sum(subs.pres[,1])
## or list the sequences that contain (A)-(D)
rownames(subs.pres)[subs.pres[,1]==1]
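With tevent = "state", an event is recorded only when a new state is entered, so the length of each spell no longer matters. The same idea can be sketched in base R by collapsing runs of identical states before searching. The helper collapse_runs and the sample strings below are hypothetical and only meant as an illustration; TraMineR's seqdss function extracts distinct successive states from a state sequence object in a similar spirit.

```r
# Collapse runs of identical states, e.g. "AAABBA" becomes "ABA".
collapse_runs <- function(s) {
  states <- strsplit(s, "", fixed = TRUE)[[1]]
  paste(rle(states)$values, collapse = "")
}

# Hypothetical sequences as concatenated state strings.
seqs <- c("AAAAABBBBBAAAAA", "AAAAACCCAAAAAAA", "AAAAAAAAABBBBBB")

# Distinct successive states of each sequence.
dss <- vapply(seqs, collapse_runs, character(1), USE.NAMES = FALSE)

# After collapsing, A-B-A and A-C-A are plain substrings.
returns_to_a <- grepl("ABA|ACA", dss)

print(data.frame(seqs, dss, returns_to_a))
```

Once the runs are collapsed, a plain substring search for "ABA" is enough to find sequences such as the one in the question.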

Hope this helps.