SPSS Syntax - Identify duplicate responses and systematically identify cases to keep

Question

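I have a large survey dataset in SPSS in which roughly 15% of respondents answered the survey more than once (this was not intentional). I have worked out a systematic way to decide which cases to keep, but I'm not sure how to write a loop that performs this task.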

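My variables are:

ID: unique identifier for each person (repeated for people with multiple submissions)
SurveyComplete: 0/1 (whether the survey was completed)
Duplicate: 0/1 (whether they submitted more than one survey)
PrimaryFirst: 0/1 (flags the first submission)
MatchSequence: integer (the submission number of the survey)
Date: date of submission
keep: 0/1 (not yet created; indicates whether the record is kept)

Here is my data: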

ID  SurveyComplete  Duplicate  PrimaryFirst  MatchSequence    Date   keep
123       1             1            1             1        07162015  .
123       1             1            0             2        07182015  .
456       0             1            1             1        07152015  .
456       1             1            0             2        07192015  .
789       0             1            1             1        07112015  .
789       0             1            0             2        07182015  .
789       0             1            0             3        07212015  .
012       1             0            1             1        07122015  .

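In theory, I want to decide in the following order:

IF PrimaryFirst = 1 AND SurveyComplete = 1 THEN keep = 1. All other submissions for this ID get keep = 0.
ELSE IF PrimaryFirst = 0 AND SurveyComplete = 1 THEN keep = 1. All other submissions for this ID get keep = 0.
ELSE (where SurveyComplete = 0 for all of the ID's responses) keep the most recent submission.

This is the resulting keep column: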

ID  SurveyComplete  Duplicate  PrimaryFirst  MatchSequence    Date   keep
123       1             1            1             1        07162015  1
123       1             1            0             2        07182015  0
456       0             1            1             1        07152015  0
456       1             1            0             2        07192015  1
789       0             1            1             1        07112015  0
789       0             1            0             2        07182015  0
789       0             1            0             3        07212015  1
012       1             0            1             1        07122015  1

Ideally, I would like to do this in plain SPSS syntax without plug-ins, because my workplace is not friendly to add-on software. Any help is greatly appreciated!

Answer 1

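After each step, an AGGREGATE function determines for each ID whether a decision has already been made. IDs that have been decided drop out, and undecided IDs move on to the next step: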

* Create fake data to play around with.
* Note: an extra line was added for ID=456 to demonstrate the choice between multiple non-primary lines.
* Date is read here as a plain number (MMDDYYYY), so MAX(Date) is chronological only within one calendar year.

DATA LIST list (", ") / ID SurveyComplete Duplicate PrimaryFirst MatchSequence Date.
begin data
123, 1, 1, 1, 1, 7162015
123, 1, 1, 0, 2, 7182015
456, 0, 1, 1, 1, 7152015
456, 1, 1, 0, 2, 7192015
456, 1, 1, 0, 3, 7192015
789, 0, 1, 1, 1, 7112015
789, 0, 1, 0, 2, 7182015
789, 0, 1, 0, 3, 7212015
12, 1, 0, 1, 1, 7122015
end data.
execute.

* Now start defining the KEEP variable.
* Step 1: keep a completed primary submission, and note the sequence number of completed non-primary rows.

if (PrimaryFirst = 1 AND SurveyComplete = 1) keep=1.
if (PrimaryFirst = 0 AND SurveyComplete = 1) NotPrimarySeq=MatchSequence.
aggregate /outfile=* mode=addvariables /break=ID /decided=max(keep) /NotPrimarySeq_min=min(NotPrimarySeq).

* Step 2: for IDs still undecided, keep the earliest completed non-primary submission.

if missing(decided) and (PrimaryFirst = 0 AND SurveyComplete = 1) keep=(NotPrimarySeq=NotPrimarySeq_min).
aggregate /outfile=* mode=addvariables overwritevars=yes /break=ID /decided=max(keep) /Date_max=MAX(Date).

* Step 3: for IDs with no completed submission, keep the most recent one, then set all remaining rows to 0.

if missing(decided) keep=(Date=Date_max).
recode keep (miss=0).
execute.
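
As a possible alternative to the stepwise AGGREGATE approach, here is a sketch that ranks each submission by the three rules from the question, sorts so that the preferred row comes first within each ID, and lets MATCH FILES flag that first row. The names priority, tiebreak and keep_sorted are hypothetical helper variables, not part of the original data. It assumes that MatchSequence increases with the submission date (as it does in the sample data), so the highest MatchSequence stands in for the most recent submission, and it breaks ties among completed non-primary rows in favor of the earliest one, as the code above does.

```spss
* Rank rows by rule: 1 = completed primary, 2 = completed non-primary, 3 = incomplete.
COMPUTE priority = 3.
IF (SurveyComplete = 1 AND PrimaryFirst = 0) priority = 2.
IF (SurveyComplete = 1 AND PrimaryFirst = 1) priority = 1.

* Completed rows: earliest submission first.
* Incomplete rows: latest submission first (assumes MatchSequence follows date order).
COMPUTE tiebreak = MatchSequence.
IF (priority = 3) tiebreak = -MatchSequence.

* MATCH FILES needs the file sorted by ID, and FIRST flags the first row of each ID.
SORT CASES BY ID (A) priority (A) tiebreak (A).
MATCH FILES /FILE=* /BY ID /FIRST=keep_sorted.
EXECUTE.

* Optional check against the KEEP variable computed above.
CROSSTABS /TABLES=keep BY keep_sorted.
```

If the two approaches agree, every case falls on the diagonal of the crosstab. Note that the sort reorders the file, and that keep_sorted flags exactly one row per ID, whereas the date comparison above would keep several rows if they shared the latest date.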

spss

data-management