SPSS - How to identify First and Last available measurements in a row of 120 variables

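I have already checked a previous post about this (SPSS Last available measurement in a row of variables), but I am still unsure how to do this task more efficiently. I have a dataset with 98 million rows and 120 variables (one per month, from January/2005 to December/2014). For each observation in the dataset, I need to identify the first and the last valid (non-missing) measurement across the row of variables. The dataset looks like this: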

v1 v2 ... v120
1 2 ... 5
. 2 ... 5
3 1 ...

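I have already tried loop-based versions (also suggested in that thread: SPSS Last available measurement in a row of variables). I used the syntax below, but it does not work. I keep getting error messages, most likely because I do not understand all the steps involved and am therefore misusing it.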

DEFINE LAST_VALID ()    
!DO !@ = 1 !TO 120 .    
!LET !a = !CONCAT("v", !@) .    
COMPUTE LAST_VALID = !a .    
!DOEND .    
!ENDDEFINE.    
LAST_VALID .     
EXECUTE.    

Error messages (some of them):

Error # 4382 in column 1024. Text: (End of Command)
An equals sign was not found when expected after a target variable in a COMPUTE command.
Execution of this command stops.

Warning # 231
The depth of macro nesting has reached the current limit. To increase the limit, use SET MNEST. To check the limit use the SHOW command.

Warning # 210 in column 9. Text: !ERROR_MACRO
A macro symbol is invalid in this context.
The symbol will be treated as an invalid special character.

Error # 4285 in column 9. Text: !ERROR_MACRO
Incorrect variable name: either the name is more than 64 characters, or it is not defined by a previous command.
Execution of this command stops.

DEFINE FIRST_VALID ()    
!DO !@ = 1 !TO 120 .    
!LET !a = !CONCAT("v", !@) .    
LOOP IF MISSING (FIRST_VALID) = 1.    
COMPUTE FIRST_VALID = !a .    
END LOOP IF FIRST_VALID > 0.    
!DOEND .    
!ENDDEFINE.    
FIRST_VALID.    
EXECUTE.    

Error messages (some of them):

Warning # 231
The depth of macro nesting has reached the current limit. To increase the limit, use SET MNEST. To check the limit use the SHOW command.

Warning # 210 in column 18. Text: !ERROR_MACRO
A macro symbol is invalid in this context.
The symbol will be treated as an invalid special character.

Error # 4007 in column 18. Text: !ERROR_MACRO
The expression is incomplete. Check for missing operands, invalid operators, unmatched parentheses or excessive string length.
Execution of this command stops.

Error # 4846 in column 18. Text: !ERROR_MACRO
The LOOP command contains unrecognized text after the end of the IF clause.

Warning # 210 in column 9. Text: !ERROR_MACRO
A macro symbol is invalid in this context.
The symbol will be treated as an invalid special character.

Error # 4285 in column 9. Text: !ERROR_MACRO
Incorrect variable name: either the name is more than 64 characters, or it is not defined by a previous command.
Execution of this command stops.

Error # 4014 in column 13. Text: !ERROR_MACRO
SPSS Statistics was expecting an expression but encountered the end of the command. Check the expression for omitted or extra operands, operators, and parentheses.
Execution of this command stops.

Error # 4045. Command name: END LOOP
The END LOOP command does not follow an unclosed LOOP command. Maybe the LOOP command was not recognized because of an error. Use the level-of-control shown to the left of the SPSS Statistics commands to determine the range of LOOPs and DO IFs.

I may have forgotten to adjust something in my syntax, but I do not know what. Apologies if this is too obvious...

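Neither your description nor your code snippets make clear what you want: the last actual value, or the index position of that value. For example, if the last valid value for record 1 million is in V110 and equals 5, do you want to know "110" or "5"?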

Here is a simple approach using DO REPEAT that returns both. In my example, LastVal is "5" and LastId is 110.

DO REPEAT V = V1 TO V120 /#i = 1 TO 120.
  DO IF NOT MISSING(V).
    COMPUTE LastVal = V.
    COMPUTE LastId = #i.
  END IF.
END REPEAT.
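To make the DO REPEAT logic concrete, here is a minimal Python sketch (not SPSS) of the same full scan. It assumes a row is a list where `None` stands in for system-missing, and the function name `last_valid_full_scan` is hypothetical. Like the SPSS version, it visits every variable and simply overwrites the result each time it sees a valid value, so whatever is left at the end is the last one.

```python
from typing import Optional, Sequence, Tuple


def last_valid_full_scan(
    row: Sequence[Optional[float]],
) -> Tuple[Optional[float], Optional[int]]:
    """Mirror the DO REPEAT idea: visit every cell, overwrite on each valid value."""
    last_val: Optional[float] = None  # stays None (system-missing) if nothing is valid
    last_id: Optional[int] = None
    for i, v in enumerate(row, start=1):  # 1-based index, like #i = 1 TO 120
        if v is not None:  # NOT MISSING(V)
            last_val, last_id = v, i
    return last_val, last_id


# Hypothetical 120-variable row: valid values only in v2 and v110.
row = [None] * 120
row[1] = 2.0
row[109] = 5.0
print(last_valid_full_scan(row))  # (5.0, 110)
```

Note that all 120 cells are always visited, even though the answer is only settled after the last valid cell.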

To also return the first index and value, you can add a second DO IF inside the DO REPEAT.

* Declare FirstVal and FirstId first so that DO IF MISSING(FirstVal) can refer to them.
NUMERIC FirstVal FirstId.
DO REPEAT V = V1 TO V120 /#i = 1 TO 120.
  DO IF NOT MISSING(V).
    COMPUTE LastVal = V.
    COMPUTE LastId = #i.
    DO IF MISSING(FirstVal).
      COMPUTE FirstVal = V.
      COMPUTE FirstId = #i.
    END IF.
  END IF.
END REPEAT.
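The same single-pass idea, returning both ends at once, can be sketched in Python as follows. This is an illustration only: `None` again plays the role of system-missing, `first_last_one_pass` is a hypothetical name, and the sample rows are shortened 4-variable versions of the rows in the question (the elided middle columns are assumed missing).

```python
from typing import Optional, Sequence, Tuple

Result = Tuple[Optional[float], Optional[int], Optional[float], Optional[int]]


def first_last_one_pass(row: Sequence[Optional[float]]) -> Result:
    """One scan over the row; record the first valid cell once, keep overwriting the last."""
    first_val: Optional[float] = None
    first_id: Optional[int] = None
    last_val: Optional[float] = None
    last_id: Optional[int] = None
    for i, v in enumerate(row, start=1):
        if v is not None:  # DO IF NOT MISSING(V)
            last_val, last_id = v, i
            if first_val is None:  # nested DO IF MISSING(FirstVal)
                first_val, first_id = v, i
    return first_val, first_id, last_val, last_id


# Shortened, hypothetical versions of the three sample rows.
rows = [
    [1.0, 2.0, None, 5.0],
    [None, 2.0, None, 5.0],
    [3.0, 1.0, None, None],
]
for r in rows:
    print(first_last_one_pass(r))
# (1.0, 1, 5.0, 4)
# (2.0, 2, 5.0, 4)
# (3.0, 1, 1.0, 2)
```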

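With 98 million records this may take a while, especially if you are not running on a server. You could experiment with VARSTOCASES, which drops missing values by default. Note, however, that both DO REPEAT versions always run through all 120 variables for every case.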
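The VARSTOCASES suggestion amounts to reshaping wide to long and discarding missing cells, after which the first and last record per case are the answer. Below is a rough Python sketch of that idea, not SPSS code. The per-case collapsing step in `first_last_from_long` is an assumption on our part (in SPSS it would need an additional step, for example AGGREGATE, which the answer does not spell out), and both function names are hypothetical.

```python
from typing import Dict, Iterator, List, Optional, Sequence, Tuple


def wide_to_long(
    rows: Sequence[Sequence[Optional[float]]],
) -> Iterator[Tuple[int, int, float]]:
    """Emit (case_id, index, value) for valid cells only, like VARSTOCASES dropping missing."""
    for case_id, row in enumerate(rows, start=1):
        for idx, v in enumerate(row, start=1):
            if v is not None:
                yield case_id, idx, v


def first_last_from_long(
    records: Iterator[Tuple[int, int, float]],
) -> Dict[int, Tuple[float, int, float, int]]:
    """Collapse long records per case; assumes records arrive in index order within a case."""
    out: Dict[int, List] = {}
    for case_id, idx, v in records:
        if case_id not in out:
            out[case_id] = [v, idx, v, idx]  # first record seen is both first and last so far
        else:
            out[case_id][2], out[case_id][3] = v, idx  # later records only update "last"
    return {k: tuple(val) for k, val in out.items()}


rows = [
    [1.0, 2.0, None, 5.0],
    [None, 2.0, None, 5.0],
    [3.0, 1.0, None, None],
    [None, None, None, None],  # all missing: this case disappears from the long data
]
print(first_last_from_long(wide_to_long(rows)))
# {1: (1.0, 1, 5.0, 4), 2: (2.0, 2, 5.0, 4), 3: (3.0, 1, 1.0, 2)}
```

One consequence worth noting: cases with no valid value at all vanish in the long format, so they would have to be matched back to the original file if they are needed.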


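You can make this more efficient with LOOP, which gives the same result but exits as soon as it reaches the first valid value. For the first valid value, you can do this: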

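VECTOR V = V1 TO V120.
* Assigning inside the loop leaves FirstVal and FirstId system-missing when all 120 values are missing.
LOOP #i = 1 TO 120.
  DO IF NOT MISSING(V(#i)).
    COMPUTE FirstVal = V(#i).
    COMPUTE FirstId = #i.
  END IF.
END LOOP IF NOT MISSING(V(#i)).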
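In Python terms, the early exit of END LOOP IF corresponds to returning from inside the loop. The sketch below is illustrative only (hypothetical name `first_valid_early_break`, `None` as system-missing); it also returns a missing pair when no valid value exists, matching the behavior of assigning inside the loop.

```python
from typing import Optional, Sequence, Tuple


def first_valid_early_break(
    row: Sequence[Optional[float]],
) -> Tuple[Optional[float], Optional[int]]:
    """Scan left to right and stop at the first valid cell."""
    for i, v in enumerate(row, start=1):  # LOOP #i = 1 TO 120
        if v is not None:  # END LOOP IF NOT MISSING(V(#i))
            return v, i
    return None, None  # no valid value: both results stay missing


print(first_valid_early_break([None, 2.0, None, 5.0]))  # (2.0, 2)
```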

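For the last value, you simply reverse the direction of the loop. If both loops run in the same block of transformations, the second VECTOR command can be left out, because V is already defined.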

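VECTOR V = V1 TO V120.
* Assigning inside the loop leaves LastVal and LastId system-missing when all 120 values are missing.
LOOP #i = 120 TO 1 BY -1.
  DO IF NOT MISSING(V(#i)).
    COMPUTE LastVal = V(#i).
    COMPUTE LastId = #i.
  END IF.
END LOOP IF NOT MISSING(V(#i)).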
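The reversed loop can be sketched the same way in Python (again an illustration with a hypothetical function name and `None` standing in for system-missing): walk the indices from the end down to 1 and stop at the first valid cell met.

```python
from typing import Optional, Sequence, Tuple


def last_valid_early_break(
    row: Sequence[Optional[float]],
) -> Tuple[Optional[float], Optional[int]]:
    """Scan right to left and stop at the first valid cell, which is the last one in the row."""
    for i in range(len(row), 0, -1):  # LOOP #i = 120 TO 1 BY -1
        v = row[i - 1]  # convert the 1-based index to Python's 0-based indexing
        if v is not None:  # END LOOP IF NOT MISSING(V(#i))
            return v, i
    return None, None  # no valid value: both results stay missing


print(last_valid_early_break([3.0, 1.0, None, None]))  # (1.0, 2)
```

For rows whose valid values sit near the edges, both early-exit scans touch only a few cells, which is where the saving over the full DO REPEAT pass comes from.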