SPSS group by rows and concatenate string into one variable
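I am trying to produce output in a custom format using SPSS syntax. The dataset of value labels contains one or more labels per variable.
Now, however, I want to concatenate the value labels of each variable into a single string. For example, for the variable SEX, the rows F/Female and M/Male should be combined, or grouped, into one variable holding F=Female;M=Male;. I have already concatenated each code and its label into a new variable using Compute CodeValueLabel = concat(Code,'=',ValueLabel).
So the starting point, the source dataset, looks like this: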
+--------------+------+----------------+------------------+
| VarName | Code | ValueLabel | CodeValueLabel |
+--------------+------+----------------+------------------+
| SEX | F | Female | F=Female |
| SEX | M | Male | M=Male |
| ICFORM | 1 | Yes | 1=Yes |
| LIMIT_DETECT | 0 | Too low | 0=Too low |
| LIMIT_DETECT | 1 | Normal | 1=Normal |
| LIMIT_DETECT | 2 | Too high | 2=Too high |
| LIMIT_DETECT | 9 | Not applicable | 9=Not applicable |
+--------------+------+----------------+------------------+
The goal is to get a dataset like this:
+--------------+-------------------------------------------------+
| VarName | group_and_concatenate |
+--------------+-------------------------------------------------+
| SEX | F=Female;M=Male; |
| ICFORM | 1=Yes; |
| LIMIT_DETECT | 0=Too low;1=Normal;2=Too high;9=Not applicable; |
+--------------+-------------------------------------------------+
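I tried CASESTOVARS, but it creates separate variables, so I end up with multiple variables rather than a single string variable. I am starting to suspect that I am running up against the limits of what SPSS can do. Perhaps it is possible with some AGGREGATE or OMS trickery. Any ideas on how to do this?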
First, I recreate your example here to demonstrate on:
data list list/varName CodeValueLabel (2a30).
begin data
"SEX" "F=Female"
"SEX" "M=Male"
"ICFORM" "1=Yes"
"LIMIT_DETECT" "0=Too low"
"LIMIT_DETECT" "1=Normal"
"LIMIT_DETECT" "2=Too high"
"LIMIT_DETECT" "9=Not applicable"
end data.
Now to work:
* Sort so that all labels of the same variable sit on adjacent rows.
sort cases by varName CodeValueLabel.
string combineall (a300).
* Append ";" to each label.
compute combineall=concat(rtrim(CodeValueLabel), ";").
* If this row has the same varName as the previous row, prepend the string built so far.
* No separator is inserted, so the result matches the target format F=Female;M=Male;.
if $casenum>1 and varName=lag(varName)
    combineall=concat(rtrim(lag(combineall)), rtrim(combineall)).
exe.
* Flag the last row of each varName, which holds the complete string.
match files /file=* /last=selectthis /by varName.
* Keep only the flagged rows.
select if selectthis=1.
exe.
Note: make combineall wide enough to hold all the values of the variable with the most labels.
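One way to choose that width rather than guess it is to add up the label lengths per variable before declaring combineall. The following is only a sketch, run against the demo data from above. The names pieceLen and neededWidth are hypothetical helpers, not part of the original answer, and the +2 per label is an assumption that leaves room for the ";" plus a possible one-character separator.

```spss
* Recreate the demo data so this block runs on its own.
data list list/varName CodeValueLabel (2a30).
begin data
"SEX" "F=Female"
"SEX" "M=Male"
"ICFORM" "1=Yes"
"LIMIT_DETECT" "0=Too low"
"LIMIT_DETECT" "1=Normal"
"LIMIT_DETECT" "2=Too high"
"LIMIT_DETECT" "9=Not applicable"
end data.
* Length of each label plus 2 (assumption: the ";" and an optional separator).
compute pieceLen = char.length(CodeValueLabel) + 2.
* Attach the per-variable total to every row (helper names are hypothetical).
aggregate /outfile=* mode=addvariables
  /break=varName
  /neededWidth=sum(pieceLen).
* The maximum of neededWidth is a safe width for the string declaration.
descriptives variables=neededWidth /statistics=max.
```

If the reported maximum is larger than 300, the width in string combineall (a300) would need to be raised accordingly before running the concatenation step.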