Recoding levels of a categorical variable in Stata without ruining data
I have this variable, which takes these values:
tab expenditure
                            Q11 |      Freq.     Percent        Cum.
--------------------------------+-----------------------------------
                  Afs 2500-5000 |         24        3.84        3.84
                  Afs 5000-7500 |         89       14.24       18.08
                 Afs 7500-10000 |        235       37.60       55.68
I don't know / refuse to answer |          9        1.44       57.12
             Less than Afs 2500 |          5        0.80       57.92
            More than Afs 10000 |        263       42.08      100.00
--------------------------------+-----------------------------------
                          Total |        625      100.00
I want to change the order so that the categories are not sorted alphabetically. I tried using
label define expenditure 1 "Less than Afs 2500" 2 "Afs 2500-5000" 3 "Afs 5000-7500" 4 "Afs 7500-10000" 5 "More than Afs 10000" 6 "I don't know / refuse to answer", replace
I also tried using
recode expenditure (1 = 5) (2 = 1) (3 = 2) (4 = 3) (5 = 6) (6 = 4)
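However, both of these approaches just changed the labels, not the underlying data, and now the data are all scrambled (note the change in frequencies: the "More than Afs 10000" category now has only 24 observations, rather than 263 as before).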
tab expenditure
                            Q11 |      Freq.     Percent        Cum.
--------------------------------+-----------------------------------
             Less than Afs 2500 |         89       14.24       14.24
                  Afs 2500-5000 |        235       37.60       51.84
                  Afs 5000-7500 |          9        1.44       53.28
                 Afs 7500-10000 |        263       42.08       95.36
            More than Afs 10000 |         24        3.84       99.20
I don't know / refuse to answer |          5        0.80      100.00
--------------------------------+-----------------------------------
                          Total |        625      100.00
What is going on? How can I change the order without affecting my underlying data?
If your expenditure variable is a string, you can use label define and encode like this:
. clear
. input str31 expenditure int freq
expenditure freq
1. "Afs 2500-5000" 24
2. "Afs 5000-7500" 89
3. "Afs 7500-10000" 235
4. "I don't know / refuse to answer" 9
5. "Less than Afs 2500" 5
6. "More than Afs 10000" 263
7. end
. label def expenditure 1 "Less than Afs 2500" 2 "Afs 2500-5000" 3 "Afs 5000-7500" 4 "Afs 7500-10000" 5 "More than Afs 10000" 6 "I don't know / refuse to answer"
. encode expenditure, gen(expenditure2) label(expenditure)
. label var expenditure2 "expenditure"
. tab expenditure2 [fw=freq]
                    expenditure |      Freq.     Percent        Cum.
--------------------------------+-----------------------------------
             Less than Afs 2500 |          5        0.80        0.80
                  Afs 2500-5000 |         24        3.84        4.64
                  Afs 5000-7500 |         89       14.24       18.88
                 Afs 7500-10000 |        235       37.60       56.48
            More than Afs 10000 |        263       42.08       98.56
I don't know / refuse to answer |          9        1.44      100.00
--------------------------------+-----------------------------------
                          Total |        625      100.00
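Which route applies depends on how expenditure is actually stored. A minimal sketch for checking this, assuming the variable is named expenditure as in the question; none of these commands changes the data:

```stata
* Storage type and attached value label: str# is a string variable,
* while byte, int, long, float, or double is numeric
describe expenditure

* For a numeric variable, show the stored integers instead of the label text
tabulate expenditure, nolabel

* List the value-to-text mapping of every value label in memory
label list
```

If describe reports a numeric type with a value label, the text shown by tabulate is only a display layer over integer codes, and redefining the label changes the text without touching those codes.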
However, your variable appears to be numeric, so it is as if you had done this:
. clear
. input byte expenditure int freq
expend~e freq
1. 1 24
2. 2 89
3. 3 235
4. 4 9
5. 5 5
6. 6 263
7. end
. label def expenditure 5 "Less than Afs 2500" 1 "Afs 2500-5000" 2 "Afs 5000-7500" 3 "Afs 7500-10000" 6 "More than Afs 10000" 4 "I don't know / refuse to answer"
. label val expenditure expenditure
The problem now is that you need to do both: apply recode and redefine the value labels.
. recode expenditure 5=1 1=2 2=3 3=4 4=6 6=5
(6 changes made to expenditure)
. tab expenditure [fw=freq]
                    expenditure |      Freq.     Percent        Cum.
--------------------------------+-----------------------------------
                  Afs 2500-5000 |          5        0.80        0.80
                  Afs 5000-7500 |         24        3.84        4.64
                 Afs 7500-10000 |         89       14.24       18.88
I don't know / refuse to answer |        235       37.60       56.48
             Less than Afs 2500 |        263       42.08       98.56
            More than Afs 10000 |          9        1.44      100.00
--------------------------------+-----------------------------------
                          Total |        625      100.00
. label def expenditure 1 "Less than Afs 2500" 2 "Afs 2500-5000" 3 "Afs 5000-7500" 4 "Afs 7500-10000" 5 "More than Afs 10000" 6 "I don't know / refuse to answer", modify
. tab expenditure [fw=freq]
                    expenditure |      Freq.     Percent        Cum.
--------------------------------+-----------------------------------
             Less than Afs 2500 |          5        0.80        0.80
                  Afs 2500-5000 |         24        3.84        4.64
                  Afs 5000-7500 |         89       14.24       18.88
                 Afs 7500-10000 |        235       37.60       56.48
            More than Afs 10000 |        263       42.08       98.56
I don't know / refuse to answer |          9        1.44      100.00
--------------------------------+-----------------------------------
                          Total |        625      100.00
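The recode in this transcript overwrites expenditure in place. A more cautious variant, sketched here on the assumption that you start from the original variable with the alphabetical codes 1 to 6, writes the result to a new variable and leaves the original untouched. The names expenditure2 and exp_order are hypothetical. Each rule reads old = new, and recode matches every rule against the original values, so the swaps do not interfere with one another.

```stata
* Run from a do-file: the /// line continuation is not available interactively.
* Each rule maps an old alphabetical code to its position in the intended order
* and attaches the label text for the new code.
* gen() creates a new variable; label() names the value label built from the rules.
recode expenditure                              ///
    (5 = 1 "Less than Afs 2500")                ///
    (1 = 2 "Afs 2500-5000")                     ///
    (2 = 3 "Afs 5000-7500")                     ///
    (3 = 4 "Afs 7500-10000")                    ///
    (6 = 5 "More than Afs 10000")               ///
    (4 = 6 "I don't know / refuse to answer"),  ///
    gen(expenditure2) label(exp_order)

* Cross-check: each old category should land in exactly one new category
tabulate expenditure expenditure2
```

Because the codes and their labels are defined together in one command, they cannot drift apart the way they do when label define and recode are run separately.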
I figured it out.
label define order2 1 "Less than Afs 2500" 2 "Afs 2500-5000" 3 "Afs 5000-7500" 4 "Afs 7500-10000" 5 "More than Afs 10000" 6 "I don't know / refuse to answer"
encode expenditure, gen(expenditure2) label(order2)
This did the job without altering the data.
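Note that encode accepts only string variables, so the commands above apply when expenditure holds the category text itself. If it is instead numeric with a value label attached, one possible route is to decode it to a string first. A sketch, assuming order2 has been defined as above; expenditure_str and expenditure_ord are hypothetical names:

```stata
* Turn the labelled numeric codes back into their label text
decode expenditure, gen(expenditure_str)

* Re-encode the text using the ordering stored in order2.
* noextend makes encode stop with an error, rather than quietly adding new codes,
* if any string is not already defined in order2 (a typo, for instance).
encode expenditure_str, gen(expenditure_ord) label(order2) noextend
```

This relies on the value label attached to expenditure still matching the stored codes, so it is best run on the data as originally loaded.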