Recoding levels of a categorical variable in Stata without ruining the data

I have this variable, which takes these values:

     tab expenditure
   
                            Q11 |      Freq.     Percent        Cum.
--------------------------------+-----------------------------------
                  Afs 2500-5000 |         24        3.84        3.84
                  Afs 5000-7500 |         89       14.24       18.08
                 Afs 7500-10000 |        235       37.60       55.68
I don't know / refuse to answer |          9        1.44       57.12
             Less than Afs 2500 |          5        0.80       57.92
            More than Afs 10000 |        263       42.08      100.00
--------------------------------+-----------------------------------
                          Total |        625      100.00

I want to change the order so that the categories are not listed alphabetically. I tried using

label define expenditure 1 "Less than Afs 2500" 2 "Afs 2500-5000" 3 "Afs 5000-7500" 4 "Afs 7500-10000" 5 "More than Afs 10000" 6 "I don't know / refuse to answer", replace

I also tried using

recode expenditure (1 = 5) (2 = 1) (3 = 2) (4 = 3) (5 = 6) (6 = 4)

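However, both approaches only changed the labels, not the underlying data, and now the data are scrambled (note the change in frequencies: the "More than Afs 10000" category now has only 24 observations instead of 263 as before).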

tab expenditure

                            Q11 |      Freq.     Percent        Cum.
--------------------------------+-----------------------------------
             Less than Afs 2500 |         89       14.24       14.24
                  Afs 2500-5000 |        235       37.60       51.84
                  Afs 5000-7500 |          9        1.44       53.28
                 Afs 7500-10000 |        263       42.08       95.36
            More than Afs 10000 |         24        3.84       99.20
I don't know / refuse to answer |          5        0.80      100.00
--------------------------------+-----------------------------------
                          Total |        625      100.00

What is going on here? How can I change the order without affecting my underlying data?

If your expenditure variable is a string, you can use label define and encode like this:

. clear 

. input str31 expenditure int freq 

                         expenditure      freq
  1.                   "Afs 2500-5000"        24        
  2.                   "Afs 5000-7500"        89      
  3.                  "Afs 7500-10000"       235      
  4. "I don't know / refuse to answer"         9        
  5.              "Less than Afs 2500"         5        
  6.             "More than Afs 10000"       263   
  7. end 

. label def expenditure 1 "Less than Afs 2500" 2 "Afs 2500-5000" 3 "Afs 5000-7500" 4 "Afs 7500-10000" 5 "More than Afs 10000" 6 "I don't know / refuse to answer"


. encode expenditure, gen(expenditure2) label(expenditure)


. label var expenditure2 "expenditure"


. tab expenditure2 [fw=freq]

                    expenditure |      Freq.     Percent        Cum.
--------------------------------+-----------------------------------
             Less than Afs 2500 |          5        0.80        0.80
                  Afs 2500-5000 |         24        3.84        4.64
                  Afs 5000-7500 |         89       14.24       18.88
                 Afs 7500-10000 |        235       37.60       56.48
            More than Afs 10000 |        263       42.08       98.56
I don't know / refuse to answer |          9        1.44      100.00
--------------------------------+-----------------------------------
                          Total |        625      100.00

However, your variable appears to be numeric, so it is as if you had done this:

. clear 

. input byte expenditure int freq 

     expend~e      freq
  1. 1                       24        
  2. 2                       89      
  3. 3                      235      
  4. 4                        9        
  5. 5                        5        
  6. 6                      263   
  7. end 

. label def expenditure 5 "Less than Afs 2500" 1 "Afs 2500-5000" 2 "Afs 5000-7500" 3 "Afs 7500-10000" 6 "More than Afs 10000" 4 "I don't know / refuse to answer"

. label val expenditure expenditure 
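
Before changing anything, it can help to look at the stored codes and the label mapping separately, since tabulate normally shows only the label text. A minimal do-file sketch, assuming the same made-up six-row data as above rather than your actual file:

```stata
* Rebuild the numeric toy example (codes 1-6, labels in alphabetical order).
clear
input byte expenditure int freq
1  24
2  89
3 235
4   9
5   5
6 263
end
label define expenditure 5 "Less than Afs 2500" 1 "Afs 2500-5000" 2 "Afs 5000-7500" 3 "Afs 7500-10000" 6 "More than Afs 10000" 4 "I don't know / refuse to answer"
label values expenditure expenditure

* Storage type and attached value label:
* byte/int/long/float/double means numeric, str# means string.
describe expenditure

* The stored codes, with the labels switched off.
tabulate expenditure [fw=freq], nolabel

* The code-to-text mapping currently held by the value label.
label list expenditure
```

If describe reports a str# storage type on your own data, the encode route above applies instead.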

The problem now is that you need to both apply recode and redefine the value labels:

. recode expenditure (5=1) (1=2) (2=3) (3=4) (4=6) (6=5)
(6 changes made to expenditure)


. tab expenditure [fw=freq]

                    expenditure |      Freq.     Percent        Cum.
--------------------------------+-----------------------------------
                  Afs 2500-5000 |          5        0.80        0.80
                  Afs 5000-7500 |         24        3.84        4.64
                 Afs 7500-10000 |         89       14.24       18.88
I don't know / refuse to answer |        235       37.60       56.48
             Less than Afs 2500 |        263       42.08       98.56
            More than Afs 10000 |          9        1.44      100.00
--------------------------------+-----------------------------------
                          Total |        625      100.00

. label def expenditure 1 "Less than Afs 2500" 2 "Afs 2500-5000" 3 "Afs 5000-7500" 4 "Afs 7500-10000" 5 "More than Afs 10000" 6 "I don't know / refuse to answer", modify


. tab expenditure [fw=freq]

                    expenditure |      Freq.     Percent        Cum.
--------------------------------+-----------------------------------
             Less than Afs 2500 |          5        0.80        0.80
                  Afs 2500-5000 |         24        3.84        4.64
                  Afs 5000-7500 |         89       14.24       18.88
                 Afs 7500-10000 |        235       37.60       56.48
            More than Afs 10000 |        263       42.08       98.56
I don't know / refuse to answer |          9        1.44      100.00
--------------------------------+-----------------------------------
                          Total |        625      100.00
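
The two steps above can also be combined so that the original variable is never overwritten. This is only a sketch on the same toy data: it relies on recode's generate() and label() options and on labels written inside the rules, and the names expenditure2 (variable and value label) are hypothetical choices, not something from your dataset.

```stata
* Rebuild the numeric toy example (codes 1-6, labels in alphabetical order).
clear
input byte expenditure int freq
1  24
2  89
3 235
4   9
5   5
6 263
end
label define expenditure 5 "Less than Afs 2500" 1 "Afs 2500-5000" 2 "Afs 5000-7500" 3 "Afs 7500-10000" 6 "More than Afs 10000" 4 "I don't know / refuse to answer"
label values expenditure expenditure

* Recode into a NEW variable and define its value label in the same step.
* The /// continuation requires running this from a do-file.
recode expenditure                              ///
    (5 = 1 "Less than Afs 2500")                ///
    (1 = 2 "Afs 2500-5000")                     ///
    (2 = 3 "Afs 5000-7500")                     ///
    (3 = 4 "Afs 7500-10000")                    ///
    (6 = 5 "More than Afs 10000")               ///
    (4 = 6 "I don't know / refuse to answer"),  ///
    generate(expenditure2) label(expenditure2)

* Cross-check: every old category should land in exactly one new category.
tabulate expenditure expenditure2 [fw=freq]
```

Because expenditure itself is left untouched, a mistake in the rules can be fixed by dropping expenditure2 and trying again.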

I figured it out.

label define order2 1 "Less than Afs 2500" 2 "Afs 2500-5000" 3 "Afs 5000-7500" 4 "Afs 7500-10000" 5 "More than Afs 10000" 6 "I don't know / refuse to answer"

encode expenditure, gen(expenditure2) label(order2)

This did the job without altering the data.
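
One possible safeguard for this approach, sketched on made-up string data: encode's noextend option stops with an error if a string is not already in the label, and decode can confirm that the new variable maps back to the original text. The variable name check is a hypothetical helper.

```stata
* String version of the toy data.
clear
input str31 expenditure int freq
"Afs 2500-5000"                    24
"Afs 5000-7500"                    89
"Afs 7500-10000"                  235
"I don't know / refuse to answer"   9
"Less than Afs 2500"                5
"More than Afs 10000"             263
end

label define order2 1 "Less than Afs 2500" 2 "Afs 2500-5000" 3 "Afs 5000-7500" 4 "Afs 7500-10000" 5 "More than Afs 10000" 6 "I don't know / refuse to answer"

* noextend: fail instead of silently adding new codes to order2
* when a string (for example a misspelled category) is not in the label.
encode expenditure, generate(expenditure2) label(order2) noextend

* Round-trip check: decoding the new variable must reproduce the original text.
decode expenditure2, generate(check)
assert check == expenditure
drop check

tabulate expenditure2 [fw=freq]
```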