在 Stata 中转置数据变量名称

Transposing Data Variable names in Stata

你好,我想使用美国国家科学基金会数据集,但原始 excel 文件变量名没有正确转置数据。有没有人有关于如何转置数据集以使其适合用于分析的 stata 格式的示例代码。这是原始 excel 文件,因此您可以了解问题。

Raw Excel File from NSF website

*已编辑以添加导入视图

记住 Nick 关于 SO 的本质的恰当观点,我确实有一些示例代码可能对您有所帮助。不知道你所说的“转置”是什么意思,我无法给出确切的答案,但你可以根据你的目的调整下面的 reshape 命令。

导入并查看

// Import Excel File defining TL and BR of table
    import excel "nsb20197-tabs02-012.xlsx", cellrange(A6:K177) clear
    list in 1/15

     +----------------------------------------------------------------------------------------------------------------------------------------+
     |                                A       B       C       D      E                   F       G       H       I                   J      K |
     |----------------------------------------------------------------------------------------------------------------------------------------|
  1. |                Associate's level                                                                                                       |
  2. | American Indian or Alaska Native                                                                                                       |
  3. |                       All fields    6282    4131    2151   65.8                34.2    8935    5697    3238                63.8   36.2 |
  4. |                              S&E     608     386     222   63.5                36.5     942     495     447                52.5   47.5 |
  5. |                      Engineering      17       4      13   23.5                76.5      50       8      42                  16     84 |
     |----------------------------------------------------------------------------------------------------------------------------------------|
  6. |                 Natural sciences     384     220     164   57.3                42.7     453     174     279                38.4   61.6 |
  7. |   Social and behavioral sciences     207     162      45   78.3                21.7     439     313     126                71.3   28.7 |
  8. |                          Non-S&E    5674    3745    1929     66                  34    7993    5202    2791   65.09999999999999   34.9 |
  9. |        Asian or Pacific Islander                                                                                                       |
 10. |                       All fields   27313   15522   11791   56.8                43.2   54809   30916   23893                56.4   43.6 |
     |----------------------------------------------------------------------------------------------------------------------------------------|
 11. |                              S&E    2649    1284    1365   48.5                51.5    7862    3492    4370                44.4   55.6 |
 12. |                      Engineering     160      23     137   14.4   85.59999999999999     574     111     463                19.3   80.7 |
 13. |                 Natural sciences    2010     939    1071   46.7                53.3    4419    1562    2857                35.3   64.7 |
 14. |   Social and behavioral sciences     479     322     157   67.2                32.8    2869    1819    1050                63.4   36.6 |
 15. |                          Non-S&E   24664   14238   10426   57.7                42.3   46947   27424   19523                58.4   41.6 |



构建数据

// Create supercategories
    
    * level = Column A if column A contains the word level (ignoring case)
    gen level = word(A,1) if ustrregexm(A, "level", 1), before(A)
    
    * demographic = Column A if next ob in column A contains the word all fields (ignoring case)
    gen demographic = A if ustrregexm(A[_n+1], "all fields", 1), before(A)
    
    * fill down demographic and level, and drop blank rows
    foreach v of varlist level demographic {
        replace `v' = `v'[_n-1] if missing(`v')
    }
    drop if mi(demographic) | demographic == A | regexm(A, level)
    
// rename variables 

    * rename A
    rename A field
    
    * rename count columns
    local list "B C D E F" 
    local year = 2000
    rename (`list') (all_`year' female_`year' male_`year' perc_female_`year' perc_male_`year' )
    
    local list "G H I J K" 
    local year = 2017
    rename (`list') (all_`year' female_`year' male_`year' perc_female_`year' perc_male_`year' )
    
    * destring 
    destring *_2000 *_2017, replace

整形为长

    * reshape long
    drop perc*
    reshape long all_ male_ female_, i(level demographic field) j(year)
    
    rename *_ degrees*
    reshape long degrees,  i(level demographic field year) j(gender) string

一个示例,说明如何按字段重塑宽度

    * test reshape wide by field + ensure the variable name is less than 32 characters post reshape 
    replace field = lower(strtoname(field))
    replace field = substr(field, 1, 32 - strlen("degrees") - 1) 
    reshape wide degrees, i(level demographic year gender) j(field) string