How can I replace the codes of a variable (food ingredient) with values (calories) from another dataset?

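I have a Stata dataset with six variables holding the codes of the food ingredients consumed, each paired with its weight in grams. I have a separate dataset that lists the food ingredient codes and the corresponding calories per 100 grams. I need to replace the codes with calories in order to calculate total calorie consumption.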

How can I do this (either by replacing values or by generating new variables)?

My first (main) dataset is:

clear 
input double hhid int(Ingredient_1_code Ingredient_1_weight Ingredient_2_code incredient_2_weight Ingredient_3_code ingredient_3_weight Ingredient_4_code Ingredient_4_weight Ingredient_5_code Ingredient_5_weight Ingredient_6_code Ingredient_6_weight)
   1  269    8 266  46    .   .    .   .    .   .    .  .
   1  315   19   .   .    .   .    .   .    .   .    .  .
   1  316    9   .   .    .   .    .   .    .   .    .  .
   1 2522    3   .   .    .   .    .   .    .   .    .  .
   1    1 1570   .   .    .   .    .   .    .   .    .  .
   1    1  530   .   .    .   .    .   .    .   .    .  .
   1   61  262  64  23   57  17   31   8 2522   5    .  .
   1  130   78  64  23   57  17 2521   2   31  15  248  1
   1  228  578  64 138   57  37  248   3 2521  14   31 35
   2  142  328   .   .    .   .    .   .    .   .    .  .
   2  272   78   .   .    .   .    .   .    .   .    .  .
   2    1  602   .   .    .   .    .   .    .   .    .  .
   2   51  344  61 212  246   2   64  50   65  11 2522 10
   2  176  402  44 348   61 163   57  17  248   2   64 71
 3.1    1 1219   .   .    .   .    .   .    .   .    .  .
 3.1    1  410   .   .    .   .    .   .    .   .    .  .
 3.1   54  130  52  60   61  32   51  23   21  17   57  4
 3.1   44   78 130  44   57   3  248   4   31  49 2522  6
 3.1  231  116 904 119   61 220   57  22  248   3  254  6
 3.2  156  396   .   .    .   .    .   .    .   .    .  .
 3.2  272   78   .   .    .   .    .   .    .   .    .  .
end 

My second dataset, which contains the food ingredient codes and the calories per 100 grams, is:


clear
input str39 Ingredient int(Ingredient_codes Calorie_per_100gm)
"Parboiled rice (coarse)"        1 344
"Non-parboiled rice (coarse)"    2 344
"Fine rice"                      3 344
"Rice flour"                     4 366
"Suji (cream of wheat/barley)"   5 364
"Wheat"                          6 347
"Atta"                           7 334
"Maida (wheat flour w/o bran)"   8 346
"Semai/noodles"                  9 347
"Chaatu"                        10 324
"Chira (flattened rice)"        11 356
"Muri/Khoi (puffed rice)"       12 361
"Barley"                        13 324
"Sagu"                          14 346
"Corn"                          15 355
"Cerelac"                       16 418
"Lentil"                        21 317
"Chick pea"                     22 327
"Anchor daal"                   23 375
"Black gram"                    24 317
"Khesari"                       25 352
"Mung"                          26 161
end 

I want to bring the calories per 100 grams into the main dataset, matched on the ingredient codes.

I agree with Nick's comment that it would be best to reshape this data to long format first. Read why that is better practice here: https://worldbank.github.io/dime-data-handbook/processing.html#making-data-tidy
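
For reference, here is a minimal sketch of what that long-format route could look like. It is only an illustration under stated assumptions: it uses a few toy rows in the same layout as the question (with the weight variable names already standardized), and the names `meal_id`, `ingredient_num`, `weight_gm` and `calories` are hypothetical, not part of the original data.

```stata
* Lookup table: a few rows copied from the calorie dataset
clear
input int(Ingredient_codes Calorie_per_100gm)
 1 344
21 317
26 161
end
tempfile code_calories
save `code_calories'

* Main data: toy rows in the same wide layout (values are illustrative)
clear
input double hhid int(Ingredient_1_code Ingredient_1_weight Ingredient_2_code Ingredient_2_weight)
1    1 1570  .  .
1   21  200 26 50
2    1  602  .  .
end

* Row identifier (hypothetical name), needed because hhid repeats across rows
gen long meal_id = _n

* Wide to long: one observation per row-ingredient pair
reshape long Ingredient_@_code Ingredient_@_weight, i(meal_id) j(ingredient_num)

* With the @ syntax the long variables are named Ingredient__code and Ingredient__weight
rename Ingredient__code   Ingredient_codes
rename Ingredient__weight weight_gm
drop if missing(Ingredient_codes)

* A single merge now covers every ingredient slot
merge m:1 Ingredient_codes using `code_calories', keep(master match) nogen

* Weight is in grams and calories are per 100 g, hence the division by 100
gen calories = Calorie_per_100gm * weight_gm / 100

* Total calories per original row
collapse (sum) total_calories = calories, by(hhid meal_id)
```

In long format there is one merge and no loop, and adding a seventh ingredient slot would not require any change to the code.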

However, if for some reason you must keep the data in this shape, it can also be done in the current untidy wide format. The code below shows how.

clear
input str39 Ingredient int(Ingredient_codes Calorie_per_100gm)
"Parboiled rice (coarse)"        1 344
"Non-parboiled rice (coarse)"    2 344
"Fine rice"                      3 344
"Rice flour"                     4 366
"Suji (cream of wheat/barley)"   5 364
"Wheat"                          6 347
"Atta"                           7 334
"Maida (wheat flour w/o bran)"   8 346
"Semai/noodles"                  9 347
"Chaatu"                        10 324
"Chira (flattened rice)"        11 356
"Muri/Khoi (puffed rice)"       12 361
"Barley"                        13 324
"Sagu"                          14 346
"Corn"                          15 355
"Cerelac"                       16 418
"Lentil"                        21 317
"Chick pea"                     22 327
"Anchor daal"                   23 375
"Black gram"                    24 317
"Khesari"                       25 352
"Mung"                          26 161
end 

drop Ingredient

tempfile code_calories
save `code_calories'


clear 
input double hhid int(Ingredient_1_code Ingredient_1_weight Ingredient_2_code incredient_2_weight Ingredient_3_code ingredient_3_weight Ingredient_4_code Ingredient_4_weight Ingredient_5_code Ingredient_5_weight Ingredient_6_code Ingredient_6_weight)
   1  269    8 266  46    .   .    .   .    .   .    .  .
   1  315   19   .   .    .   .    .   .    .   .    .  .
   1  316    9   .   .    .   .    .   .    .   .    .  .
   1 2522    3   .   .    .   .    .   .    .   .    .  .
   1    1 1570   .   .    .   .    .   .    .   .    .  .
   1    1  530   .   .    .   .    .   .    .   .    .  .
   1   61  262  64  23   57  17   31   8 2522   5    .  .
   1  130   78  64  23   57  17 2521   2   31  15  248  1
   1  228  578  64 138   57  37  248   3 2521  14   31 35
   2  142  328   .   .    .   .    .   .    .   .    .  .
   2  272   78   .   .    .   .    .   .    .   .    .  .
   2    1  602   .   .    .   .    .   .    .   .    .  .
   2   51  344  61 212  246   2   64  50   65  11 2522 10
   2  176  402  44 348   61 163   57  17  248   2   64 71
 3.1    1 1219   .   .    .   .    .   .    .   .    .  .
 3.1    1  410   .   .    .   .    .   .    .   .    .  .
 3.1   54  130  52  60   61  32   51  23   21  17   57  4
 3.1   44   78 130  44   57   3  248   4   31  49 2522  6
 3.1  231  116 904 119   61 220   57  22  248   3  254  6
 3.2  156  396   .   .    .   .    .   .    .   .    .  .
 3.2  272   78   .   .    .   .    .   .    .   .    .  .
end 

*Standardize varname
rename incredient_2_weight Ingredient_2_weight
rename ingredient_3_weight Ingredient_3_weight

*Loop over all variables
forvalues var_num = 1/6 {
    
    *Rename to match name in code_calories dataset
    rename Ingredient_`var_num'_code Ingredient_codes
    
    *Merge calories for this ingredient
    merge m:1 Ingredient_codes using `code_calories', keep(master matched) nogen
    
    *Calculate number of calories for this ingredient
    *(weight is in grams and calories are per 100 g, so divide by 100)
    gen   Calories_`var_num' = Calorie_per_100gm * Ingredient_`var_num'_weight / 100
    
    *Order new variables and restore names of variables
    order Calorie_per_100gm Calories_`var_num', after(Ingredient_`var_num'_weight)
    rename Ingredient_codes Ingredient_`var_num'_code 
    rename Calorie_per_100gm Calorie_per_100gm_`var_num'    
}

*Summarize calories across all ingredients
egen total_calories = rowtotal(Calories_?)
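
One thing to watch for: `rowtotal()` treats missing values as zero, and many codes in the example data (269, 315, and so on) have no row in the calorie table shown, so their calories stay missing and the totals can be silently too low. Below is a small sketch of one way to flag such rows. The data are toy values in the same shape as the result of the loop above, and `n_unmatched` is a hypothetical variable name.

```stata
* Toy data shaped like the output of the merge loop (values are illustrative)
clear
input int(Ingredient_1_code Ingredient_2_code) double(Calories_1 Calories_2)
  1  21  5400.8  634
  1 269  2070.88   .
269   .       .    .
end

* Count ingredient slots that have a code but no calorie value
gen byte n_unmatched = 0
forvalues var_num = 1/2 {
    replace n_unmatched = n_unmatched + 1 if !missing(Ingredient_`var_num'_code) & missing(Calories_`var_num')
}

* With the missing option, a row where every Calories_? is missing gets a missing total instead of 0
egen total_calories = rowtotal(Calories_?), missing

list
```

Rows with `n_unmatched > 0` have a total that covers only some of their ingredients, so it may be worth completing the calorie table before relying on those totals.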