How can I replace the codes of a variable (food ingredient) with values (calories) from another dataset?
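I have a Stata dataset with six variables containing the codes of the food ingredients consumed, each paired with its weight in grams. I have a separate dataset containing the food ingredient codes and the corresponding calories per 100 grams. I need to replace the codes with calories in order to calculate total calorie consumption.
How can I do that (either by replacing the codes or by generating new variables)?
My first (main) dataset is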
clear
input double hhid int(Ingredient_1_code Ingredient_1_weight Ingredient_2_code incredient_2_weight Ingredient_3_code ingredient_3_weight Ingredient_4_code Ingredient_4_weight Ingredient_5_code Ingredient_5_weight Ingredient_6_code Ingredient_6_weight)
1 269 8 266 46 . . . . . . . .
1 315 19 . . . . . . . . . .
1 316 9 . . . . . . . . . .
1 2522 3 . . . . . . . . . .
1 1 1570 . . . . . . . . . .
1 1 530 . . . . . . . . . .
1 61 262 64 23 57 17 31 8 2522 5 . .
1 130 78 64 23 57 17 2521 2 31 15 248 1
1 228 578 64 138 57 37 248 3 2521 14 31 35
2 142 328 . . . . . . . . . .
2 272 78 . . . . . . . . . .
2 1 602 . . . . . . . . . .
2 51 344 61 212 246 2 64 50 65 11 2522 10
2 176 402 44 348 61 163 57 17 248 2 64 71
3.1 1 1219 . . . . . . . . . .
3.1 1 410 . . . . . . . . . .
3.1 54 130 52 60 61 32 51 23 21 17 57 4
3.1 44 78 130 44 57 3 248 4 31 49 2522 6
3.1 231 116 904 119 61 220 57 22 248 3 254 6
3.2 156 396 . . . . . . . . . .
3.2 272 78 . . . . . . . . . .
end
My second dataset, containing the food ingredient codes and calories per 100 grams, is
clear
input str39 Ingredient int(Ingredient_codes Calorie_per_100gm)
"Parboiled rice (coarse)" 1 344
"Non-parboiled rice (coarse)" 2 344
"Fine rice" 3 344
"Rice flour" 4 366
"Suji (cream of wheat/barley)" 5 364
"Wheat" 6 347
"Atta" 7 334
"Maida (wheat flour w/o bran)" 8 346
"Semai/noodles" 9 347
"Chaatu" 10 324
"Chira (flattened rice)" 11 356
"Muri/Khoi (puffed rice)" 12 361
"Barley" 13 324
"Sagu" 14 346
"Corn" 15 355
"Cerelac" 16 418
"Lentil" 21 317
"Chick pea" 22 327
"Anchor daal" 23 375
"Black gram" 24 317
"Khesari" 25 352
"Mung" 26 161
end
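I want to bring the calories per 100 grams into the main dataset according to the ingredient.
I agree with Nick's comment that it would be better to make this data long first. Read why that is better practice here: https://worldbank.github.io/dime-data-handbook/processing.html#making-data-tidy
However, if you must keep the data this way for some reason, it can be done in the current untidy wide format. The code below shows how.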
clear
input str39 Ingredient int(Ingredient_codes Calorie_per_100gm)
"Parboiled rice (coarse)" 1 344
"Non-parboiled rice (coarse)" 2 344
"Fine rice" 3 344
"Rice flour" 4 366
"Suji (cream of wheat/barley)" 5 364
"Wheat" 6 347
"Atta" 7 334
"Maida (wheat flour w/o bran)" 8 346
"Semai/noodles" 9 347
"Chaatu" 10 324
"Chira (flattened rice)" 11 356
"Muri/Khoi (puffed rice)" 12 361
"Barley" 13 324
"Sagu" 14 346
"Corn" 15 355
"Cerelac" 16 418
"Lentil" 21 317
"Chick pea" 22 327
"Anchor daal" 23 375
"Black gram" 24 317
"Khesari" 25 352
"Mung" 26 161
end
drop Ingredient
tempfile code_calories
save `code_calories'
clear
input double hhid int(Ingredient_1_code Ingredient_1_weight Ingredient_2_code incredient_2_weight Ingredient_3_code ingredient_3_weight Ingredient_4_code Ingredient_4_weight Ingredient_5_code Ingredient_5_weight Ingredient_6_code Ingredient_6_weight)
1 269 8 266 46 . . . . . . . .
1 315 19 . . . . . . . . . .
1 316 9 . . . . . . . . . .
1 2522 3 . . . . . . . . . .
1 1 1570 . . . . . . . . . .
1 1 530 . . . . . . . . . .
1 61 262 64 23 57 17 31 8 2522 5 . .
1 130 78 64 23 57 17 2521 2 31 15 248 1
1 228 578 64 138 57 37 248 3 2521 14 31 35
2 142 328 . . . . . . . . . .
2 272 78 . . . . . . . . . .
2 1 602 . . . . . . . . . .
2 51 344 61 212 246 2 64 50 65 11 2522 10
2 176 402 44 348 61 163 57 17 248 2 64 71
3.1 1 1219 . . . . . . . . . .
3.1 1 410 . . . . . . . . . .
3.1 54 130 52 60 61 32 51 23 21 17 57 4
3.1 44 78 130 44 57 3 248 4 31 49 2522 6
3.1 231 116 904 119 61 220 57 22 248 3 254 6
3.2 156 396 . . . . . . . . . .
3.2 272 78 . . . . . . . . . .
end
*Standardize varname
rename incredient_2_weight Ingredient_2_weight
rename ingredient_3_weight Ingredient_3_weight
*Loop over all variables
forvalues var_num = 1/6 {
*Rename to match name in code_calories dataset
rename Ingredient_`var_num'_code Ingredient_codes
*Merge calories for this ingredient
merge m:1 Ingredient_codes using `code_calories', keep(master matched) nogen
*Calculate number of calories for this ingredient
*(weight is in grams and calories are per 100 grams, hence the division by 100)
gen Calories_`var_num' = Calorie_per_100gm * Ingredient_`var_num'_weight / 100
*Order new variables and restore names of variables
order Calorie_per_100gm Calories_`var_num', after(Ingredient_`var_num'_weight)
rename Ingredient_codes Ingredient_`var_num'_code
rename Calorie_per_100gm Calorie_per_100gm_`var_num'
}
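*Sum calories across all ingredients (rowtotal treats missing values as zero,
*so ingredients whose code has no match in code_calories contribute nothing)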
egen total_calories = rowtotal(Calories_?)
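For comparison, the long-format route recommended above might look roughly like the sketch below. It is not part of the original answer. It uses a small subset of the example data with only two ingredient slots and the weight variable names already standardized. The names row_id, slot, code, weight, lookup, no_match and n_unmatched are made up for illustration. Weights are in grams and calories are per 100 grams, so the product is divided by 100.

```stata
* Sketch only: a small subset of the calorie lookup table
clear
input int(Ingredient_codes Calorie_per_100gm)
1 344
21 317
end
tempfile code_calories
save `code_calories'

* Sketch only: three rows of the main dataset, two ingredient slots
clear
input double hhid int(Ingredient_1_code Ingredient_1_weight Ingredient_2_code Ingredient_2_weight)
1 1 1570 . .
1 269 8 266 46
3.1 54 130 21 17
end

* Row identifier (hypothetical name) so each original row can be recovered
gen long row_id = _n

* Rename to stub + number, the layout reshape long expects
forvalues i = 1/2 {
    rename Ingredient_`i'_code code`i'
    rename Ingredient_`i'_weight weight`i'
}

* One observation per row and ingredient slot; drop empty slots
reshape long code weight, i(row_id) j(slot)
drop if missing(code)

* A single merge now covers every ingredient slot
rename code Ingredient_codes
merge m:1 Ingredient_codes using `code_calories', keep(master match) generate(lookup)

* Calories for this ingredient: per-100-gram value scaled by grams consumed
gen double calories = Calorie_per_100gm * weight / 100

* Flag codes with no calorie entry instead of silently counting them as zero
gen byte no_match = (lookup == 1)

* Back to one observation per original row
collapse (sum) total_calories = calories n_unmatched = no_match (first) hhid, by(row_id)
list
```

Keeping n_unmatched next to total_calories shows which totals are incomplete. In the example data many codes (269, 315 and so on) do not appear in the calorie table at all, so a total alone would understate consumption for those rows.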