Creating new variable from several variables with same list of countries
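In Stata, I split a variable containing up to 20 comma-separated countries, so I now have 20 different variables (country1 to country20). However, the same country can be listed in different variables among country1 to country20. For example, Uganda might be in country1, country2 and country5. Now I want to create one variable per country, equal to 1 if true and 0 if false. So, basically, I want a variable for each of the countries. I tried the following, but it didn't work.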
local N = _N
forvalues i = 1/`N' {
    local s1 = Countryies1 [`i']
    local s2 = Countryies2 [`i']
    local s3 = Countryies3 [`i']
    local s4 = Countryies4 [`i']
    local s5 = Countryies5 [`i']
    local s6 = Countryies6 [`i']
    local s7 = Countryies7 [`i']
    local s8 = Countryies8 [`i']
    local s9 = Countryies9 [`i']
    local s10 = Countryies10 [`i']
    local s11 = Countryies11 [`i']
    local s12 = Countryies12 [`i']
    local s13 = Countryies13 [`i']
    local s14 = Countryies14 [`i']
    local s15 = Countryies15 [`i']
    local s16 = Countryies16 [`i']
    local s17 = Countryies17 [`i']
    local s18 = Countryies18 [`i']
    local s19 = Countryies19 [`i']
    local s20 = Countryies20 [`i']
    local intersection: list s1 & s2 & s3 & s4 & s5 & s6 & s7 & s8 & s9 & s10 & s11 & s12 & s13 & s14 & s15 & s16 & s17 & s18 & s19 & s20
    replace country ="`intersection'" in `i'
}
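This seems to work, and it in no sense rules out other solutions. It reshapes the data to one observation per country mention, builds an indicator for each distinct name, and then collapses back to one observation per original record.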
clear
input str42 countries
"Uganda"
"Uganda, Kenya"
"Uganda, Kenya, Tanzania"
"South Africa"
end
gen id = _n
save datasofar, replace

* one observation per country mention
keep id countries
split countries, parse(,)
drop countries
reshape long countries, i(id) j(which)
drop if missing(countries)
replace countries = trim(countries)

* legal variable names, e.g. "South Africa" -> South_Africa
gen name = strtoname(countries)
levelsof name, local(names)
gen new_id = _n
foreach n of local names {
    gen is_`n' = name == "`n'"
    * label each indicator with the country name as originally spelled
    su new_id if is_`n', meanonly
    label var is_`n' "`=countries[r(min)]'"
    local vars `vars' is_`n'
}

* back to one observation per id: 1 if the country was mentioned at all
collapse (max) `vars', by(id)
merge 1:1 id using datasofar
list
     +----------------------------------------------------------------------------------------+
     | id   is_Kenya   is_Sou~a   is_Tan~a   is_Uga~a                 countries        _merge |
     |----------------------------------------------------------------------------------------|
  1. |  1          0          0          0          1                    Uganda   Matched (3) |
  2. |  2          1          0          0          1             Uganda, Kenya   Matched (3) |
  3. |  3          1          0          1          1   Uganda, Kenya, Tanzania   Matched (3) |
  4. |  4          0          1          0          0              South Africa   Matched (3) |
     +----------------------------------------------------------------------------------------+
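If the data are already split into country1 to country20, as in the question, the same kind of indicators can also be built without reshaping. The sketch below is an illustration under stated assumptions, not part of the answer above: it uses made-up data with only three countryN variables (the hypothetical local K would be 20 for the real data), assumes those variables are strings, and matches names exactly after trimming blanks, so spelling variants still need cleaning first.

```stata
* Hypothetical example data already split, with 3 countryN variables
* (the question has country1-country20: set K to 20 for real data)
clear
input str20 country1 str20 country2 str20 country3
"Uganda"       ""       ""
"Uganda"       "Kenya"  ""
"Uganda"       "Kenya"  "Tanzania"
"South Africa" ""       ""
end

local K = 3    // hypothetical: number of countryN variables

* remove blanks left over from -split-
forvalues j = 1/`K' {
    replace country`j' = trim(country`j')
}

* collect every distinct non-empty name across country1-country`K'
local names
forvalues j = 1/`K' {
    levelsof country`j', local(these)
    local names : list names | these
}

* one 0/1 indicator per country: 1 if the name appears in any countryN
foreach c of local names {
    local C = strtoname("`c'")
    generate byte is_`C' = 0
    forvalues j = 1/`K' {
        replace is_`C' = 1 if country`j' == "`c'"
    }
    label variable is_`C' "`c'"
}
list, noobs
```

Staying wide avoids the save/merge round trip, at the cost of looping over every countryN variable for every name.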
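Another solution is just to loop over the names, working directly on the original comma-separated variable, although you then have to list the names yourself: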
foreach c in Uganda Kenya Tanzania "South Africa" {
    local C = strtoname("`c'")
    gen is_`C' = strpos(countries, "`c'") > 0
}
But be careful: variations in spelling will hurt you. They would bite the earlier code too.
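A related trap with strpos() is that it matches substrings, so searching for "Niger" also flags "Nigeria". A minimal sketch, assuming the original comma-separated countries variable and a hand-written list of names, pads each list with commas so that only whole entries match, after normalising case and blanks. This absorbs differences in capitalisation and spacing only; genuine misspellings still have to be fixed by hand. The example data and the helper variable padded are hypothetical.

```stata
* Hypothetical example data: note "Nigeria" vs "niger" and the odd spacing
clear
input str42 countries
"Uganda"
"Uganda,Kenya"
"Nigeria"
"niger ,  Kenya"
"South  Africa, Uganda"
end

* normalise case and blanks, then pad with commas so that every
* entry appears as ",name," (padded is a hypothetical helper variable)
generate padded = "," + lower(itrim(trim(countries))) + ","
replace padded = subinstr(padded, " ,", ",", .)
replace padded = subinstr(padded, ", ", ",", .)

* whole-entry matching: ",niger," is not found inside ",nigeria,"
foreach c in Uganda Kenya Niger Nigeria "South Africa" {
    local C = strtoname("`c'")
    generate byte is_`C' = strpos(padded, "," + lower("`c'") + ",") > 0
}
list countries is_*, noobs
```

With plain strpos(countries, "Niger"), the "Nigeria" row would be flagged and the lower-case "niger" row missed; the padded version gets both right.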