How to use reshape or other functions to fix this data
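The data I was given has several counties merged together on certain entries. Each county should be a separate entry in this list, but for these entries they are all placed on the same row. I have already used split to separate them into county1 and so on, but I am trying to figure out how to use reshape or some other function to keep all the data while turning county, county1, etc. into separate observations of county. I want to move from wide to long without losing the other entries.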
utility_name state county unique_id
Alaska Village Elec Coop, Inc AK Borough, Kodiak Island 221AK
Wolverine Pwr Supply Coop, Inc MI Allegan, Antim, Barry, Benzie, 20910MI
Wolverine Pwr Supply Coop, Inc MI Clinton, Eaton, Emmet, Gratiot 20910MI
Wolverine Pwr Supply Coop, Inc MI Grand Traverse, Ingham, Ionia 20910MI
Wolverine Pwr Supply Coop, Inc MI Isabella, Lake, Leelanau 20910MI
Wolverine Pwr Supply Coop, Inc MI Manistee, Mason, Mecosta, 20910MI
Wolverine Pwr Supply Coop, Inc MI Missaukee, Montcalm, Muskegon 20910MI
Wolverine Pwr Supply Coop, Inc MI Newaygo, Oceana, Osceola 20910MI
Wolverine Pwr Supply Coop, Inc MI Ottawa, Alpena, Charlevoix 20910MI
Soyland Power Coop Inc IL McDonough, McCoupin 40307IL
Soyland Power Coop Inc IL Menard, Morgan,Montgomery 40307IL
Soyland Power Coop Inc IL Sangamon,Schuyler,Scott,Pike 40307IL
This would become
Alaska Village Elec Coop, Inc AK Borough 221AK
Alaska Village Elec Coop, Inc AK Kodiak Island 221AK
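and so on.
This question seems to presuppose some knowledge of US counties: a little more explanation would not go amiss on an international forum.
reshape is a command in Stata, not a function. Either way, it is not needed here. You seem to want something like this.
Because the county names are separated by commas within one composite variable, the number of counties is the number of commas plus 1. We count the commas by notionally deleting them and measuring how much shorter the string becomes. Some of this code could be avoided if we could see the entire dataset, which is probably impractical.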
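* county1, county2, ... are assumed to exist already from the earlier split
gen long id = _n
* number of counties = number of commas + 1 (a trailing comma inflates this by one)
gen ncounties = length(county) - length(subinstr(county, ",", "", .)) + 1
* make ncounties copies of each observation
expand ncounties
bysort id : gen id2 = _n
su ncounties, meanonly
gen County2 = ""
* the j-th copy of each observation takes the j-th county
forval j = 1/`r(max)' {
    replace County2 = county`j' if id2 == `j'
}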
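In short, the main device is to use expand to copy each observation, where the number of copies depends on how many counties that observation contains.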
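The expand approach can be tried end to end on a small sample. The following is only a sketch: it assumes two rows taken from the question's listing, it runs the split step itself (the original poster had already done so), and the trim() call is an addition to remove the blank that split leaves after each comma.

```stata
clear
input str40 utility_name str2 state str40 county str15 unique_id
"Alaska Village Elec Coop, Inc" "AK" "Borough, Kodiak Island" "221AK"
"Soyland Power Coop Inc" "IL" "Menard, Morgan,Montgomery" "40307IL"
end

* create county1, county2, ... from the comma-separated string
split county, parse(",")

gen long id = _n
* commas + 1 = number of counties held in each observation
gen ncounties = length(county) - length(subinstr(county, ",", "", .)) + 1

* one copy of the observation per county, numbered within id
expand ncounties
bysort id : gen id2 = _n

su ncounties, meanonly
gen County2 = ""
forval j = 1/`r(max)' {
    * trim() is an assumption added here to drop leading blanks
    replace County2 = trim(county`j') if id2 == `j'
}

list utility_name state County2 unique_id, sepby(id)
```

The sample deliberately avoids rows that end in a comma, since those would make the comma count one larger than the number of pieces that split creates.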
If I understand correctly, you want something like this:
clear
set more off
input ///
str40 utility_name str2 state str40 county str15 unique_id
"Alaska Village Elec Coop, Inc" "AK" "Borough, Kodiak Island" "221AK"
"Wolverine Pwr Supply Coop, Inc" "MI" "Allegan, Antim, Barry, Benzie," "20910MI"
end
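* break the comma-separated string into county1, county2, ...
split county, parse(",")
* free up the name county so that reshape can use it as the stub
rename county origcounty
gen i = _n
* wide to long: one observation per county piece
reshape long county, i(i)
* rows with fewer counties leave empty pieces behind
drop if missing(county)
order origcounty, last
list i - county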
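As a possible follow-up to the reshape approach, the pieces produced by split still carry the blank that followed each comma. The sketch below repeats the same two sample rows, trims each piece before dropping the empty ones, and checks the resulting row count. The variable name piece and the final assert are illustrative additions, not part of the answer above.

```stata
clear
input str40 utility_name str2 state str40 county str15 unique_id
"Alaska Village Elec Coop, Inc" "AK" "Borough, Kodiak Island" "221AK"
"Wolverine Pwr Supply Coop, Inc" "MI" "Allegan, Antim, Barry, Benzie," "20910MI"
end

split county, parse(",")
rename county origcounty
gen long i = _n

* piece (hypothetical name) records the position of each county in the row
reshape long county, i(i) j(piece)

* remove the blank left after each comma, then drop the empty pieces
replace county = trim(county)
drop if missing(county)

* sanity check: the two sample rows hold 2 + 4 = 6 county names
assert _N == 6
list utility_name state county unique_id, sepby(i)
```

Trimming before the drop also catches pieces that consist only of blanks, which can occur when a row ends in a comma followed by a space.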