How to use reshape or other functions to fix this data

Question

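The data I received has counties combined on certain entries. Each county should be a separate entry in this list, but for these entries they are all placed on the same row. I have already used split to break them into county1 and so on, but I am trying to figure out how to use reshape or another function to keep all the data while turning county1, county2, etc. into separate observations of county. I want to move from wide to long without losing the other entries.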

utility_name                    state   county                          unique_id
Alaska Village Elec Coop, Inc   AK      Borough, Kodiak Island          221AK
Wolverine Pwr Supply Coop, Inc  MI      Allegan, Antim, Barry, Benzie,  20910MI
Wolverine Pwr Supply Coop, Inc  MI      Clinton, Eaton, Emmet, Gratiot  20910MI
Wolverine Pwr Supply Coop, Inc  MI      Grand Traverse, Ingham, Ionia   20910MI
Wolverine Pwr Supply Coop, Inc  MI      Isabella, Lake, Leelanau        20910MI
Wolverine Pwr Supply Coop, Inc  MI      Manistee, Mason, Mecosta,       20910MI
Wolverine Pwr Supply Coop, Inc  MI      Missaukee, Montcalm, Muskegon   20910MI
Wolverine Pwr Supply Coop, Inc  MI      Newaygo, Oceana, Osceola        20910MI
Wolverine Pwr Supply Coop, Inc  MI      Ottawa, Alpena, Charlevoix      20910MI
Soyland Power Coop Inc          IL      McDonough, McCoupin             40307IL
Soyland Power Coop Inc          IL      Menard, Morgan,Montgomery       40307IL
Soyland Power Coop Inc          IL      Sangamon,Schuyler,Scott,Pike    40307IL

This would become

Alaska Village Elec Coop, Inc    AK Borough    221AK
Alaska Village Elec Coop, Inc    AK Kodiak Island    221AK

and so on.

Answer 1

This question seems to presuppose some knowledge of US counties: a little more explanation would do no harm on an international forum.

reshape is a command in Stata, not a function. But it need not be used here at all. You seem to want something like this.

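Because the county names are separated by commas within one composite variable, the number of counties is the number of commas plus 1. We count the commas by notionally deleting them and measuring how much shorter the string becomes. Some of this code might be avoidable if we could see the whole dataset, but showing all of it is probably impractical.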

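* assumes county1, county2, ... already exist from: split county, parse(",")
gen long id = _n
* number of counties = number of commas + 1
gen ncounties = length(county) - length(subinstr(county, ",", "", .)) + 1
* a trailing comma (as in some sample rows) would otherwise count as an extra county
replace ncounties = ncounties - 1 if substr(trim(county), -1, 1) == ","
* make one copy of each observation per county and number the copies
expand ncounties
bysort id : gen id2 = _n
su ncounties, meanonly
gen County2 = ""
* the j-th copy of each observation takes the j-th county, blanks trimmed
forval j = 1/`r(max)' {
     replace County2 = trim(county`j') if id2 == `j'
}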

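In short, the main device is to use expand to copy each observation, with the number of copies depending on how many counties that observation contains.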
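To see the comma count and expand working together, here is a minimal self-contained sketch on two rows taken from the question. It assumes the pieces are named county1, county2, ... (the default names produced by split county, parse(",")); the variable names ncommas and newcounty are illustrative only and do not come from the answer above.

```stata
clear
input str40 utility_name str2 state str40 county str15 unique_id
"Alaska Village Elec Coop, Inc" "AK" "Borough, Kodiak Island"    "221AK"
"Soyland Power Coop Inc"        "IL" "Menard, Morgan,Montgomery" "40307IL"
end

* create county1, county2, ... from the comma-separated string
split county, parse(",")

* count commas by comparing string length with and without them
gen ncommas = length(county) - length(subinstr(county, ",", "", .))
gen ncounties = ncommas + 1

* make ncounties copies of each observation and number the copies
gen long id = _n
expand ncounties
bysort id : gen id2 = _n

* the j-th copy takes the j-th county, with surrounding blanks removed
gen newcounty = ""
su ncounties, meanonly
forval j = 1/`r(max)' {
    replace newcounty = trim(county`j') if id2 == `j'
}

list utility_name state newcounty unique_id, sepby(id)
```

The two input rows hold two and three counties respectively, so the expanded dataset should contain five observations, each with a single county in newcounty and the other variables carried along unchanged.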

Answer 2

If I understand correctly, you want something like this:

clear
set more off

input ///
str40 utility_name str2 state   str40 county                      str15 unique_id
"Alaska Village Elec Coop, Inc"   "AK"      "Borough, Kodiak Island"          "221AK"
"Wolverine Pwr Supply Coop, Inc"  "MI"      "Allegan, Antim, Barry, Benzie,"  "20910MI"
end

split county, parse(",")
rename county origcounty

gen i = _n
reshape long county, i(i)
drop if missing(county)

order origcounty, last
list i - county
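One detail that may be worth checking after the reshape: since the source strings separate names with a comma followed by a space, the pieces after the first are likely to carry a leading blank. The sketch below repeats the same steps on the same two rows and adds a trim() step; that step is an addition for illustration, not part of the answer above, and _j is simply the default name reshape gives to the new index variable.

```stata
clear
input str40 utility_name str2 state str40 county str15 unique_id
"Alaska Village Elec Coop, Inc"  "AK" "Borough, Kodiak Island"         "221AK"
"Wolverine Pwr Supply Coop, Inc" "MI" "Allegan, Antim, Barry, Benzie," "20910MI"
end

* wide pieces county1, county2, ...; keep the original string under a new name
split county, parse(",")
rename county origcounty

* one observation per (row, piece), then discard the empty pieces
gen i = _n
reshape long county, i(i)
drop if missing(county)

* strip any blanks left around each piece by the ", " separators
replace county = trim(county)

* _j records the position of each county within its original row
list utility_name state county unique_id _j, sepby(i)
```

With these two rows the result should have six observations: two for the Alaska row and four for the Michigan row, whose trailing comma does not produce an extra county.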

Tags: split, reshape, stata