Reshape long panel data to wide where data is not unique within ID
I have a dataset that looks like this:
| State | Year | Industry | Employment |
|-------|------|----------|------------|
| AL | 2014 | 1 | 123345 |
| AL | 2015 | 1 | 145411 |
| AL | 2016 | 1 | 149402 |
| AL | 2014 | 2 | 153518 |
| AL | 2015 | 2 | 157773 |
| AL | 2016 | 2 | 163156 |
| AK | 2014 | 1 | 167187 |
| AK | 2015 | 1 | 167863 |
| AK | 2016 | 1 | 163320 |
| AK | 2014 | 2 | 162419 |
| AK | 2015 | 2 | 166116 |
| AK | 2016 | 2 | 170136 |
I would like to get a dataset that looks like this:
| State | Year | Employment_Industry1 | Employment_Industry2 |
|-------|------|----------------------|----------------------|
| AL | 2014 | 123345 | 153518 |
| AL | 2015 | 145411 | 157773 |
| AL | 2016 | 149402 | 163156 |
| AK | 2014 | 167187 | 162419 |
| AK | 2015 | 167863 | 166116 |
| AK | 2016 | 163320 | 170136 |
As you can see, my data is in long format, but the years are repeated within State for each Industry. This causes problems when I reshape wide.
I have generated IDs for several different groupings of variables, but I always end up with an error like this:
values of variable Industry not unique within ID
What kind of ID do I need to create, or what else can I do to produce the desired dataset?
The following works for me:
clear
input str2 State Year Industry Employment
AL 2014 1 123345
AL 2015 1 145411
AL 2016 1 149402
AL 2014 2 153518
AL 2015 2 157773
AL 2016 2 163156
AK 2014 1 167187
AK 2015 1 167863
AK 2016 1 163320
AK 2014 2 162419
AK 2015 2 166116
AK 2016 2 170136
end
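* Build a numeric identifier for each state
egen id = group(State)
* Each wide row is one State-Year pair, so i() needs both; Industry supplies the suffix
reshape wide Employment, i(id Year) j(Industry)
* The helper id is no longer needed
drop id
order State Year Employment*
list, abbreviate(15) sepby(State)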
+------------------------------------------+
| State Year Employment1 Employment2 |
|------------------------------------------|
1. | AK 2014 167187 162419 |
2. | AK 2015 167863 166116 |
3. | AK 2016 163320 170136 |
|------------------------------------------|
4. | AL 2014 123345 153518 |
5. | AL 2015 145411 157773 |
6. | AL 2016 149402 163156 |
+------------------------------------------+
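The error in the question arises because an ID built from State alone (or any grouping that leaves out Year) still contains several rows per Industry value, one for each year. A minimal sketch of a variant, assuming you start again from the long data right after the `input ... end` block above: check which variables uniquely identify a row with `isid`, then pass State directly to `i()`, since `reshape` accepts string variables there and the `egen` step can be skipped. The final `rename` is only my assumption about how you want the columns named, taken from the desired table in the question.

```stata
* Sketch only: assumes the long-format data from the input block is in memory
* (that is, run this instead of the egen/reshape lines above, not after them).

* isid stops with an error if these variables do not uniquely identify rows
isid State Year Industry

* State and Year together define one wide row; string variables are allowed in i()
reshape wide Employment, i(State Year) j(Industry)

* Optional: match the column names shown in the question
rename (Employment1 Employment2) (Employment_Industry1 Employment_Industry2)

list, abbreviate(20) sepby(State)
```

If `isid` fails on your real data, there are genuine duplicate State-Year-Industry rows, and those need to be resolved (for example by inspecting them with `duplicates report State Year Industry`) before any choice of `i()` will let `reshape wide` run.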