How can I get entry and exit dates for non-consecutive spells in a dataset?

Question

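I have an unbalanced panel dataset of individuals in the labor force from 2008 to 2018. An individual has income in a given year if they worked in that year. It looks like this: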

* Example generated by -dataex-. For more info, type help dataex
clear
input float(idno year income)
1 2008 100
1 2009 100
1 2010 100
1 2011 100
1 2012 100
1 2013 100
1 2014 100
1 2015 100
1 2016 100
1 2017 100
1 2018 100
2 2008 100
2 2009 100
2 2010 100
3 2009 100
3 2010 100
3 2015 100
3 2016 100
end

From this sample, we know that individual 1 (idno == 1) had income from 2008 to 2018; likewise, individual 2 (idno == 2) worked during 2008-2010.

I want to identify the years in which each individual enters and leaves the labor market. So I tried the following.

I rectangularized the dataset (I am using Stata 16):

fillin idno year

Then I flagged whether the individual worked in each year:

* work = 1 if the person had income that year, 0 otherwise
gen work = !missing(income)

Then I tried to determine start and end dates for each individual (this only works for a single continuous spell of work):

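* min() and max() inside -generate- compare values within one row,
* so use -egen- to take each person's minimum and maximum over years
bysort idno: gen years_earn_income = year if work == 1
bysort idno: gen years_no_income = year if work == 0
bysort idno: egen start = min(years_earn_income)
bysort idno: egen end = max(years_earn_income)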

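I am struggling to find the correct entry and exit years for individuals with more than one employment spell. For example, individual 3 (idno == 3) worked in 2009-2010 and again in 2015-2016. I would like variables that capture multiple employment spells, as in the case of individual 3. Any ideas would be appreciated.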

Answer 1

For a discussion of handling spells, see https://www.stata-journal.com/article.html?article=dm0029, and for an implementation, see tsspell from SSC. Your example can be analyzed like this:

 clear 
 input idno year income 
 1 2008 100
 1 2009 100
 1 2010 100
 1 2011 100
 1 2012 100
 1 2013 100
 1 2014 100
 1 2015 100
 1 2016 100
 1 2017 100
 1 2018 100
 2 2008 100
 2 2009 100
 2 2010 100
 3 2009 100
 3 2010 100
 3 2015 100
 3 2016 100
 end 
 
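 * declare the panel and fill in missing years within each person's range
 tsset idno year 
 tsfill 
 
 * install tsspell from SSC (only needed once)
 ssc install tsspell 
 
 * a spell is a run of consecutive years with positive income
 tsspell, pcond(income)
 
 list, sepby(idno _spell)
 
 * show only the first and last year of each spell
 list if _seq == 1 | _end, sepby(idno _spell)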

The results are:

.  list, sepby(idno _spell)

     +---------------------------------------------+
     | idno   year   income   _seq   _spell   _end |
     |---------------------------------------------|
  1. |    1   2008      100      1        1      0 |
  2. |    1   2009      100      2        1      0 |
  3. |    1   2010      100      3        1      0 |
  4. |    1   2011      100      4        1      0 |
  5. |    1   2012      100      5        1      0 |
  6. |    1   2013      100      6        1      0 |
  7. |    1   2014      100      7        1      0 |
  8. |    1   2015      100      8        1      0 |
  9. |    1   2016      100      9        1      0 |
 10. |    1   2017      100     10        1      0 |
 11. |    1   2018      100     11        1      1 |
     |---------------------------------------------|
 12. |    2   2008      100      1        1      0 |
 13. |    2   2009      100      2        1      0 |
 14. |    2   2010      100      3        1      1 |
     |---------------------------------------------|
 15. |    3   2009      100      1        1      0 |
 16. |    3   2010      100      2        1      1 |
     |---------------------------------------------|
 17. |    3   2011        .      0        0      0 |
 18. |    3   2012        .      0        0      0 |
 19. |    3   2013        .      0        0      0 |
 20. |    3   2014        .      0        0      0 |
     |---------------------------------------------|
 21. |    3   2015      100      1        2      0 |
 22. |    3   2016      100      2        2      1 |
     +---------------------------------------------+


.  list if _seq == 1 | _end, sepby(idno _spell)

     +---------------------------------------------+
     | idno   year   income   _seq   _spell   _end |
     |---------------------------------------------|
  1. |    1   2008      100      1        1      0 |
 11. |    1   2018      100     11        1      1 |
     |---------------------------------------------|
 12. |    2   2008      100      1        1      0 |
 14. |    2   2010      100      3        1      1 |
     |---------------------------------------------|
 15. |    3   2009      100      1        1      0 |
 16. |    3   2010      100      2        1      1 |
     |---------------------------------------------|
 21. |    3   2015      100      1        2      0 |
 22. |    3   2016      100      2        2      1 |
     +---------------------------------------------+
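If you want one row per spell with an entry and an exit year, the sketch below is one possible way to get there without tsspell. It is an alternative to the approach above, not what tsspell does internally: it marks the first year of each run of working years, numbers spells with a running sum, and then uses collapse. The variable names work, spell_start, spell, entry and exit are hypothetical choices for illustration.

```stata
* Sketch (assumption: a hand-rolled alternative to tsspell, same example data)
clear
input float(idno year income)
1 2008 100
1 2009 100
1 2010 100
1 2011 100
1 2012 100
1 2013 100
1 2014 100
1 2015 100
1 2016 100
1 2017 100
1 2018 100
2 2008 100
2 2009 100
2 2010 100
3 2009 100
3 2010 100
3 2015 100
3 2016 100
end

* declare the panel and fill in missing years within each person's range
tsset idno year
tsfill

* work = 1 if the person had income that year, 0 otherwise
gen byte work = !missing(income)

* a spell starts in a working year that is the person's first observation
* or that follows a non-working year
bysort idno (year): gen byte spell_start = work & (_n == 1 | work[_n-1] == 0)

* running count of spell starts within person gives the spell number;
* blank it out in years without work
by idno (year): gen spell = sum(spell_start)
replace spell = . if work == 0

* one row per person-spell: entry = first year worked, exit = last year worked
collapse (min) entry=year (max) exit=year if work == 1, by(idno spell)
list, sepby(idno)
```

With the tsspell results above still in memory, the same reduction should be possible directly with `collapse (min) entry=year (max) exit=year if _spell > 0, by(idno _spell)`.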
