Trimming my panel dataset - filtering out observations meeting criterion if preceding ID meets the complementary criterion

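I am working in Stata 16.0 with a dataset of 118,979 observations and 9 variables. The key variable, measure, records whether a company's observation on a given date reports "GPS" or "EPS". A company can report a "GPS" observation at one data point and an "EPS" observation at the next. See the data sample below for an illustration.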

Data sample:

clear
input str8 cusip8 str16 cname str4 measure double actual long anndats_act float(fyear tanalyst meanforcast UE)
"87482X10" "TALMER BANCORP"   "EPS"   1.21 20118 2014  29   .8686207     .3930131
"87482X10" "TALMER BANCORP"   "GPS"   1.02 20479 2015  34   .8576471     .1893004
end

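Whenever an identifier (cusip8 in the sample above) reports EPS on a date, I need to drop the GPS observations for that same date, and this has to work across multiple dates. That is, if a company reports both GPS and EPS on, say, January 1, 2010, I want to drop the GPS observation and keep the EPS one. If a company reports only GPS and no EPS on a given date, I want to keep the GPS observation in my dataset.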

The following works for me (adjust the variable names as needed):

. clear

. input str10(company_id measure) month day year

     company_id measure month day year
  1. "Company A" "EPS" 1 1 2010
  2. "Company A" "GPS" 1 1 2010 
  3. "Company A" "GPS" 1 1 2010
  4. "Company A" "GPS" 1 2 2010
  5. "Company B" "EPS" 1 2 2010
  6. "Company B" "GPS" 1 1 2010
  7. "Company C" "GPS" 1 4 2010
  8. "Company C" "EPS" 1 4 2010
  9. end

. 
. gen date = mdy(month,day,year)

. format date %d

. drop month day year

. 
. sort company_id date measure

. 
. gen both = 0

. by company_id date: replace both = 1 if measure[1] == "EPS" & measure[2] == "GPS"
(5 real changes made)

. 
. list, sepby(company_id)

     +----------------------------------------+
     | company~d   measure        date   both |
     |----------------------------------------|
  1. | Company A       EPS   01jan2010      1 |
  2. | Company A       GPS   01jan2010      1 |
  3. | Company A       GPS   01jan2010      1 |
  4. | Company A       GPS   02jan2010      0 |
     |----------------------------------------|
  5. | Company B       GPS   01jan2010      0 |
  6. | Company B       EPS   02jan2010      0 |
     |----------------------------------------|
  7. | Company C       EPS   04jan2010      1 |
  8. | Company C       GPS   04jan2010      1 |
     +----------------------------------------+

. 
. drop if measure == "GPS" & both == 1
(3 observations deleted)

. 
. list, sepby(company_id)

     +----------------------------------------+
     | company~d   measure        date   both |
     |----------------------------------------|
  1. | Company A       EPS   01jan2010      1 |
  2. | Company A       GPS   02jan2010      0 |
     |----------------------------------------|
  3. | Company B       GPS   01jan2010      0 |
  4. | Company B       EPS   02jan2010      0 |
     |----------------------------------------|
  5. | Company C       EPS   04jan2010      1 |
     +----------------------------------------+
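
One caveat with the approach above: the test measure[1] == "EPS" & measure[2] == "GPS" depends on the sort order and on each company-date holding exactly one EPS row ahead of the GPS rows. If a company-date had two EPS rows, measure[2] would be "EPS" and the GPS rows would be kept. A minimal sketch of a variant that does not depend on row position is below; the toy data and the variable name has_eps are made up for illustration.

```stata
clear
input str10(company_id measure) month day year
"Company A" "EPS" 1 1 2010
"Company A" "EPS" 1 1 2010
"Company A" "GPS" 1 1 2010
"Company B" "GPS" 1 1 2010
end

gen date = mdy(month, day, year)
format date %td
drop month day year

* has_eps (hypothetical name) is 1 on every row of a company-date
* that contains at least one EPS row, and 0 otherwise
bysort company_id date: egen byte has_eps = max(measure == "EPS")

* drop GPS rows only where an EPS row exists for the same company and date
drop if measure == "GPS" & has_eps == 1
drop has_eps

list, sepby(company_id)
```

For the dataset in the question, the same two lines should carry over with cusip8 in place of company_id and anndats_act in place of date, assuming anndats_act is the reporting date on which EPS and GPS are to be compared.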