Stata panel data: egen =total() over period starting with a given date

Question

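My data are structured as follows (dataex output is at the end, but it is confusing because it shows only the numeric form of the time values):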

id yearmo birthmo smoke surveytime health
1 2002m1  2003m11  0.8  0  .
1 2002m2  2003m11  0.7  0  .
[...]
1 2004m1  2003m11  0.5  1  "good"

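I merged a panel data set containing yearly survey information (e.g. on my dependent variable health) with monthly (numeric) information on smoke exposure. My time variable yearmo contains year and month in %tm format. birthmo is the individual's year and month of birth, in the same format.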

I would like to generate a variable containing the total smoke exposure during pregnancy, that is, over the period birthmo[_n-1], birthmo[_n-10].

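Is it possible to refer to this time period with egen prebirth_smoke = total(smoke)? I could not find anything so far. But since it is possible to calculate time differences by referring to a variable holding the birthday, e.g. gen age2000 = (14610-birthday)/365.25, I figured there must be a solution to my problem as well...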

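My alternative approach would be to fill in the survey information for every month and use a command like by persnr: egen prebirth_smoke=total(smoke, smoke[_n-10]) if birthmo = moyear. I would then have to copy this information to every month of the year again and collapse the data to yearly information. Is there an easier way?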

* Example generated by -dataex-. To install: ssc install dataex
clear
input double persnr float(moyear birthmo surveytime) double smoke
23908 504   . 0  23.96554252199413
23908 505   . 0 16.531705948372615
23908 506   . 0 19.731182795698928
23908 507   . 0 15.172916666666667
23908 508   . 0 12.199596774193546
23908 509   . 0 12.218055555555557
23908 510   . 0 10.207416911045943
23908 511   . 0  11.54166666666667
23908 512   . 0 14.311111111111112
23908 513   . 0 16.728005865102638
23908 514   . 0 22.759722222222226
23908 515   . 0  21.10752688172043
23908 516 515 1 27.638440860215056
23908 517 515 1 24.914434523809522
23908 518 515 1 22.103515874027796
23908 519 515 1 16.881249999999998
23908 520 515 1  14.51930596285435
23908 521 515 1 10.573909068193176
23908 522 515 1 10.057123655913978
23908 523 515 1   12.2486559139785
end
format %tm moyear
format %tm birthmo

Answer 1

This works for your example data:

egen BIRTHMO = mean(birthmo), by(persnr) 
egen exposure = total(inrange(BIRTHMO - moyear, 1, 10) * smoke), by(persnr)

For an overview of related techniques, see https://www.stata-journal.com/sjpdf.html?articlenum=dm0055
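
A minimal sketch of why the two lines work, using made-up data rather than the poster's: one hypothetical person observed for 15 months, with smoke set to 1 in every month so that the total simply counts the months inside the window. The helper variable in_window is introduced here only for illustration and is not part of the answer's code.

```stata
* Toy data: one person, months 500..514, birth month 512 (all values made up)
clear
set obs 15
generate persnr = 1
generate moyear = 499 + _n
format %tm moyear
* as in the question, birthmo is filled in only in later rows
generate birthmo = 512 if moyear > 512
format %tm birthmo
generate smoke = 1

* spread the birth month to every row of the person
egen BIRTHMO = mean(birthmo), by(persnr)

* 1 for rows that lie 1 to 10 months before the birth month, 0 otherwise
generate byte in_window = inrange(BIRTHMO - moyear, 1, 10)

* multiplying by the indicator zeroes out smoke outside the window
egen exposure = total(in_window * smoke), by(persnr)

* months 502..511 fall in the window, so the total is 10
assert exposure == 10
```

Note that inrange() returns 0 when its first argument is missing, so a person with no birthmo in any row would get an exposure of 0 rather than missing under this approach. Whether that is acceptable depends on the data.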

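Small points:

There is nothing confusing about dataex code. The point is that once you run it, it is clear what it means.
Your egen code would not work, even in spirit. The argument to total() is illegal. Typo: the second = should be ==. Warning: the help for egen explicitly says not to use subscripted expressions.
egen, sum() has been undocumented as of Stata 9. It is best to use and refer to total(). The code is equivalent, but still.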
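
The points about == and total() can be illustrated on a toy data set. The variables x and y are made up for this sketch, and it assumes that the installed Stata version still accepts the undocumented sum() function of egen, as the answer implies.

```stata
* Toy data: x = 1, 2, 3 and y = 2 in every row (made-up values)
clear
set obs 3
generate x = _n
generate y = 2

* == tests equality inside an if qualifier
count if x == y
assert r(N) == 1

* total() takes a single expression; sum() is the older, undocumented name
egen t_total = total(x)
egen t_sum   = sum(x)
assert t_total == 6
assert t_sum == t_total
```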
