How to calculate a time-varying historical mean with Stata

Question

How can I calculate the mean of X using an expanding window that contains at least four observations?

Here is a numerical example:

Answer 1

A time-varying mean in an expanding time window can be restated as the mean of all values from the start of the records up to the current date. You don't give a time variable, so I assume the data are in time order and supply a time variable.

The community-contributed command rangestat (to be installed from SSC using ssc install rangestat) can give the mean of all values to date in this way:

clear 
input X
50.735469
48.278413
42.807671
49.247854
52.20223
49.726689
50.823169
49.099351
48.949562
47.410434
end 

gen t = _n 

rangestat (count) X (mean) X, int(t . 0) 

list 

     +-------------------------------------+
     |        X    t   X_count      X_mean |
     |-------------------------------------|
  1. | 50.73547    1         1    50.73547 |
  2. | 48.27841    2         2   49.506941 |
  3. | 42.80767    3         3   47.273851 |
  4. | 49.24785    4         4   47.767351 |
  5. | 52.20223    5         5   48.654327 |
     |-------------------------------------|
  6. | 49.72669    6         6   48.833054 |
  7. | 50.82317    7         7   49.117356 |
  8. | 49.09935    8         8   49.115105 |
  9. | 48.94956    9         9   49.096711 |
 10. | 47.41043   10        10   48.928084 |
     +-------------------------------------+

Evidently you can ignore results for small counts as you please.
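The question asks for at least four observations. A minimal sketch of one way to do that, using hypothetical data (X = 10, 20, ..., 60) rather than the data above and assuming rangestat is already installed from SSC, is to blank out any mean whose count of non-missing values is below 4:

```stata
* Sketch only: hypothetical data, rangestat assumed installed (ssc install rangestat)
clear
set obs 6
gen t = _n
gen X = 10 * t    // hypothetical values 10, 20, ..., 60

* Expanding window: everything up to and including the current t
rangestat (count) X (mean) X, interval(t . 0)

* Keep a mean only when it is based on at least 4 non-missing values of X
replace X_mean = . if X_count < 4
list t X X_count X_mean
```

Here X_mean is missing for t = 1, 2, 3 and equals 25, 30 and 35 for t = 4, 5, 6.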

The syntax is explained in the help for rangestat. Suffice it to say here that the option, namely interval(t . 0) (abbreviated to int(t . 0) in the code above), has three parts:

the time variable t

and two offsets:

backwards as far as possible: system missing (.) here means arbitrarily large
forwards just 0

In mathematical terms, the window runs from time minus infinity (in practice, as far back as the data go) to offset 0, the present.
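To make the role of the two offsets concrete, here is a sketch with hypothetical data that contrasts the expanding window with a fixed-length window of 4 periods, interval(t -3 0). The variable names mean_expanding and mean_rolling4 are made up for illustration; the rename is needed because each rangestat call creates a variable called X_mean.

```stata
* Sketch only: hypothetical data, rangestat assumed installed (ssc install rangestat)
clear
set obs 6
gen t = _n
gen X = 10 * t    // hypothetical values 10, 20, ..., 60

* Lower offset . (no limit) and upper offset 0: expanding window
rangestat (mean) X, interval(t . 0)
rename X_mean mean_expanding

* Lower offset -3 and upper offset 0: the current period and the 3 before it
rangestat (mean) X, interval(t -3 0)
rename X_mean mean_rolling4

list t X mean_expanding mean_rolling4
```

At t = 6 the expanding mean is 35 (all six values) while the rolling mean is 45 (30, 40, 50, 60); at t = 3 both are 20 because fewer than 4 periods exist so far.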

The count result is the number of observations in the window with non-missing values on X. Here, because the time variable is just 1 up and X has no missing values, the count is trivially the same as the time variable, but in real problems the time variable is much more likely to be a date of some kind. Unlike some other commands, rangestat doesn't have an option to insist on a minimum number of non-missing values in a window, but you can count how many there are and decide to ignore results based on too few data. That is left to the user here.
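A sketch of why the count is worth keeping once real dates and missing values appear. The daily data below are hypothetical: the dates have a gap (3 and 4 January are absent) and X is missing on 2 January. The string variable sdate is an illustrative name used only to read the dates in.

```stata
* Sketch only: hypothetical data, rangestat assumed installed (ssc install rangestat)
clear
input str10 sdate X
"2020-01-01" 5
"2020-01-02" .
"2020-01-05" 7
"2020-01-06" 9
"2020-01-07" 11
end
gen date = daily(sdate, "YMD")
format date %td

* Expanding window on a real date variable
rangestat (count) X (mean) X, interval(date . 0)
list date X X_count X_mean
```

On 2 January X_count is still 1 and X_mean is 5, since the missing value is not counted; by 7 January X_count is 4 and X_mean is 8. So X_count, not the position in the data, is what to compare with a minimum such as 4.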

Incidentally, you could make a good start on this kind of problem by working out a cumulative sum and then dividing by the number of values so far. That needs care with, e.g., gaps in the data, irregularly spaced data, or missing values, and a virtue of rangestat is that all such difficulties are taken into account.
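For comparison, a minimal sketch of the cumulative-sum approach for a single series sorted by time, using hypothetical data with one missing value. It relies on Stata's sum() function, which returns a running sum and treats missing values as 0; the variable names cum_sum, cum_n and cum_mean are illustrative.

```stata
* Sketch only: hypothetical data for one series, one observation per time
clear
input t X
1 5
2 .
3 7
4 9
5 11
end
sort t

* Running sum of X (missing values contribute 0)
gen double cum_sum = sum(X)
* Running count of non-missing values of X
gen cum_n = sum(!missing(X))
* Mean of all non-missing values so far
gen double cum_mean = cum_sum / cum_n
list
```

This reproduces the expanding mean (5, 5, 6, 7, 8) here, but it assumes one observation per time point: with tied times, rangestat includes all tied observations in each other's windows, whereas a running sum does not.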
