Stata query: Need help creating a new variable dependent on data from a different row within same household

Question

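I want to create a new column in my cross-sectional survey dataset that holds the education level of a woman's husband. I have IDs for the household (hid) and the individual (HL1), plus the following information: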

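MA1 == whether the woman is married (observed only for women)
MA2 == age of husband (observed only for married women)
HL4 == sex (observed for all individuals)
HL6 == age (observed for all individuals)
ED4A == highest education level (observed for all individuals)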

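Essentially, I want to write code that does the following:

First check whether the wife is married (MA1)
If yes, look at the husband's age (MA2)
Then pair the husband's age (MA2) with the age of a male in the household (HL6)
Then look at that male's education (ED4A) and put it in a new column, but on the same row as the woman.

I tried this, but it didn't work: bysort hid (HL6) : gen husb_educ = ED4A[MA2]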

Below is an example from the dataset:

+-----+----------+-----+-----+--------+-----+----------+
| HL1 |   MA1    | MA2 | hid |  HL4   | HL6 |   ED4A   |
+-----+----------+-----+-----+--------+-----+----------+
|   1 |          |     | 106 | Male   |  57 | Diploma  |
|   2 |          |     | 106 | Female |  53 | Intermed |
|   3 |          |     | 106 | Male   |  30 | Higher S |
|   4 | No, not  |     | 106 | Female |  24 | Bachelor |
|   5 |          |     | 106 | Male   |  22 | Diploma  |
|   6 |          |     | 106 | Male   |  17 | Secondar |
|   7 |          |     | 106 | Female |  10 | Primary  |
|   8 | Yes, cur |  22 | 106 | Female |  23 | Diploma  |
|   9 |          |     | 106 | Female |   0 |          |
+-----+----------+-----+-----+--------+-----+----------+

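So in this example, I would like a new column for husband's education, with Diploma as its value in row 8 (because the woman's husband is 22, and the 22-year-old male in the household has a Diploma).

The same sample, without value labels: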

+-----+-----+-----+-----+-----+-----+------+
| HL1 | MA1 | MA2 | hid | HL4 | HL6 | ED4A |
+-----+-----+-----+-----+-----+-----+------+
|   1 |     |     | 106 |   1 |  57 |    4 |
|   2 |     |     | 106 |   2 |  53 |    2 |
|   3 |     |     | 106 |   1 |  30 |    6 |
|   4 |   3 |     | 106 |   2 |  24 |    5 |
|   5 |     |     | 106 |   1 |  22 |    4 |
|   6 |     |     | 106 |   1 |  17 |    3 |
|   7 |     |     | 106 |   2 |  10 |    1 |
|   8 |   1 |  22 | 106 |   2 |  23 |    4 |
|   9 |     |     | 106 |   2 |   0 |      |
+-----+-----+-----+-----+-----+-----+------+

A particularly large household:

    input HL1 MA1 MA2 hid HL4 HL6 ED4A
1   .   .   365809  1   33  1
2   1   33  365809  2   26  1
1   .   .   365810  1   58  1
2   .   .   365810  2   54  .
3   .   .   365810  1   23  3
4   .   .   365810  1   23  2
5   .   .   365810  1   18  3
6   .   .   365810  1   15  2
7   .   .   365810  2   12  2
8   .   .   365810  1   33  3
9   1   dk  365810  2   31  1
10  .   .   365810  2   13  2
11  .   .   365810  2   11  1
12  .   .   365810  1   9   1
13  .   .   365810  1   6   1
14  .   .   365810  2   3   .
15  .   .   365810  1   2   .
16  .   .   365810  1   33  3
17  1   33  365810  2   30  1
18  .   .   365810  1   8   1
19  .   .   365810  2   6   1
20  .   .   365810  2   5   .
21  .   .   365810  1   1   .
22  .   .   365810  1   32  4
23  1   32  365810  2   30  1
24  .   .   365810  1   5   .
25  .   .   365810  2   3   .
26  .   .   365810  1   2   .
27  .   .   365810  1   30  4
28  1   30  365810  2   28  1
29  .   .   365810  2   2   .
30  .   .   365810  1   0   .
31  .   .   365810  1   27  2
32  1   27  365810  2   27  1
33  .   .   365810  2   2   .
34  .   .   365810  2   0   .
         end

Answer 1

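Since you have already outlined the steps needed, writing a simple script for them should be no problem. In my experience, the syntax is easier to learn if you write and run each step separately (and check after each one what happened, whether any errors crept in, and so on). Once you get the hang of it, you can condense the code. Something like this should work (it tries to follow the steps in your question):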

* look at whether the wife is currently married
* not strictly necessary, as only married women have MA2, but the next step takes only married women into account

* generate husband's age variable and spread it to the whole household (new variable keeps the original MA2 untouched)
* codes follow the unlabelled sample: MA1 == 1 is currently married, HL4 == 1 is male, HL4 == 2 is female
gen husband_age = MA2 if MA1 == 1 & HL4 == 2
bys hid: egen husband_age_hid = max(husband_age)

* mark which individual is the husband (assuming this is what was meant by pairing the husband's age with the age of a male in the household)
gen husband = 0
bys hid: replace husband = 1 if husband_age_hid == HL6 & HL4 == 1

* copy the husband's education information to the whole household
gen husband_ED4 = ED4A if husband == 1
bys hid: egen husb_educ = max(husband_ED4)

* data cleaning, if necessary (husb_educ does not match husband* and is kept)
drop husband*

It might be better to use tempvars in the first steps instead of generating new variables, but I thought those variables might be useful later.
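As a rough sketch of the tempvar variant mentioned above (not the answerer's own code), the same steps could look like this. The names hage and hedu are made up for illustration, the numeric codes (MA1 == 1 married, HL4 == 1 male, HL4 == 2 female) are read off the unlabelled sample, and, like the code above, it assumes at most one married woman per household. Here the result is kept on the wife's row only, as the question asks.

```stata
clear
input HL1 MA1 MA2 hid HL4 HL6 ED4A
1 . .  106 1 57 4
2 . .  106 2 53 2
3 . .  106 1 30 6
4 3 .  106 2 24 5
5 . .  106 1 22 4
6 . .  106 1 17 3
7 . .  106 2 10 1
8 1 22 106 2 23 4
9 . .  106 2  0 .
end

* temporary variables are dropped automatically when the do-file ends
tempvar hage hedu

* husband's reported age, spread across the household
bysort hid : egen `hage' = max(cond(MA1 == 1 & HL4 == 2, MA2, .))

* education of the male whose age matches, spread across the household
by hid : egen `hedu' = max(cond(HL4 == 1 & HL6 == `hage', ED4A, .))

* keep the value on the wife's row only
gen husb_educ = `hedu' if MA1 == 1 & MA2 < .

list HL1 MA1 MA2 HL4 HL6 ED4A husb_educ
```

Run as one do-file, this leaves only husb_educ behind, so no cleanup step is needed.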

Answer 2

Here is a start. The code does loop over the different married women in each household, but it does nothing if two or more men match the husband's age.

input  HL1  MA1  MA2  hid  HL4  HL6  ED4A 
  1    .   .     106    1   57     4 
  2    .   .     106    2   53     2 
  3    .   .     106    1   30     6 
  4    3   .     106    2   24     5 
  5    .   .     106    1   22     4 
  6    .   .     106    1   17     3 
  7    .   .     106    2   10     1 
  8    1  22     106    2   23     4 
  9    .   .     106    2    0     .    
 end 

bysort hid (MA1) : gen wid = _n if MA1 == 1 

su wid, meanonly 

local max = r(max) 

gen heducation = . 

quietly forval i = 1/`max' { 
    bysort hid : egen hage = min(cond(wid == `i', MA2, .)) 
    by hid : egen nmatches = total(HL4 == 1 & HL6 == hage) 
    by hid : egen work = min(cond(nmatches == 1 & HL6 == hage, ED4, .)) 
    replace heducation = work if wid == `i' 
    drop hage nmatches work 
}

sort hid HL1 

list 

     +-----------------------------------------------------------+
     | HL1   MA1   MA2   hid   HL4   HL6   ED4A   wid   heduca~n |
     |-----------------------------------------------------------|
  1. |   1     .     .   106     1    57      4     .          . |
  2. |   2     .     .   106     2    53      2     .          . |
  3. |   3     .     .   106     1    30      6     .          . |
  4. |   4     3     .   106     2    24      5     .          . |
  5. |   5     .     .   106     1    22      4     .          . |
     |-----------------------------------------------------------|
  6. |   6     .     .   106     1    17      3     .          . |
  7. |   7     .     .   106     2    10      1     .          . |
  8. |   8     1    22   106     2    23      4     1          4 |
  9. |   9     .     .   106     2     0      .     .          . |
     +-----------------------------------------------------------+

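(UPDATE)

The extended example flushed out a bug: one calculation wasn't strict enough and didn't exclude same-age females. (Incidentally, note that the new data are for two households, not one.)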
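To see the bug in isolation, here is a small sketch using two rows from the large household: the 27-year-old wife (HL1 32) and her 27-year-old husband (HL1 31). The variable names loose and strict are made up for this illustration. Without the sex condition, the wife's own education enters the minimum because she is the same age as her husband.

```stata
clear
input HL1 MA1 MA2 hid HL4 HL6 ED4A
31 . .  365810 1 27 2
32 1 27 365810 2 27 1
end

* husband's reported age, spread across the household
bysort hid : egen hage = min(cond(MA1 == 1, MA2, .))

* number of men of that age
by hid : egen nmatches = total(HL4 == 1 & HL6 == hage)

* loose version: anyone of that age contributes, including the wife
by hid : egen loose = min(cond(nmatches == 1 & HL6 == hage, ED4A, .))

* strict version: only men of that age contribute
by hid : egen strict = min(cond(nmatches == 1 & HL6 == hage & HL4 == 1, ED4A, .))

list HL1 HL4 HL6 ED4A loose strict
```

The loose version picks up the wife's own education (1), while the strict version returns the husband's (2).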

bysort hid (MA1) : gen wid = _n if MA1 == 1 

su wid, meanonly 

local max = r(max) 

gen heducation = . 

quietly forval i = 1/`max' { 
    bysort hid : egen hage`i' = min(cond(wid == `i', MA2, .)) 
    by hid : egen nmatches`i' = total(HL4 == 1 & HL6 == hage`i') 
    by hid : egen work`i' = min(cond(nmatches`i' == 1 & HL6 == hage`i' & HL4 == 1, ED4, .)) 
    replace heducation = work`i' if wid == `i' 
}

sort hid wid HL1 

    list hid wid MA2 HL6 ED4 heducation HL4 if inlist(HL6, 27, 30, 32, 33) | MA2 < ., sepby(hid) 

     +--------------------------------------------------+
     |    hid   wid   MA2   HL6   ED4A   heduca~n   HL4 |
     |--------------------------------------------------|
  1. | 365809     1    33    26      1          1     2 |
  2. | 365809     .     .    33      1          .     1 |
     |--------------------------------------------------|
  3. | 365810     1    27    27      1          2     2 |
  4. | 365810     2    33    30      1          .     2 |
  5. | 365810     3    32    30      1          4     2 |
  6. | 365810     4    30    28      1          4     2 |
 14. | 365810     .     .    33      3          .     1 |
 21. | 365810     .     .    33      3          .     1 |
 26. | 365810     .     .    32      4          .     1 |
 30. | 365810     .     .    30      4          .     1 |
 33. | 365810     .     .    27      2          .     1 |
     +--------------------------------------------------+
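A loop-free alternative, offered only as a sketch and not as part of the answer above, is to build a lookup file of men keyed by household and age, and merge it back onto the wives using MA2 as the key. The variable name heducation2 is hypothetical, and the data are a subset of rows from the question's households. Ages shared by two or more men in a household are dropped from the lookup, so ambiguous cases stay missing, as in the loop.

```stata
clear
input HL1 MA1 MA2 hid HL4 HL6 ED4A
5  . .  106    1 22 4
8  1 22 106    2 23 4
8  . .  365810 1 33 3
16 . .  365810 1 33 3
17 1 33 365810 2 30 1
22 . .  365810 1 32 4
23 1 32 365810 2 30 1
31 . .  365810 1 27 2
32 1 27 365810 2 27 1
end

* build a lookup of men whose age is unique within the household
preserve
keep if HL4 == 1
keep hid HL6 ED4A
bysort hid HL6 : keep if _N == 1
rename (HL6 ED4A) (MA2 heducation2)
tempfile men
save `men'
restore

* attach the matching man's education to each wife via her report of his age
merge m:1 hid MA2 using `men', keep(master match) nogenerate

sort hid HL1
list hid HL1 MA1 MA2 HL4 HL6 ED4A heducation2, sepby(hid)
```

Because the lookup contains only men, a wife who is the same age as her husband cannot match herself.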

