How to create a variable in R based on another dataset of different length
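I am trying to create a variable, STATE, that exists in another dataset whose length differs from mine. Both objects have a state-code variable, GESTFIPS. So I just want R to check where GESTFIPS matches and then create the variable STATE in my dataset accordingly.
I tried: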
> state_1865_base$STATE[state_1865_base$GESTFIPS==urate2$GESTFIPS] < -
+ urate2$STATE[state_1865_base$GESTFIPS==urate2$GESTFIPS]
and got this error message:
Error in -urate2$STATE[state_1865_base$GESTFIPS == urate2$GESTFIPS] :
invalid argument to unary operator
In addition: Warning messages:
1: In state_1865_base$GESTFIPS == urate2$GESTFIPS :
longer object length is not a multiple of shorter object length
2: In state_1865_base$GESTFIPS == urate2$GESTFIPS :
longer object length is not a multiple of shorter object length
My dataset looks like this (132990 obs. of 117 variables):
'data.frame': 132990 obs. of 117 variables:
$ IDENTIFIER : chr "20030100013280" "20030100013344" "20030100013352" "20030100013848" ...
$ AGE : num 60 41 26 36 51 32 44 21 33 39 ...
$ MALE : num 1 0 0 0 1 0 0 0 0 0 ...
$ BLACK : num 1 0 0 1 0 0 0 0 0 1 ...
$ MARRIED : num 1 1 1 1 1 0 1 0 1 1 ...
$ NUM_CHILD : num 0 2 0 2 2 1 1 1 3 4 ...
$ HV_CHILD : num 0 1 0 1 1 1 1 1 1 1 ...
$ AGE_YOUNGEST : num NA 0 NA 9 14 2 9 14 3 4 ...
$ CHILD_4 : num 0 1 0 0 0 1 0 0 1 0 ...
$ CHILD_5 : num 0 1 0 0 0 1 0 0 1 1 ...
$ GRADE : num 17 13 13 12 17 16 12 13 13 13 ...
$ SPOUSE_EMP : num 0 1 0 1 0 1 1 NA 1 0 ...
$ SPOUSE_WORKHOURS : num NA 50 NA 40 NA 40 50 NA 40 NA ...
$ WORKING : num 1 1 1 0 1 1 1 1 1 1 ...
$ UNEMP : num 0 0 0 1 0 0 0 0 0 0 ...
$ RETIRED : num 0 0 0 0 0 0 0 0 0 0 ...
$ DISABLED : num 0 0 0 0 0 0 0 0 0 0 ...
$ STUDENT : num 0 0 0 0 0 0 0 0 0 0 ...
$ HOMEMAKER : num 0 0 0 0 0 0 0 0 0 0 ...
$ WORK_PART : num 1 1 1 0 0 0 0 0 0 0 ...
$ HH_INCOME_03 : num 660 200 200 NA NA ...
$ WAGE_03 : num 22 6.67 16.67 NA NA ...
$ WAGE_03_ALT : num 22 NA 12.5 NA NA NA NA 9.5 14 12 ...
$ YEAR : num 2003 2003 2003 2003 2003 ...
$ DATASET : num 2003 2003 2003 2003 2003 ...
$ INTERVIEW_DAY : num 5 6 6 4 4 4 1 2 6 4 ...
$ INTERVIEW_DATE : Date, format: "2003-01-03" "2003-01-04" "2003-01-04" "2003-01-02" ...
$ GESTFIPS : num 6 6 6 13 21 21 22 26 27 34 ...
[list output truncated]
This is the dataset that stores the states, urate (204 observations of 6 variables):
STATE GESTFIPS NOBS TWOYEAR UNEMP URATE
AL 1 434 1 0.05392952 5.19585
AL 1 288 2 0.02666941 3.63750
AL 1 266 3 0.03848163 4.24585
AL 1 248 4 0.11545039 9.59580
AK 2 62 1 0.07917716 7.52915
AK 2 41 2 0.12782212 6.70415
AK 2 38 3 0.00000000 6.25835
state_1865_base$STATE <- urate2$STATE[match(state_1865_base$GESTFIPS, urate2$GESTFIPS)]
should work.
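As a minimal sketch of how that lookup behaves, here is the same line run on two small toy data frames. The values below are made up for illustration and are not the asker's data; only the column names GESTFIPS and STATE come from the question.

```r
# Toy stand-ins for the two data frames (illustrative values only).
state_1865_base <- data.frame(
  AGE      = c(60, 41, 26, 36),
  GESTFIPS = c(2, 1, 1, 99)   # 99 deliberately has no entry in the lookup table
)
urate2 <- data.frame(
  STATE    = c("AL", "AL", "AK", "AK"),
  GESTFIPS = c(1, 1, 2, 2),   # codes repeat, as in the real lookup table
  stringsAsFactors = FALSE
)

# match() gives, for every code in the large table, the position of its
# first occurrence in the lookup table (NA when the code is absent).
idx <- match(state_1865_base$GESTFIPS, urate2$GESTFIPS)

# Indexing STATE by those positions yields one value per row of the large table.
state_1865_base$STATE <- urate2$STATE[idx]
print(state_1865_base)
```

Because match() only uses the first occurrence, the repeated rows per state in the lookup table do no harm as long as each GESTFIPS code maps to a single STATE. Codes with no match end up as NA rather than raising an error.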
Edit: my original, incorrect answer was:
It looks as though you are using < - for assignment. If you use <- instead, I think that your code will work.
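To see why fixing the assignment operator alone is probably not enough, the following sketch reproduces both symptoms from the question on hypothetical toy vectors (the vector names and values are assumptions for illustration).

```r
base_codes   <- c(6, 6, 13, 1, 2)   # hypothetical codes from the large table
lookup_codes <- c(1, 1, 2)          # hypothetical codes from the lookup table

# `==` compares position by position and recycles the shorter vector; with
# lengths 5 and 3 this triggers the "longer object length" warning.
elementwise <- suppressWarnings(base_codes == lookup_codes)

# match() instead searches the whole lookup vector for each code.
positions <- match(base_codes, lookup_codes)

# `< -` is parsed as the comparison `<` with a negated right-hand side,
# not as the assignment operator `<-`.
parsed_op <- as.character(quote(x < - y)[[1]])

# Negating a character vector fails with the error seen in the question.
neg_error <- tryCatch(-"AL", error = function(e) conditionMessage(e))

print(elementwise)
print(positions)
print(parsed_op)
print(neg_error)
```

So even with `<-` written correctly, the `==` comparison would only line up rows that happen to sit at the same (recycled) position in both tables, which is why the match() version is the one to use.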