Recoding from a separate code table
I have the following dataset:
dat1 <- read.table(header=TRUE, text="
ID Age Align Weat
8645 15-24 A 1
6228 15-24 B 1
5830 15-24 A 3
1844 25-34 B 1
4461 35-44 B 2
2119 35-44 C 2
2115 45-54 A 1
")
dat1
ID Age Align Weat
1 8645 15-24 A 1
2 6228 15-24 B 1
3 5830 15-24 A 3
4 1844 25-34 B 1
5 4461 35-44 B 2
6 2119 35-44 C 2
7 2115 45-54 A 1
The values of the columns Age, Align and Weat are described in a code data frame:
dat2 <- read.table(header=TRUE, text="
Code Desc Column
15-24 Young Age
25-34 Young Age
35-44 Middle Age
45-54 Middle Age
A Straight Align
B Curve Align
C Hill Align
1 Clear Weat
2 Cloudy Weat
3 Rain Weat
")
dat2
Code Desc Column
1 15-24 Young Age
2 25-34 Young Age
3 35-44 Middle Age
4 45-54 Middle Age
5 A Straight Align
6 B Curve Align
7 C Hill Align
8 1 Clear Weat
9 2 Cloudy Weat
10 3 Rain Weat
I want to match against the code data frame so that my dataset looks like this:
ID Age Align Weat
1 8645 Young Straight Clear
2 6228 Young Curve Clear
3 5830 Young Straight Rain
4 1844 Young Curve Clear
5 4461 Middle Curve Cloudy
6 2119 Middle Hill Cloudy
7 2115 Middle Straight Clear
I am currently using the code below, which is not efficient for a large dataset with 500 columns and a code table covering all of those columns.
age <- subset(dat2, Column=="Age")
age
Code Desc Column
1 15-24 Young Age
2 25-34 Young Age
3 35-44 Middle Age
4 45-54 Middle Age
align <- subset(dat2, Column=="Align")
align
Code Desc Column
5 A Straight Align
6 B Curve Align
7 C Hill Align
weat <- subset(dat2, Column=="Weat")
weat
Code Desc Column
8 1 Clear Weat
9 2 Cloudy Weat
10 3 Rain Weat
dat1$Age <- age$Desc[match(dat1$Age, age$Code)]
dat1$Align <- align$Desc[match(dat1$Align, align$Code)]
dat1$Weat <- weat$Desc[match(dat1$Weat, weat$Code)]
dat1
ID Age Align Weat
1 8645 Young Straight Clear
2 6228 Young Curve Clear
3 5830 Young Straight Rain
4 1844 Young Curve Clear
5 4461 Middle Curve Cloudy
6 2119 Middle Hill Cloudy
7 2115 Middle Straight Clear
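A minimal sketch of how the three hand-written subset()/match() pairs above could be generalised to any number of columns, assuming the codes can be compared as character strings. It splits the code table once by Column into named lookup vectors (names are codes, values are descriptions) and then indexes each column of dat1 by name. The example data are recreated with data.frame() so the block runs on its own; the names lookups and cols are illustrative only, and I have not benchmarked this on a 500-column dataset.

```r
# Recreate the example data; stringsAsFactors = FALSE keeps codes as text
dat1 <- data.frame(
  ID    = c(8645, 6228, 5830, 1844, 4461, 2119, 2115),
  Age   = c("15-24", "15-24", "15-24", "25-34", "35-44", "35-44", "45-54"),
  Align = c("A", "B", "A", "B", "B", "C", "A"),
  Weat  = c(1, 1, 3, 1, 2, 2, 1),
  stringsAsFactors = FALSE
)
dat2 <- data.frame(
  Code   = c("15-24", "25-34", "35-44", "45-54", "A", "B", "C", "1", "2", "3"),
  Desc   = c("Young", "Young", "Middle", "Middle",
             "Straight", "Curve", "Hill", "Clear", "Cloudy", "Rain"),
  Column = rep(c("Age", "Align", "Weat"), times = c(4, 3, 3)),
  stringsAsFactors = FALSE
)

# One named lookup vector per column: names are codes, values are descriptions
lookups <- lapply(split(dat2, dat2$Column),
                  function(tab) setNames(tab$Desc, tab$Code))

# Recode only the columns that have an entry in the code table;
# codes missing from a lookup come back as NA
cols <- intersect(names(dat1), names(lookups))
dat1[cols] <- lapply(cols, function(col) {
  unname(lookups[[col]][as.character(dat1[[col]])])
})
dat1
```

The code table is split only once, so adding more columns means adding rows to dat2 rather than writing another subset()/match() pair.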
Try a simple for loop:
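varnames <- unique(dat2$Column)
dat3 <- dat1
for (i in varnames) {
  # all columns except the one currently being recoded
  startvars <- names(dat3)[!names(dat3) %in% i]
  # join the code rows for column i; note that merge() sorts rows by the key
  dat3 <- merge(dat3, subset(dat2, Column == i),
                by.x = i, by.y = "Code")[, c(startvars, "Desc")]
  # give the looked-up description the original column name
  colnames(dat3)[names(dat3) %in% "Desc"] <- i
}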
Result:
ID Age Align Weat
1 8645 Young Straight Clear
2 2115 Middle Straight Clear
3 6228 Young Curve Clear
4 1844 Young Curve Clear
5 4461 Middle Curve Cloudy
6 2119 Middle Hill Cloudy
7 5830 Young Straight Rain
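Note that merge() sorts the rows by the join key and, by default, drops rows whose code has no match, which is why the IDs above come back in a different order. A minimal sketch of restoring dat1's row and column order after the loop, assuming ID uniquely identifies a row (passing all.x = TRUE to merge() would additionally keep unmatched rows, with NA in Desc):

```r
dat1 <- data.frame(
  ID    = c(8645, 6228, 5830, 1844, 4461, 2119, 2115),
  Age   = c("15-24", "15-24", "15-24", "25-34", "35-44", "35-44", "45-54"),
  Align = c("A", "B", "A", "B", "B", "C", "A"),
  Weat  = c(1, 1, 3, 1, 2, 2, 1),
  stringsAsFactors = FALSE
)
dat2 <- data.frame(
  Code   = c("15-24", "25-34", "35-44", "45-54", "A", "B", "C", "1", "2", "3"),
  Desc   = c("Young", "Young", "Middle", "Middle",
             "Straight", "Curve", "Hill", "Clear", "Cloudy", "Rain"),
  Column = rep(c("Age", "Align", "Weat"), times = c(4, 3, 3)),
  stringsAsFactors = FALSE
)

# The merge loop from the answer above
dat3 <- dat1
for (i in unique(dat2$Column)) {
  startvars <- names(dat3)[!names(dat3) %in% i]
  dat3 <- merge(dat3, subset(dat2, Column == i),
                by.x = i, by.y = "Code")[, c(startvars, "Desc")]
  colnames(dat3)[names(dat3) %in% "Desc"] <- i
}

# Put rows back in dat1's order (assumes ID is unique) and restore column order
dat3 <- dat3[match(dat1$ID, dat3$ID), names(dat1)]
rownames(dat3) <- NULL
dat3
```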
This is clearly not very efficient; a data.table solution with some dcast might be appropriate, but I'll leave that for others to think about.
PS: I had to reformat the first dataset slightly by adding stringsAsFactors = F, colClasses = rep("character", 4) to read.table.
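The data.table/dcast idea amounts to treating (column, code) as a single key: reshape to long format, join once, reshape back. A base R sketch of that idea, with no extra packages, assuming all codes can be compared as character strings. The objects long, wide and key_sep are illustrative, and the "\u001f" separator is an assumption (it must not occur in any column name or code). I have not compared its speed with the loops.

```r
dat1 <- data.frame(
  ID    = c(8645, 6228, 5830, 1844, 4461, 2119, 2115),
  Age   = c("15-24", "15-24", "15-24", "25-34", "35-44", "35-44", "45-54"),
  Align = c("A", "B", "A", "B", "B", "C", "A"),
  Weat  = c(1, 1, 3, 1, 2, 2, 1),
  stringsAsFactors = FALSE
)
dat2 <- data.frame(
  Code   = c("15-24", "25-34", "35-44", "45-54", "A", "B", "C", "1", "2", "3"),
  Desc   = c("Young", "Young", "Middle", "Middle",
             "Straight", "Curve", "Hill", "Clear", "Cloudy", "Rain"),
  Column = rep(c("Age", "Align", "Weat"), times = c(4, 3, 3)),
  stringsAsFactors = FALSE
)

cols <- intersect(names(dat1), unique(dat2$Column))

# Long format: one row per (column, original value), columns stacked in order
long <- data.frame(
  Column = rep(cols, each = nrow(dat1)),
  Code   = unlist(lapply(dat1[cols], as.character), use.names = FALSE),
  stringsAsFactors = FALSE
)

# One lookup for all columns at once, keyed on column name + code
key_sep <- "\u001f"  # assumed never to appear in a column name or code
long$Desc <- dat2$Desc[match(paste(long$Column, long$Code, sep = key_sep),
                             paste(dat2$Column, dat2$Code, sep = key_sep))]

# Back to wide: split() keeps the original row order within each column
wide <- dat1
wide[cols] <- split(long$Desc, factor(long$Column, levels = cols))
wide
```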
You can use a for loop over the variables in dat1:
# 'intersect' is needed to recode only those columns which have a description
for (each_column in intersect(colnames(dat1), dat2$Column)) {
  # rows of the code table that belong to this column
  curr_dict <- dat2$Column %in% each_column
  code <- dat2$Code[curr_dict]
  descr <- dat2$Desc[curr_dict]
  # codes without a matching row become NA
  dat1[[each_column]] <- descr[match(dat1[[each_column]], code)]
}
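One thing to watch with this loop: any code that has no row in dat2 silently becomes NA. A minimal sketch of checking for such codes before recoding; unmatched_codes() is a hypothetical helper, and the extra row with ID 9999 and Weat = 4 is invented purely to show a code that is missing from the table.

```r
dat1 <- data.frame(
  ID    = c(8645, 6228, 5830, 1844, 4461, 2119, 2115),
  Age   = c("15-24", "15-24", "15-24", "25-34", "35-44", "35-44", "45-54"),
  Align = c("A", "B", "A", "B", "B", "C", "A"),
  Weat  = c(1, 1, 3, 1, 2, 2, 1),
  stringsAsFactors = FALSE
)
dat2 <- data.frame(
  Code   = c("15-24", "25-34", "35-44", "45-54", "A", "B", "C", "1", "2", "3"),
  Desc   = c("Young", "Young", "Middle", "Middle",
             "Straight", "Curve", "Hill", "Clear", "Cloudy", "Rain"),
  Column = rep(c("Age", "Align", "Weat"), times = c(4, 3, 3)),
  stringsAsFactors = FALSE
)

# Hypothetical helper: for each recodable column, list the codes in `data`
# that have no entry in `codes`; columns with no problems are dropped
unmatched_codes <- function(data, codes) {
  cols <- intersect(names(data), unique(codes$Column))
  res <- lapply(cols, function(col) {
    known <- codes$Code[codes$Column == col]
    setdiff(unique(as.character(data[[col]])), known)
  })
  names(res) <- cols
  Filter(length, res)
}

# Invented extra row (Weat = 4 is not in the code table), for illustration only
dat1_extra <- rbind(dat1, data.frame(ID = 9999, Age = "15-24", Align = "A",
                                     Weat = 4, stringsAsFactors = FALSE))

unmatched_codes(dat1, dat2)
unmatched_codes(dat1_extra, dat2)
```

An empty result means every code will be recoded; otherwise the missing codes can be added to dat2 before running the loop.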