Create new Categorical variable based on a subset of data
I have a data frame that looks like this:
cnt bnk qst ans
1 Country 1 Bank 1 q1 1
2 Country 2 Bank 2 q1 1
3 Country 3 Bank 3 q1 3
4 Country 4 Bank 4 q1 1
5 Country 1 Bank 1 q2 1
6 Country 2 Bank 2 q2 2
7 Country 3 Bank 3 q2 3
8 Country 4 Bank 4 q2 4
9 Country 1 Bank 1 q3 1
10 Country 2 Bank 2 q3 1
11 Country 3 Bank 3 q3 2
12 Country 4 Bank 4 q3 1
Note that q stands for "Question", so q2 is "Question 2". Likewise, ans is the response.
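Now I want to create a categorical variable based on the responses to q2. Specifically, I want to assign the following categories:
- Public
- Private
- Mixed
- Other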
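So, if ans=1 for qst=q2, the category is "Public"; if ans=2 for qst=q2, it is "Private"; and so on. Each country's q2 category should then appear on all of that country's rows, so my data frame should look like this: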
cnt bnk qst ans dummy
1 Country 1 Bank 1 q1 1 Public
2 Country 2 Bank 2 q1 1 Private
3 Country 3 Bank 3 q1 3 Mixed
4 Country 4 Bank 4 q1 1 Other
5 Country 1 Bank 1 q2 1 Public
6 Country 2 Bank 2 q2 2 Private
7 Country 3 Bank 3 q2 3 Mixed
8 Country 4 Bank 4 q2 4 Other
9 Country 1 Bank 1 q3 1 Public
10 Country 2 Bank 2 q3 1 Private
11 Country 3 Bank 3 q3 2 Mixed
12 Country 4 Bank 4 q3 1 Other
I tried using ifelse, but couldn't get it to work. Can anyone give me some advice?
Data
dput(df)
structure(list(cnt = c("Country 1", "Country 2", "Country 3",
"Country 4", "Country 1", "Country 2", "Country 3", "Country 4",
"Country 1", "Country 2", "Country 3", "Country 4"), bnk = c("Bank 1",
"Bank 2", "Bank 3", "Bank 4", "Bank 1", "Bank 2", "Bank 3", "Bank 4",
"Bank 1", "Bank 2", "Bank 3", "Bank 4"), qst = structure(c(1L,
1L, 1L, 1L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L), .Label = c("q1",
"q2", "q3"), class = "factor"), ans = c(1L, 1L, 3L, 1L, 1L, 2L,
3L, 4L, 1L, 1L, 2L, 1L), dummy = c(NA, NA, NA, NA, "Public",
"Private", "Mixed", "Other", NA, NA, NA, NA)), .Names = c("cnt",
"bnk", "qst", "ans", "dummy"), row.names = c("1", "2", "3", "4",
"5", "6", "7", "8", "9", "10", "11", "12"), class = "data.frame")
The following assigns NA to all other questions:
df$dummy <- ifelse(df$ans == 1 & df$qst == 'q2', 'Public',
            ifelse(df$ans == 2 & df$qst == 'q2', 'Private',
            ifelse(df$ans == 3 & df$qst == 'q2', 'Mixed',
            ifelse(df$ans == 4 & df$qst == 'q2', 'Other', NA))))
# cnt bnk qst ans dummy
#1 Country 1 Bank 1 q1 1 <NA>
#2 Country 2 Bank 2 q1 1 <NA>
#3 Country 3 Bank 3 q1 3 <NA>
#4 Country 4 Bank 4 q1 1 <NA>
#5 Country 1 Bank 1 q2 1 Public
#6 Country 2 Bank 2 q2 2 Private
#7 Country 3 Bank 3 q2 3 Mixed
#8 Country 4 Bank 4 q2 4 Other
#9 Country 1 Bank 1 q3 1 <NA>
#10 Country 2 Bank 2 q3 1 <NA>
#11 Country 3 Bank 3 q3 2 <NA>
#12 Country 4 Bank 4 q3 1 <NA>
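As a possible simplification of the nested ifelse above, the four branches can be collapsed into a single lookup: index a character vector of labels by the answer code, then keep the result only for q2 rows. This is a minimal sketch that assumes ans is always an integer code from 1 to 4 (any other value simply yields NA); the data frame is rebuilt inline, without the dummy column, so the block runs on its own, and the name labels is just an illustrative choice.

```r
# Rebuild the example data (without the dummy column) so the block runs on its own
df <- data.frame(
  cnt = rep(paste("Country", 1:4), times = 3),
  bnk = rep(paste("Bank", 1:4), times = 3),
  qst = factor(rep(c("q1", "q2", "q3"), each = 4)),
  ans = c(1L, 1L, 3L, 1L, 1L, 2L, 3L, 4L, 1L, 1L, 2L, 1L),
  stringsAsFactors = FALSE
)

# Category labels indexed by answer code: 1 = Public, 2 = Private, 3 = Mixed, 4 = Other
labels <- c("Public", "Private", "Mixed", "Other")

# Look up each row's label by its ans code, then set NA on every row that is not q2
df$dummy <- ifelse(df$qst == "q2", labels[df$ans], NA)

print(df)
```

Adding a fifth category then only means extending labels, not adding another ifelse level.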
Something like the following should work for a data.frame named df. It is hard to test without the data:
# construct dummy variable in subset data.frame
dfCountryQ2 <- df[df$qst=="q2", c("cnt", "ans")]
dfCountryQ2$dummy <- factor(dfCountryQ2$ans, levels=1:4,
                            labels=c("Public", "Private", "Mixed", "Other"))
# merge the dummy back onto df by country
df <- merge(df, dfCountryQ2[, c("cnt", "dummy")], by="cnt")
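Because merge() sorts its result by the key column by default (sort = TRUE), the merged df is reordered by cnt rather than kept in its original row order. A possible alternative sketch, assuming each country answers q2 exactly once, uses match() to look up every row's country in the q2 subset. This keeps the original order and repeats each country's q2 category on all of its rows, as in the desired output from the question. The helper names q2 and q2_ans are illustrative only.

```r
# Rebuild the example data so the block runs on its own
df <- data.frame(
  cnt = rep(paste("Country", 1:4), times = 3),
  bnk = rep(paste("Bank", 1:4), times = 3),
  qst = factor(rep(c("q1", "q2", "q3"), each = 4)),
  ans = c(1L, 1L, 3L, 1L, 1L, 2L, 3L, 4L, 1L, 1L, 2L, 1L),
  stringsAsFactors = FALSE
)

# One row per country: the answer each country gave to q2
q2 <- df[df$qst == "q2", c("cnt", "ans")]

# For every row of df, find its country's q2 answer; df's row order is unchanged
q2_ans <- q2$ans[match(df$cnt, q2$cnt)]

# Convert the answer codes into a labelled factor
df$dummy <- factor(q2_ans, levels = 1:4,
                   labels = c("Public", "Private", "Mixed", "Other"))

print(df)
```

If a country could appear more than once under q2, match() would silently use only its first q2 row, so that assumption is worth checking on the real data.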