Column of 1's and 0's to indicate individuals' choices across a range of alternatives?
I'm trying to set up my data to work with the mlogit package in R.
I have a data frame created with the following code:
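id <- 1:10
id <- rep(id, each=5)   # each of the 10 individuals has 5 rows, one per alternative
site <- c("site1", "site2", "site3", "site4", "site5")
# the site chosen by individual 1, 2, ..., 10
choice <- c("site3", "site5", "site1", "site4", "site2",
"site4", "site3", "site5", "site2", "site1")
df <- cbind(id, site)   # note: cbind() returns a character matrix, not a data frame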
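I want to create a binary variable that indicates the chosen site for each value of id. Because id is a repeated sequence, the new indicator variable must be 0 in every row except the one where "site" equals the corresponding value of "choice". For id == 1, that value is the first element of the "choice" vector; for id == 2, it is the second element, and so on.
The final data frame with the new variable should look like this: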
id site indicator
[1,] "1" "site1" "0"
[2,] "1" "site2" "0"
[3,] "1" "site3" "1"
[4,] "1" "site4" "0"
[5,] "1" "site5" "0"
[6,] "2" "site1" "0"
[7,] "2" "site2" "0"
[8,] "2" "site3" "0"
[9,] "2" "site4" "0"
[10,] "2" "site5" "1"
[11,] "3" "site1" "1"
[12,] "3" "site2" "0"
[13,] "3" "site3" "0"
[14,] "3" "site4" "0"
[15,] "3" "site5" "0"
[16,] "4" "site1" "0"
[17,] "4" "site2" "0"
[18,] "4" "site3" "0"
[19,] "4" "site4" "1"
[20,] "4" "site5" "0"
[21,] "5" "site1" "0"
[22,] "5" "site2" "1"
[23,] "5" "site3" "0"
[24,] "5" "site4" "0"
[25,] "5" "site5" "0"
[26,] "6" "site1" "0"
[27,] "6" "site2" "0"
[28,] "6" "site3" "0"
[29,] "6" "site4" "1"
[30,] "6" "site5" "0"
[31,] "7" "site1" "0"
[32,] "7" "site2" "0"
[33,] "7" "site3" "1"
[34,] "7" "site4" "0"
[35,] "7" "site5" "0"
[36,] "8" "site1" "0"
[37,] "8" "site2" "0"
[38,] "8" "site3" "0"
[39,] "8" "site4" "0"
[40,] "8" "site5" "1"
[41,] "9" "site1" "0"
[42,] "9" "site2" "1"
[43,] "9" "site3" "0"
[44,] "9" "site4" "0"
[45,] "9" "site5" "0"
[46,] "10" "site1" "1"
[47,] "10" "site2" "0"
[48,] "10" "site3" "0"
[49,] "10" "site4" "0"
[50,] "10" "site5" "0"
I've tried this many times without success and couldn't find a relevant answer online.
Thanks in advance :)
We can split 'site' by 'id', then use Map to compare each piece with the corresponding value of 'choice', which gives a logical index:
df$indicator <- +(unlist(Map(`==`, split(df$site, df$id), choice), use.names=FALSE))
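To see what the intermediate pieces look like, here is a minimal self-contained sketch using the question's data rebuilt as a data frame (the names 'pieces' and 'matches' are only illustrative). Map pairs list elements by position, so this relies on split() ordering the groups by id (1, 2, ..., 10), which matches the order of 'choice'. It also checks that every individual gets exactly one 1:

```r
id <- rep(1:10, each = 5)
site <- c("site1", "site2", "site3", "site4", "site5")
choice <- c("site3", "site5", "site1", "site4", "site2",
            "site4", "site3", "site5", "site2", "site1")
df <- data.frame(id, site, stringsAsFactors = FALSE)

# split() groups the 'site' values by 'id': a list of 10 vectors, one per person
pieces <- split(df$site, df$id)
# Map() compares the i-th piece with the i-th element of 'choice'
matches <- Map(`==`, pieces, choice)
# Sanity check: each person chose exactly one site
stopifnot(all(vapply(matches, sum, integer(1)) == 1L))

# Flatten back to one value per row; unary + turns TRUE/FALSE into 1/0
df$indicator <- +(unlist(matches, use.names = FALSE))
head(df, 5)
```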
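Or use tabulate to get the frequency of each 'id', replicate 'choice' accordingly, compare it with 'site' and convert to binary (this assumes the rows are sorted by 'id'):
df$indicator <- +(rep(choice, tabulate(df$id)) == df$site)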
Data
df <- data.frame(id, site)
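A quick check of the tabulate approach, plus its caveat: rep(choice, tabulate(df$id)) lines up with 'site' only when the rows are sorted by 'id'. The sketch below compares it with indexing 'choice' by 'id' directly, then reverses the row order (a hypothetical scenario, not from the question) to show where the two differ:

```r
id <- rep(1:10, each = 5)
site <- c("site1", "site2", "site3", "site4", "site5")
choice <- c("site3", "site5", "site1", "site4", "site2",
            "site4", "site3", "site5", "site2", "site1")
df <- data.frame(id, site, stringsAsFactors = FALSE)

counts <- tabulate(df$id)            # occurrences of 1, 2, ..., max(id): ten 5s here
by_rep   <- +(rep(choice, counts) == df$site)
by_index <- +(choice[df$id] == df$site)
stopifnot(identical(by_rep, by_index))   # same answer on sorted data

# Hypothetical: reverse the rows so 'id' is no longer in ascending order
rev_df  <- df[rev(seq_len(nrow(df))), ]
rep_rev <- +(rep(choice, tabulate(rev_df$id)) == rev_df$site)
idx_rev <- +(choice[rev_df$id] == rev_df$site)

identical(idx_rev, rev(by_index))    # indexing by id still matches each row
identical(rep_rev, rev(by_index))    # rep() is now misaligned
```

Indexing by id does not depend on row order, which is why it is the safer choice if the data may be re-sorted.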
As Akrun suggested, define df with data.frame:
df <- data.frame(id, site)
Then do:
df$indicator <- (df$site == choice[df$id])*1
Multiplying by 1 converts the TRUE/FALSE result to 1s and 0s.
Result:
id site indicator
1 1 site1 0
2 1 site2 0
3 1 site3 1
4 1 site4 0
5 1 site5 0
6 2 site1 0
7 2 site2 0
8 2 site3 0
9 2 site4 0
10 2 site5 1
11 3 site1 1
12 3 site2 0
13 3 site3 0
14 3 site4 0
15 3 site5 0
16 4 site1 0
17 4 site2 0
18 4 site3 0
19 4 site4 1
20 4 site5 0
21 5 site1 0
22 5 site2 1
23 5 site3 0
24 5 site4 0
25 5 site5 0
26 6 site1 0
27 6 site2 0
28 6 site3 0
29 6 site4 1
30 6 site5 0
31 7 site1 0
32 7 site2 0
33 7 site3 1
34 7 site4 0
35 7 site5 0
36 8 site1 0
37 8 site2 0
38 8 site3 0
39 8 site4 0
40 8 site5 1
41 9 site1 0
42 9 site2 1
43 9 site3 0
44 9 site4 0
45 9 site5 0
46 10 site1 1
47 10 site2 0
48 10 site3 0
49 10 site4 0
50 10 site5 0
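Multiplying by 1 is one of several equivalent ways to turn a logical vector into 0/1; the unary + used in the first answer does the same. A small sketch showing that they differ only in storage type (double versus integer), which rarely matters for modelling:

```r
x <- c(TRUE, FALSE, TRUE)

x * 1           # 1 0 1, stored as double
+x              # 1 0 1, stored as integer
as.integer(x)   # 1 0 1, stored as integer

typeof(x * 1)
typeof(+x)
```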
If you want strings rather than numbers or factors, use as.character on the columns you want to convert.
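For example, a minimal sketch that converts every column at once, assuming you want all of them as character (as in the matrix output shown in the question). Assigning to df[] keeps the object a data frame instead of turning it into a list:

```r
id <- rep(1:10, each = 5)
site <- c("site1", "site2", "site3", "site4", "site5")
choice <- c("site3", "site5", "site1", "site4", "site2",
            "site4", "site3", "site5", "site2", "site1")
df <- data.frame(id, site, stringsAsFactors = FALSE)
df$indicator <- (df$site == choice[df$id]) * 1

# Convert every column to character; df[] <- keeps the data.frame structure
df[] <- lapply(df, as.character)
str(df)
```

To convert only some columns, apply as.character to those columns individually, e.g. df$indicator <- as.character(df$indicator).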
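Using the given matrix (df), the indicator can be computed as:
# df is a character matrix, so convert 'id' to numbers before using it to index 'choice'
indicator <- as.numeric(choice[as.numeric(df[,"id"])] == df[,"site"])
# Final matrix (cbind() coerces indicator to "0"/"1" strings)
df <- cbind(df,indicator)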
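Why the inner as.numeric() matters: because cbind(id, site) produces a character matrix, df[,"id"] holds strings such as "1". Indexing a vector with strings looks up element names, and 'choice' has none, so every lookup would return NA. A short sketch (the names 'm', 'bad' and 'good' are only illustrative):

```r
id <- rep(1:10, each = 5)
site <- c("site1", "site2", "site3", "site4", "site5")
choice <- c("site3", "site5", "site1", "site4", "site2",
            "site4", "site3", "site5", "site2", "site1")
m <- cbind(id, site)   # integer + character columns: cbind() returns a character matrix

# Character ids are matched against names(choice), which is NULL, so every result is NA
bad <- choice[m[, "id"]]
# as.numeric() switches to positional indexing, as in the answer above
good <- choice[as.numeric(m[, "id"])]

indicator <- as.numeric(good == m[, "site"])
m <- cbind(m, indicator)   # the numeric indicator is coerced to "0"/"1"
head(m)
```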
The desired matrix:
id site indicator
[1,] "1" "site1" "0"
[2,] "1" "site2" "0"
[3,] "1" "site3" "1"
[4,] "1" "site4" "0"
[5,] "1" "site5" "0"
[6,] "2" "site1" "0"
[7,] "2" "site2" "0"
[8,] "2" "site3" "0"
[9,] "2" "site4" "0"
[10,] "2" "site5" "1"
[11,] "3" "site1" "1"
[12,] "3" "site2" "0"
[13,] "3" "site3" "0"
[14,] "3" "site4" "0"
[15,] "3" "site5" "0"
[16,] "4" "site1" "0"
[17,] "4" "site2" "0"
[18,] "4" "site3" "0"
[19,] "4" "site4" "1"
[20,] "4" "site5" "0"
[21,] "5" "site1" "0"
[22,] "5" "site2" "1"
[23,] "5" "site3" "0"
[24,] "5" "site4" "0"
[25,] "5" "site5" "0"
[26,] "6" "site1" "0"
[27,] "6" "site2" "0"
[28,] "6" "site3" "0"
[29,] "6" "site4" "1"
[30,] "6" "site5" "0"
[31,] "7" "site1" "0"
[32,] "7" "site2" "0"
[33,] "7" "site3" "1"
[34,] "7" "site4" "0"
[35,] "7" "site5" "0"
[36,] "8" "site1" "0"
[37,] "8" "site2" "0"
[38,] "8" "site3" "0"
[39,] "8" "site4" "0"
[40,] "8" "site5" "1"
[41,] "9" "site1" "0"
[42,] "9" "site2" "1"
[43,] "9" "site3" "0"
[44,] "9" "site4" "0"
[45,] "9" "site5" "0"
[46,] "10" "site1" "1"
[47,] "10" "site2" "0"
[48,] "10" "site3" "0"
[49,] "10" "site4" "0"
[50,] "10" "site5" "0"