R - sample within paired data
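I am trying to randomly sample one variable within paired data. idmen is my pair identifier, idind is my person identifier, and jour is the variable that needs to be randomly subset. Within a pair (idmen), jour must be the same. So, for example, for idmen == 2 we need to subset either their dimanche rows or their vendredi rows.
Here is the data: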
idmen idind jour actpr1
1 1 lundi 111
1 2 lundi 111
2 1 dimanche 111
2 2 dimanche 111
2 1 vendredi 111
2 2 vendredi 111
3 1 dimanche 113
3 2 dimanche 121
3 1 lundi 111
3 2 lundi 111
This is the desired output (of course the output may differ, since the day has to be chosen at random). I need to sample one day for each idmen.
idmen idind jour actpr1
1 1 lundi 111
1 2 lundi 111
2 1 dimanche 111
2 2 dimanche 111
3 1 dimanche 113
3 2 dimanche 121
I thought of
library(dplyr)
dta %>% group_by(idmen, jour) %>% sample_n(2)
but I don't understand why this doesn't work.
Any clue?
dta <- structure(list(idmen = c(1, 1, 2, 2, 2, 2, 3, 3, 3, 3), idind = c(1,
2, 1, 2, 1, 2, 1, 2, 1, 2), jour = structure(c(3L, 3L, 1L, 1L,
7L, 7L, 1L, 1L, 3L, 3L), .Label = c("dimanche", "jeudi ", "lundi ",
"mardi ", "mercredi", "samedi ", "vendredi"), class = "factor"),
actpr1 = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 3L, 4L, 1L,
1L), .Label = c("111", "112", "113", "121", "122", "123",
"131", "132", "141", "143", "144", "145", "146", "151", "211",
"212", "213", "223", "231", "233", "241", "261", "262", "271",
"272", "311", "312", "313", "324", "331", "332", "334", "335",
"341", "342", "343", "351", "372", "373", "374", "381", "382",
"384", "385", "399", "411", "412", "413", "414", "419", "422",
"423", "429", "431", "433", "510", "511", "512", "513", "514",
"521", "522", "523", "524", "531", "532", "533", "541", "542",
"613", "614", "616", "621", "622", "623", "627", "631", "632",
"633", "634", "635", "636", "637", "638", "641", "651", "653",
"655", "658", "661", "662", "663", "665", "667", "668", "669",
"671", "672", "673", "674", "678", "810", "811", "812", "813",
"819", "911", "999"), class = "factor")), .Names = c("idmen",
"idind", "jour", "actpr1"), row.names = c(NA, -10L), class = "data.frame")
Maybe try this:
> dta %>% group_by(idmen) %>% filter(jour == jour[sample(length(jour),1)])
Source: local data frame [6 x 4]
Groups: idmen [3]
idmen idind jour actpr1
(dbl) (dbl) (fctr) (fctr)
1 1 1 lundi 111
2 1 2 lundi 111
3 2 1 vendredi 111
4 2 2 vendredi 111
5 3 1 lundi 111
6 3 2 lundi 111
...although a built-in "sample complete groups" function in dplyr would be neat.
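To make the difference between the two pipelines concrete, here is a minimal sketch. It assumes a simplified copy of the data (character columns instead of factors) and uses set.seed() only so the draw is repeatable; the names attempt and picked are made up for illustration. Grouping by both idmen and jour leaves exactly two rows in every group, so sample_n(2) can only return all of them, whereas grouping by idmen alone lets one day be drawn per pair.

```r
library(dplyr)

# Simplified copy of the question's data (characters instead of factors)
dta <- data.frame(
  idmen  = c(1, 1, 2, 2, 2, 2, 3, 3, 3, 3),
  idind  = c(1, 2, 1, 2, 1, 2, 1, 2, 1, 2),
  jour   = c("lundi", "lundi", "dimanche", "dimanche", "vendredi", "vendredi",
             "dimanche", "dimanche", "lundi", "lundi"),
  actpr1 = c("111", "111", "111", "111", "111", "111", "113", "121", "111", "111"),
  stringsAsFactors = FALSE
)

# Attempt from the question: each (idmen, jour) group has exactly 2 rows,
# so drawing 2 rows without replacement keeps every row
attempt <- dta %>%
  group_by(idmen, jour) %>%
  sample_n(2) %>%
  ungroup()
nrow(attempt)  # 10, the same as nrow(dta)

# Group by the pair only, draw one day per pair, keep the rows of that day
set.seed(1)  # only for repeatability of this sketch
picked <- dta %>%
  group_by(idmen) %>%
  filter(jour == jour[sample(length(jour), 1)]) %>%
  ungroup()
```

Whatever the seed, picked should contain six rows here, two per idmen, all sharing one jour within each idmen.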
Here is a base R solution:
dta[unlist(sample(as.data.frame(matrix(1:nrow(dta), nrow = 2)), 10, replace = TRUE)), ]
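This takes advantage of the fact that a data frame is a list of columns. The matrix puts the two row numbers of each pair into one column, so calling sample() on the resulting data frame draws whole columns, that is, whole pairs. Calling unlist() on the result then gives the row numbers of both rows of every sampled pair at once. This draws 10 pairs with replacement, but that can of course be changed.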
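The mechanism can be unrolled step by step. The sketch below assumes a simplified copy of the data and that the two rows of each pair are adjacent, which the matrix trick depends on. It also adds a variant that draws exactly one day per idmen; that variant is an assumption about the goal, not part of the answer above, and the names idx, pairs_sample and one_day are made up for illustration.

```r
# Simplified copy of the question's data; rows of a pair must be adjacent
dta <- data.frame(
  idmen = c(1, 1, 2, 2, 2, 2, 3, 3, 3, 3),
  idind = c(1, 2, 1, 2, 1, 2, 1, 2, 1, 2),
  jour  = c("lundi", "lundi", "dimanche", "dimanche", "vendredi", "vendredi",
            "dimanche", "dimanche", "lundi", "lundi"),
  stringsAsFactors = FALSE
)

# Each column of idx holds the two row numbers of one pair: (1,2), (3,4), ...
idx <- as.data.frame(matrix(seq_len(nrow(dta)), nrow = 2))

set.seed(42)  # only for repeatability of this sketch
cols <- sample(idx, 3)                    # draws 3 whole columns, i.e. 3 pairs
rows <- unlist(cols, use.names = FALSE)   # flatten back to row numbers
pairs_sample <- dta[rows, ]

# Variant (assumed goal): one randomly chosen day for every idmen
one_day <- do.call(rbind, lapply(split(dta, dta$idmen), function(g) {
  days <- unique(g$jour)
  g[g$jour == days[sample.int(length(days), 1)], ]
}))
```

Note that the column-sampling approach draws (idmen, jour) pairs, so an idmen with two days may show up twice or not at all, while the split() variant returns each idmen exactly once.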