Add a dummy column to flag whether a row was randomly selected or not

Suppose I have the following dataset (named data).

id var1 var2
1   A   33
2   B   23
3   A   45
4   A   55
5   B   22
6   A   33
7   B   90
8   A   78
9   B   12
10  A   11

I want to add a new column to the original dataset that indicates whether each row was randomly selected (1/0). I tried the following.

library(sampling)
data1 <- strata(data,"var1", size=c(4,3),method="srswor") #stratified random sampling
data2 <- getdata(data,data1)  # this gives a separate data set

Any help? Thanks!

If you look at the documentation for sampling::strata(), you will find the following information:

The function produces an object, which contains the following information:

ID_unit 
the identifier of the selected units.

Stratum 
the unit stratum.

Prob    
the unit inclusion probability.

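ID_unit holds the row positions of the selected units, so it can be used to subset the original data and assign the boolean flag you asked for: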

data<-structure(list(id=c(1,2,3,4,5,6,7,8,9,10),var1=c("A",
"B","A","A","B","A","B","A","B","A"),var2=c(33,23,
45,55,22,33,90,78,12,11)),row.names=c(NA,-10L),class=c("tbl_df",
"tbl","data.frame"))


library(sampling)
data1 <- strata(data,"var1", size=c(4,3),method="srswor") #stratified random sampling
data2 <- getdata(data,data1)  # this gives a separate data set

data$sampled <- FALSE
data[data1$ID_unit, "sampled"] <- TRUE                 
data
#>    id var1 var2 sampled
#> 1   1    A   33   FALSE
#> 2   2    B   23    TRUE
#> 3   3    A   45   FALSE
#> 4   4    A   55    TRUE
#> 5   5    B   22   FALSE
#> 6   6    A   33    TRUE
#> 7   7    B   90    TRUE
#> 8   8    A   78    TRUE
#> 9   9    B   12    TRUE
#> 10 10    A   11    TRUE

Created on 2020-07-28 by the reprex package (v0.3.0)
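
The question asked for a 1/0 column rather than TRUE/FALSE. Here is a minimal sketch of the same idea in base R, without the sampling package. It assumes a plain data.frame and the same per-stratum sizes as above. The names sizes, rows_by_stratum and picked are only illustrative, and the seed is arbitrary.

```r
# Rebuild the example data from the question as a plain data.frame.
data <- data.frame(
  id   = 1:10,
  var1 = c("A", "B", "A", "A", "B", "A", "B", "A", "B", "A"),
  var2 = c(33, 23, 45, 55, 22, 33, 90, 78, 12, 11)
)

# Illustrative per-stratum sample sizes (4 rows from A, 3 rows from B).
sizes <- c(A = 4, B = 3)

set.seed(1)  # arbitrary seed, only so the draw is reproducible

# Row positions grouped by stratum, then a simple random sample
# without replacement inside each stratum.
rows_by_stratum <- split(seq_len(nrow(data)), data$var1)
picked <- unlist(lapply(names(sizes), function(s) {
  idx <- rows_by_stratum[[s]]
  idx[sample.int(length(idx), sizes[[s]])]
}))

# 1 if the row position was drawn, 0 otherwise.
data$sampled <- as.integer(seq_len(nrow(data)) %in% picked)

# With the strata() result from the answer, the equivalent flag would be:
# data$sampled <- as.integer(seq_len(nrow(data)) %in% data1$ID_unit)
```

Indexing through sample.int(length(idx), ...) instead of calling sample(idx, ...) directly avoids the surprise where sample() treats a single number as the range 1:n when a stratum has only one row.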