Attempting to randomize a data frame twice and add both of those samples to a new data frame

Question

I have a data frame:

> MLSpredictions
        fit    se.fit residual.scale      upr      lwr
1  1.392213 0.1476321              1 1.681572 1.102854
2  1.448370 0.1709856              1 1.783501 1.113238
3  1.392213 0.1476321              1 1.681572 1.102854
4  1.448370 0.1709856              1 1.783501 1.113238
5  1.448370 0.1709856              1 1.783501 1.113238
6  1.448370 0.1709856              1 1.783501 1.113238
7  1.506792 0.1969097              1 1.892734 1.120849
8  1.506792 0.1969097              1 1.892734 1.120849
9  1.567570 0.2253572              1 2.009270 1.125870
10 1.567570 0.2253572              1 2.009270 1.125870
11 1.630800 0.2563338              1 2.133214 1.128386
12 1.448370 0.1709856              1 1.783501 1.113238
13 1.448370 0.1709856              1 1.783501 1.113238
14 1.448370 0.1709856              1 1.783501 1.113238
15 1.506792 0.1969097              1 1.892734 1.120849
16 1.567570 0.2253572              1 2.009270 1.125870
17 1.567570 0.2253572              1 2.009270 1.125870
18 1.567570 0.2253572              1 2.009270 1.125870
19 1.567570 0.2253572              1 2.009270 1.125870

I want to sample the entire data frame twice and add both samples to a new data frame, MLSSeason.

My attempt was:

MLSSeason[1:19] = sample(MLSpredictions)
MLSSeason[20:38] = sample(MLSpredictions)

But this does not give me the correct result. Ideally, MLSSeason would have 38 rows, with each row of MLSpredictions appearing twice in shuffled order.

Answer 1

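You can't shuffle rows by passing a data frame to sample. It won't raise an error, but the rows come back in their original order, because a data frame is a list of columns and sample only permutes those. Instead, generate shuffled row indices: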

MLSSeason <- MLSpredictions[c(sample(nrow(MLSpredictions)), sample(nrow(MLSpredictions))), ]

Note that this is not equivalent to:

MLSpredictions[sample(nrow(MLSpredictions)), ]

which cannot contain duplicated rows.
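The point above can be checked on a small example. The following is a minimal sketch, assuming a toy data frame df (hypothetical, standing in for MLSpredictions) and a fixed seed used only for reproducibility. It shows that sample on a data frame leaves the rows in order, and that stacking two shuffled index vectors yields two blocks that each contain every row once.

```r
# Toy stand-in for MLSpredictions (hypothetical data, for illustration only)
df <- data.frame(id = 1:5, fit = c(1.39, 1.45, 1.51, 1.57, 1.63))

set.seed(42)  # only to make the sketch reproducible

# sample() on a data frame permutes its columns; the rows stay in order
shuffled_cols <- sample(df)

# Shuffle row indices instead: two independent permutations, stacked
n <- nrow(df)
idx <- c(sample(n), sample(n))
season <- df[idx, ]

nrow(season)           # twice the number of rows in df
sort(season$id[1:n])   # the first block holds every row exactly once
table(season$id)       # every row appears exactly twice overall
```

Because each block is a full permutation, a row can never appear twice within the first n rows or within the last n rows.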

Answer 2

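If you pass a data frame to sample, it will sample the columns of the data frame, not the rows.

The following code samples each row twice, in fully random order, and lets you tell whether a given row is the first or second occurrence.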

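# ceiling() maps 0.5 and 1 to row 1, 1.5 and 2 to row 2, and so on;
# without it, fractional indices are truncated and 0.5 becomes index 0, which is dropped
MLSpredictions[ceiling(sample(1:(nrow(MLSpredictions) * 2)) / 2), ]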

It gives you informative row names: for example, 11.1 is the second occurrence of row 11.

          fit    se.fit residual.scale      upr      lwr
16   1.567570 0.2253572              1 2.009270 1.125870
5    1.448370 0.1709856              1 1.783501 1.113238
11   1.630800 0.2563338              1 2.133214 1.128386
15   1.506792 0.1969097              1 1.892734 1.120849
1    1.392213 0.1476321              1 1.681572 1.102854
12   1.448370 0.1709856              1 1.783501 1.113238
11.1 1.630800 0.2563338              1 2.133214 1.128386
7    1.506792 0.1969097              1 1.892734 1.120849

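If you want the samples in blocks, that is, guaranteeing that each row is sampled once in every 19 rows, then @ZheyuanLi has provided the ideal answer. If not, my answer may suit you best.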
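To make the unblocked variant concrete, here is a minimal sketch, assuming a toy data frame df (hypothetical, standing in for MLSpredictions) and a fixed seed used only for reproducibility. Instead of dividing shuffled indices by two, it shuffles a vector that already holds every row index twice, which is one alternative way to get the same kind of result.

```r
# Toy stand-in for MLSpredictions (hypothetical data, for illustration only)
df <- data.frame(id = 1:5, fit = c(1.39, 1.45, 1.51, 1.57, 1.63))

set.seed(1)  # only to make the sketch reproducible

n <- nrow(df)

# Every row index twice, then one shuffle over all 2 * n positions
idx <- sample(rep(seq_len(n), times = 2))
mixed <- df[idx, ]

rownames(mixed)   # second occurrences carry a ".1" suffix
table(mixed$id)   # every row appears exactly twice
```

Unlike the blocked version, the two copies of a row can land anywhere in the result, including next to each other.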

r

sampling

dataframe