Confidence intervals in Java, testing the random pick of an element in a list of objects

Question

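I have a method that picks a random object from a list of 2 objects. I would like to write a JUnit test (@Test) that asserts, based on a confidence level, that each of the 2 objects has a 50% chance of being picked.

The code under test: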

public MySepecialObj pickTheValue(List<MySepecialObj> objs, Random shufflingFactor) {

    // this could probably be done in a more efficient way
    // but my point is asserting on the 50% chance of the 
    // two objects inside the input list
    Collections.shuffle(objs, shufflingFactor);
    return objs.get(0);
}

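In the test I would like to provide 2 mocks (firstMySepecialObjMock and secondMySepecialObjMock) as input objects of type MySepecialObj and new Random() as the input shuffling parameter, then assert that firstMySepecialObjMock is the choice about 50% of the time and secondMySepecialObjMock is the choice the other 50% of the time.

Something like: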

@Test
public void myTestShouldCheckTheConfidenceInterval() {

    // using Mockito here
    MySepecialObj firstMySepecialObjMock = mock(MySepecialObj.class);
    MySepecialObj secondMySepecialObjMock = mock(MySepecialObj.class);

    // using some helpers from Guava to build the input list
    List<MySepecialObj> theListOfTwoElements = Lists.newArrayList(firstMySepecialObjMock, secondMySepecialObjMock);

    // call the method (multiple times? how many?) like:
    MySepecialObj chosenValue = pickTheValue(theListOfTwoElements, new Random());

    // assert somehow on all the choices using a confidence level
    // verifying that firstMySepecialObjMock was picked ~50% of the times
    // and secondMySepecialObjMock was picked the other ~50% of the times
}

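I'm not sure about the statistics theory here, so maybe I should provide a different instance of Random with different arguments to its constructor?

I would also like to have a test where I can set the confidence level as a parameter (I guess it is usually 95%, but it could be another value?).

What could be a pure Java solution/setup of the test involving a confidence level parameter?
What could be an equivalent solution/setup of the test involving a helper library such as Apache Commons?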

Answer 1

First of all, this is the usual way to pick a random element from a List in Java (nextInt(objs.size()) produces a random integer from 0 inclusive to objs.size() exclusive).
```
public MySepecialObj pickTheValue(List<MySepecialObj> objs, Random random) {
    int i = random.nextInt(objs.size());
    return objs.get(i);
}
```
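You can read on Wikipedia about how many times you should perform an experiment with 2 possible outcomes for a given confidence level. E.g. for a confidence level of 95% the critical value (z-value) of the standard normal distribution is 1.9599. You also need to provide a maximum error, say 0.01. The number of times to perform the experiment is then z² / (4 · maxError²), which uses the worst case p(1 − p) = 1/4 at p = 0.5:
```
// z-value for a 95% confidence level (strictly a critical value, not an interval)
double zValue = 1.9599;
double maxError = 0.01;
int numberOfPicks = (int) (Math.pow(zValue, 2)/(4*Math.pow(maxError, 2)));
```
The result is numberOfPicks = 9603. That is how many times you should call pickTheValue.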
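If you want the confidence level as a parameter instead of a hard-coded 1.9599, the z-value has to be computed. Here is a minimal pure-Java sketch of one way to do that. The names SampleSize, zValue and numberOfPicks are made up for illustration; the normal probability is integrated numerically with Simpson's rule and inverted by bisection, and the result is rounded up rather than truncated. Treat it as an assumption-laden sketch, not a reference implementation.

```java
class SampleSize {

    // Density of the standard normal distribution.
    static double pdf(double x) {
        return Math.exp(-0.5 * x * x) / Math.sqrt(2 * Math.PI);
    }

    // P(-z <= Z <= z) for a standard normal Z, using composite Simpson's rule on [0, z].
    static double centralProbability(double z) {
        int n = 2000; // number of sub-intervals, must be even
        double h = z / n;
        double sum = pdf(0) + pdf(z);
        for (int i = 1; i < n; i++) {
            sum += pdf(i * h) * (i % 2 == 1 ? 4 : 2);
        }
        return 2 * sum * h / 3;
    }

    // Two-sided z-value for the given confidence level, found by bisection on [0, 10].
    static double zValue(double confidenceLevel) {
        if (confidenceLevel <= 0 || confidenceLevel >= 1) {
            throw new IllegalArgumentException("confidenceLevel must be in (0, 1)");
        }
        double lo = 0;
        double hi = 10;
        for (int i = 0; i < 100; i++) {
            double mid = (lo + hi) / 2;
            if (centralProbability(mid) < confidenceLevel) {
                lo = mid;
            } else {
                hi = mid;
            }
        }
        return (lo + hi) / 2;
    }

    // Worst-case (p = 0.5) number of picks: z^2 / (4 * maxError^2), rounded up.
    static int numberOfPicks(double confidenceLevel, double maxError) {
        double z = zValue(confidenceLevel);
        return (int) Math.ceil(z * z / (4 * maxError * maxError));
    }

    public static void main(String[] args) {
        System.out.println("z(0.95) = " + zValue(0.95));
        System.out.println("picks   = " + numberOfPicks(0.95, 0.01));
    }
}
```

For a 95% level and a maximum error of 0.01 this gives 9604 rather than 9603, because it uses more digits of the z-value and rounds up instead of truncating.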

This is how I suggest you perform the experiment multiple times (note that random is being reused):

Random random = new Random();
double timesFirstWasPicked = 0;
double timesSecondWasPicked = 0;
for (int i = 0; i < numberOfPicks; ++i) {
    MySepecialObj chosenValue = pickTheValue(theListOfTwoElements, random);
    if (chosenValue == firstMySepecialObjMock) {
        ++timesFirstWasPicked;
    } else {
        ++timesSecondWasPicked;
    }
}
double probabilityFirst = timesFirstWasPicked / numberOfPicks;
double probabilitySecond = timesSecondWasPicked / numberOfPicks;

Then assert that probabilityFirst and probabilitySecond are each no more than maxError away from 0.5.
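Put together, the experiment and the tolerance check could look roughly like the sketch below. It is pure Java with no JUnit or Mockito, so plain Object instances stand in for the two mocks, and PickFrequencyCheck, frequencyOf and withinMaxError are hypothetical names. The fixed seed 42 is also an assumption: at a 95% confidence level a perfectly fair picker still lands outside the tolerance in roughly 5% of runs, so a test using new Random() would fail occasionally, while a seeded Random makes the run reproducible.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Random;

class PickFrequencyCheck {

    // Generic stand-in for pickTheValue from this answer.
    static <T> T pickTheValue(List<T> objs, Random random) {
        return objs.get(random.nextInt(objs.size()));
    }

    // Fraction of picks that returned `first` (identity comparison, as in the loop above).
    static <T> double frequencyOf(T first, List<T> objs, Random random, int numberOfPicks) {
        int timesFirstWasPicked = 0;
        for (int i = 0; i < numberOfPicks; i++) {
            if (pickTheValue(objs, random) == first) {
                timesFirstWasPicked++;
            }
        }
        return (double) timesFirstWasPicked / numberOfPicks;
    }

    // True when the observed frequency is no more than maxError away from 0.5.
    static boolean withinMaxError(double frequency, double maxError) {
        return Math.abs(frequency - 0.5) <= maxError;
    }

    public static void main(String[] args) {
        Object first = new Object();
        Object second = new Object();
        List<Object> objs = Arrays.asList(first, second);

        // One Random instance is reused for all picks; the seed is arbitrary.
        double frequency = frequencyOf(first, objs, new Random(42L), 9603);

        System.out.println("frequency of first = " + frequency);
        System.out.println("within 0.01 of 0.5: " + withinMaxError(frequency, 0.01));
    }
}
```

In a JUnit test the last step would typically be assertEquals(0.5, frequency, maxError), which takes the tolerance as its third argument. With only two objects, checking the first frequency is enough, since the second is 1 minus the first.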

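I found a BinomialTest class in apache-commons-math, but I don't see how it can help in your case. It can calculate the confidence level from the number of experiments. You want the reverse.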
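To make that direction concrete: given the number of picks and how often the first object came up, an exact two-sided binomial test returns a p-value, which you would then compare against 1 minus your confidence level. The sketch below is a pure-Java stand-in written only for illustration. FairPickPValue and twoSidedPValue are hypothetical names, it handles only the p = 0.5 case, and it is not the commons-math implementation.

```java
class FairPickPValue {

    // Exact two-sided p-value for seeing `successes` out of `trials`
    // when each pick has probability 0.5. Because the distribution is
    // symmetric, this is P(|X - n/2| >= |successes - n/2|).
    static double twoSidedPValue(int trials, int successes) {
        if (trials < 1 || successes < 0 || successes > trials) {
            throw new IllegalArgumentException("need 0 <= successes <= trials and trials >= 1");
        }
        // logFactorial[i] = ln(i!), built up incrementally to avoid overflow.
        double[] logFactorial = new double[trials + 1];
        for (int i = 1; i <= trials; i++) {
            logFactorial[i] = logFactorial[i - 1] + Math.log(i);
        }
        double logHalfPowN = trials * Math.log(0.5);
        double distance = Math.abs(successes - trials / 2.0);

        double pValue = 0;
        for (int k = 0; k <= trials; k++) {
            // Sum the outcomes that are at least as far from n/2 as the observed one.
            if (Math.abs(k - trials / 2.0) >= distance) {
                double logProb = logFactorial[trials] - logFactorial[k]
                        - logFactorial[trials - k] + logHalfPowN;
                pValue += Math.exp(logProb);
            }
        }
        return Math.min(1.0, pValue);
    }

    public static void main(String[] args) {
        // 4700 of 9603 picks is a made-up observation, used only as an example.
        System.out.println("p-value = " + twoSidedPValue(9603, 4700));
    }
}
```

A test built this way would pass when the p-value is at least 1 minus the confidence level (0.05 for 95%). It does not tell you how many picks to make, which is why the sample-size formula above is still needed.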
