Java Streams - Standard Deviation
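To be clear up front: I am looking for a way to calculate the standard deviation (SD) using Streams. I already have a working method that calculates and returns the SD, but it does not use Streams.
The dataset I am working with closely matches the one seen in Link. As shown in that link, I am able to group my data and get the average, but I cannot figure out how to get the SD.
Code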
outPut.stream()
      .collect(Collectors.groupingBy(e -> e.getCar(),
               Collectors.averagingDouble(e -> (e.getHigh() - e.getLow()))))
      .forEach((car, avgHLDifference) -> System.out.println(car + "\t" + avgHLDifference));
I also checked out Link on DoubleSummaryStatistics, but it does not seem to help with the SD.
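You can use a custom collector for this task that calculates the sum of squares. The built-in DoubleSummaryStatistics collector does not keep track of it. The expert group discussed this in this thread, but it was ultimately not implemented. The difficulty in calculating the sum of squares is the potential overflow when squaring the intermediate results.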
static class DoubleStatistics extends DoubleSummaryStatistics {

    private double sumOfSquare = 0.0d;
    private double sumOfSquareCompensation; // Low order bits of sum
    private double simpleSumOfSquare; // Used to compute right sum for non-finite inputs

    @Override
    public void accept(double value) {
        super.accept(value);
        double squareValue = value * value;
        simpleSumOfSquare += squareValue;
        sumOfSquareWithCompensation(squareValue);
    }

    public DoubleStatistics combine(DoubleStatistics other) {
        super.combine(other);
        simpleSumOfSquare += other.simpleSumOfSquare;
        sumOfSquareWithCompensation(other.sumOfSquare);
        // The compensation holds the rounding excess already added to the sum,
        // so it is subtracted (same fix as later JDK versions of DoubleSummaryStatistics).
        sumOfSquareWithCompensation(-other.sumOfSquareCompensation);
        return this;
    }

    // Kahan (compensated) summation step
    private void sumOfSquareWithCompensation(double value) {
        double tmp = value - sumOfSquareCompensation;
        double velvel = sumOfSquare + tmp; // Little wolf of rounding error
        sumOfSquareCompensation = (velvel - sumOfSquare) - tmp;
        sumOfSquare = velvel;
    }

    public double getSumOfSquare() {
        // Better error bounds when the compensation is removed from the final result
        double tmp = sumOfSquare - sumOfSquareCompensation;
        if (Double.isNaN(tmp) && Double.isInfinite(simpleSumOfSquare)) {
            return simpleSumOfSquare;
        }
        return tmp;
    }

    // Population standard deviation (divides by the count, not count - 1)
    public final double getStandardDeviation() {
        return getCount() > 0 ? Math.sqrt((getSumOfSquare() / getCount()) - Math.pow(getAverage(), 2)) : 0.0d;
    }
}
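For reference, getStandardDeviation() above appears to rely on the usual identity for the population standard deviation. In the notation below (my own, not from the answer), n corresponds to getCount(), the first sum to getSumOfSquare(), and the second term to the square of getAverage():

```latex
\sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n} x_i^{2} - \left(\frac{1}{n}\sum_{i=1}^{n} x_i\right)^{2}}
```

As a quick check with the values 2, 4, 4, 4, 5, 5, 7, 9: n = 8, the sum of squares is 232 and the mean is 5, so the result is sqrt(232/8 - 25) = sqrt(4) = 2. Note that this form subtracts two potentially large, nearly equal numbers, so it can lose precision when the mean is large compared to the spread.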
You can then use this class as follows:
Map<String, Double> standardDeviationMap =
    list.stream()
        .collect(Collectors.groupingBy(
            e -> e.getCar(),
            Collectors.mapping(
                e -> e.getHigh() - e.getLow(),
                Collector.of(
                    DoubleStatistics::new,
                    DoubleStatistics::accept,
                    DoubleStatistics::combine,
                    d -> d.getStandardDeviation()
                )
            )
        ));
This collects the input list into a map whose values are the standard deviation of high - low for the elements sharing the same key.
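To try the grouping end to end, here is a minimal, self-contained sketch. It is not the answer's class: Quote is a hypothetical stand-in for the question's element type, and SimpleStats is a simplified accumulator of my own that drops the compensated summation for brevity and clamps tiny negative variances caused by rounding.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collector;
import java.util.stream.Collectors;

final class GroupedStdDevDemo {

    // Hypothetical element type standing in for the question's objects.
    static final class Quote {
        private final String car;
        private final double high;
        private final double low;
        Quote(String car, double high, double low) { this.car = car; this.high = high; this.low = low; }
        String getCar() { return car; }
        double getHigh() { return high; }
        double getLow() { return low; }
    }

    // Simplified accumulator: count, sum and sum of squares (no Kahan compensation).
    static final class SimpleStats {
        private long n;
        private double sum;
        private double sumSq;
        void accept(double v) { n++; sum += v; sumSq += v * v; }
        SimpleStats combine(SimpleStats o) { n += o.n; sum += o.sum; sumSq += o.sumSq; return this; }
        double populationStdDev() {
            if (n == 0) return 0.0;
            double mean = sum / n;
            // Clamp at zero so rounding cannot produce sqrt of a tiny negative number.
            return Math.sqrt(Math.max(0.0, sumSq / n - mean * mean));
        }
    }

    // Groups by car and computes the population SD of (high - low) per group.
    static Map<String, Double> stdDevByCar(List<Quote> quotes) {
        return quotes.stream().collect(Collectors.groupingBy(
            Quote::getCar,
            Collectors.mapping(
                q -> q.getHigh() - q.getLow(),
                Collector.<Double, SimpleStats, Double>of(
                    SimpleStats::new,
                    SimpleStats::accept,
                    SimpleStats::combine,
                    SimpleStats::populationStdDev))));
    }

    // Sample data: differences for "A" are 2, 4, 4, 4, 5, 5, 7, 9; "B" has a single difference of 3.
    static List<Quote> sample() {
        return Arrays.asList(
            new Quote("A", 12, 10), new Quote("A", 14, 10), new Quote("A", 14, 10), new Quote("A", 14, 10),
            new Quote("A", 15, 10), new Quote("A", 15, 10), new Quote("A", 17, 10), new Quote("A", 19, 10),
            new Quote("B", 8, 5));
    }

    public static void main(String[] args) {
        stdDevByCar(sample()).forEach((car, sd) -> System.out.println(car + "\t" + sd));
    }
}
```

With this sample data, group "A" has a population SD of 2 and group "B", with a single element, has 0.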
You can use this custom collector:
private static final Collector<Double, double[], Double> VARIANCE_COLLECTOR = Collector.of( // See https://en.wikipedia.org/wiki/Algorithms_for_calculating_variance
    () -> new double[3], // {count, mean, M2}
    (acu, d) -> { // See the section on Welford's online algorithm and https://math.stackexchange.com/questions/198336/how-to-calculate-standard-deviation-with-streaming-inputs
        acu[0]++; // Count
        double delta = d - acu[1];
        acu[1] += delta / acu[0]; // Mean
        acu[2] += delta * (d - acu[1]); // M2
    },
    (acuA, acuB) -> { // See the section "Parallel algorithm": only called if the stream is parallel
        double count = acuA[0] + acuB[0];
        if (count == 0) return acuA; // Both partial results are empty: avoid 0 / 0
        double delta = acuB[1] - acuA[1];
        acuA[2] = acuA[2] + acuB[2] + delta * delta * acuA[0] * acuB[0] / count; // M2
        acuA[1] += delta * acuB[0] / count; // Mean
        acuA[0] = count; // Count
        return acuA;
    },
    acu -> acu[2] / (acu[0] - 1.0), // Sample variance = M2 / (count - 1); NaN for a single element
    Collector.Characteristics.UNORDERED);
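Then simply call this collector on your stream. The call below assumes that outPut.stream() yields a DoubleStream, hence boxed(); on a Stream<Double>, call collect directly: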
double stdDev = Math.sqrt(outPut.stream().boxed().collect(VARIANCE_COLLECTOR));
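As a sanity check, the following self-contained sketch repeats the same Welford-style collector under a name of my own (SAMPLE_VARIANCE) and applies it to a DoubleStream and to grouped data. The Map.Entry rows are a hypothetical stand-in for the question's objects, with the value playing the role of high - low.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collector;
import java.util.stream.Collectors;
import java.util.stream.DoubleStream;

final class WelfordDemo {

    // Accumulator layout: {count, mean, M2}; the finisher returns the sample variance.
    static final Collector<Double, double[], Double> SAMPLE_VARIANCE = Collector.of(
        () -> new double[3],
        (acc, d) -> {
            acc[0]++;
            double delta = d - acc[1];
            acc[1] += delta / acc[0];
            acc[2] += delta * (d - acc[1]);
        },
        (a, b) -> {
            double n = a[0] + b[0];
            if (n == 0) return a; // nothing to merge
            double delta = b[1] - a[1];
            a[2] += b[2] + delta * delta * a[0] * b[0] / n;
            a[1] += delta * b[0] / n;
            a[0] = n;
            return a;
        },
        acc -> acc[2] / (acc[0] - 1.0),
        Collector.Characteristics.UNORDERED);

    // boxed() converts the DoubleStream into the Stream<Double> the collector expects.
    static double sampleStdDev(DoubleStream values) {
        return Math.sqrt(values.boxed().collect(SAMPLE_VARIANCE));
    }

    // Per-key sample SD: take the square root of each group's variance.
    static Map<String, Double> sampleStdDevByKey(List<Map.Entry<String, Double>> rows) {
        Collector<Double, ?, Double> stdDev =
            Collectors.collectingAndThen(SAMPLE_VARIANCE, Math::sqrt);
        return rows.stream().collect(Collectors.groupingBy(
            Map.Entry::getKey,
            Collectors.mapping(Map.Entry::getValue, stdDev)));
    }

    public static void main(String[] args) {
        // Sample SD of these eight values is sqrt(32 / 7), about 2.138.
        System.out.println(sampleStdDev(DoubleStream.of(2, 4, 4, 4, 5, 5, 7, 9)));

        List<Map.Entry<String, Double>> rows = Arrays.asList(
            new SimpleEntry<>("A", 1.0), new SimpleEntry<>("A", 3.0),
            new SimpleEntry<>("B", 10.0), new SimpleEntry<>("B", 10.0));
        sampleStdDevByKey(rows).forEach((k, sd) -> System.out.println(k + "\t" + sd));
    }
}
```

Unlike the sum-of-squares class earlier in the thread, this collector divides by count - 1, so it yields the sample rather than the population standard deviation, and the two approaches will give slightly different numbers on the same data.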