Normalization of a dataset in Java
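I am working on a clustering program and have a dataset of doubles that I need to normalize, so that every double (variable) has the same influence.
I would like to use min-max normalization, where the minimum and maximum values are determined for each variable, but I am not sure how to implement it on a dataset in Java. Does anyone have any suggestions?
The Encog Project wiki gives a utility class that does range normalization.
The constructor takes the high and low values for the input data and for the normalized data.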
/**
 * Construct the normalization utility, allow the normalization range to be specified.
 * @param dataHigh The high value for the input data.
 * @param dataLow The low value for the input data.
 * @param normalizedHigh The high value for the normalized data.
 * @param normalizedLow The low value for the normalized data.
 */
public NormUtil(double dataHigh, double dataLow, double normalizedHigh, double normalizedLow) {
    this.dataHigh = dataHigh;
    this.dataLow = dataLow;
    this.normalizedHigh = normalizedHigh;
    this.normalizedLow = normalizedLow;
}
You can then use the normalize method on a sample.
/**
 * Normalize x.
 * @param x The value to be normalized.
 * @return The result of the normalization.
 */
public double normalize(double x) {
    return ((x - dataLow)
            / (dataHigh - dataLow))
            * (normalizedHigh - normalizedLow) + normalizedLow;
}
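Putting the constructor and the method together, a minimal self-contained sketch could look like the following. The wrapper class `MinMaxDemo`, the field declarations, and the sample values in `main` are assumptions added for illustration; this is not the Encog source itself.

```java
public class MinMaxDemo {

    /** Illustrative stand-in for the NormUtil class described above. */
    static final class NormUtil {
        private final double dataHigh;
        private final double dataLow;
        private final double normalizedHigh;
        private final double normalizedLow;

        NormUtil(double dataHigh, double dataLow, double normalizedHigh, double normalizedLow) {
            this.dataHigh = dataHigh;
            this.dataLow = dataLow;
            this.normalizedHigh = normalizedHigh;
            this.normalizedLow = normalizedLow;
        }

        /** Maps x from [dataLow, dataHigh] to [normalizedLow, normalizedHigh]. */
        double normalize(double x) {
            return ((x - dataLow) / (dataHigh - dataLow))
                    * (normalizedHigh - normalizedLow) + normalizedLow;
        }
    }

    public static void main(String[] args) {
        // Input data ranges over [0, 10]; the target range is [0, 1].
        NormUtil norm = new NormUtil(10.0, 0.0, 1.0, 0.0);
        System.out.println(norm.normalize(5.0)); // prints 0.5
    }
}
```

Note that the formula divides by `dataHigh - dataLow`, so a variable whose minimum equals its maximum needs separate handling.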
To find the minimum and maximum values of the dataset, use one of the answers to this question: Finding the max/min value in an array of primitives using Java.
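For a clustering dataset with several variables, the minimum and maximum have to be found per variable. The sketch below assumes the data is stored as a `double[][]` with one row per sample and one column per variable, and scales each column to [0, 1]. The class and method names are hypothetical, and mapping a constant column to 0 is an arbitrary choice made here to avoid dividing by zero.

```java
import java.util.Arrays;

public class ColumnMinMax {

    /** Scales every column of data to [0, 1]; rows are samples, columns are variables. */
    static double[][] normalizeColumns(double[][] data) {
        int rows = data.length;
        int cols = data[0].length;
        double[][] result = new double[rows][cols];

        for (int c = 0; c < cols; c++) {
            // Find the minimum and maximum of this variable.
            double min = Double.POSITIVE_INFINITY;
            double max = Double.NEGATIVE_INFINITY;
            for (double[] row : data) {
                min = Math.min(min, row[c]);
                max = Math.max(max, row[c]);
            }

            double range = max - min;
            for (int r = 0; r < rows; r++) {
                // A constant column has range 0; map it to 0 (assumption).
                result[r][c] = (range == 0.0) ? 0.0 : (data[r][c] - min) / range;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        double[][] data = {
            {1.0, 100.0},
            {2.0, 300.0},
            {3.0, 200.0}
        };
        // prints [[0.0, 0.0], [0.5, 1.0], [1.0, 0.5]]
        System.out.println(Arrays.deepToString(normalizeColumns(data)));
    }
}
```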
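You can also use the StatUtils.normalize method from the apache.commons.math3 library. Note that this method standardizes the sample to a mean of 0 and a standard deviation of 1 (z-scores); it does not perform min-max scaling.
The Gradle dependency is as follows: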
implementation 'org.apache.commons:commons-math3:3.6.1'
Maven dependency:
<!-- https://mvnrepository.com/artifact/org.apache.commons/commons-math3 -->
<dependency>
    <groupId>org.apache.commons</groupId>
    <artifactId>commons-math3</artifactId>
    <version>3.6.1</version>
</dependency>
Example:
import org.apache.commons.math3.stat.StatUtils;

public static void main(String[] args) {
    double[] arr = new double[]{900.68, 900.63, 900.74, 900.59, 900.49, 900.65, 900.81, 900.82, 901.03, 900.74, 900.66, 900.49, 900.52, 900.63, 900.45};
    // Standardize the sample: subtract the mean, divide by the standard deviation.
    double[] normArr = StatUtils.normalize(arr);
    for (int i = 0; i < normArr.length; i++) {
        System.out.print(normArr[i] + ", ");
    }
}
This would print out the values: 0.11787856446848383, -0.20956189238965656, 0.5108071126989968, -0.47151425787616885, -1.1263951715931941, -0.0785857096464004, 0.9692237523003934, 1.034711843672766, 2.4099617624777, 0.5108071126989968, -0.013097618274772323, -1.1263951715931941, -0.9299308974783099, -0.20956189238965656, -1.3883475370797065
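The negative values and values above 1 in that output show that the result is a z-score standardization, not a min-max scaling. As a rough illustration of the same computation without the library, the sketch below subtracts the mean and divides by the sample standard deviation (with an n - 1 divisor, which is consistent with the numbers above). The class and method names are hypothetical, and the sketch assumes at least two values that are not all equal.

```java
import java.util.Arrays;

public class ZScoreDemo {

    /** Returns (x[i] - mean) / sampleStdDev for every element of x. */
    static double[] standardize(double[] x) {
        int n = x.length;

        double sum = 0.0;
        for (double v : x) {
            sum += v;
        }
        double mean = sum / n;

        double squaredDeviations = 0.0;
        for (double v : x) {
            squaredDeviations += (v - mean) * (v - mean);
        }
        // Sample (bias-corrected) standard deviation: divide by n - 1.
        double stdDev = Math.sqrt(squaredDeviations / (n - 1));

        double[] result = new double[n];
        for (int i = 0; i < n; i++) {
            result[i] = (x[i] - mean) / stdDev;
        }
        return result;
    }

    public static void main(String[] args) {
        // prints [-1.0, 0.0, 1.0]
        System.out.println(Arrays.toString(standardize(new double[]{1.0, 2.0, 3.0})));
    }
}
```

For clustering, either approach puts the variables on a comparable scale; min-max bounds the values to a fixed range, while standardization does not.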