Is there a way to normalize data with high kurtosis?
I have a vector with a kurtosis of 2.95 (fairly high, leptokurtic). Here is a sample of the data:
x = c(6.819, 8.948, 0, 67.556, -40.785, -18.951, -29.151, 1.008,
0, 18.034, -6.631, 6.294, 0.643, -28.921, 0, -2.133, -44.348,
-87.488, 7.063, 0, -74.428, -16.361, 50.963, -32.431, -82.233,
-26.953, -48.475, 64.043, 0, 1.576, -2.728, -5.9, -63.059, -1.061,
-15.018, -58.119, -32.092, 5.329, -19.968, 38.822, 66.897, 0,
-2.579, 82.696, 42.745, 79.677, 2.522, -11.475, 1.019, 2.719,
-3.634, -7.975, 0, 1.873, 21.732, -10.217, -24.002, -76.049,
35.045, 27.22, -71.366, 16.293, -48.762, 65.481, 66.615, -19.616,
6.016, 59.722, 88.235, 10.1, 0, -4.598, 5.446, 56.909, 0, -24.827,
0, 6.487, 0, 63.315, 28.397, 9.433, 19.085, 0, 6.591, -22.643,
32.235, -12.535, -1.787, 56.157, 68.819, 0, -21.936, 38.695,
-79.006, 24.888, -5.187, 10.368, -68.191, 0, -22.171, -78.783,
-14.119, 54.084, -13.597, 26.669, 0, -18.402, 80.309, -12.652,
1.801, -69.946, -87.67, -19.586, 38.085, -21.031, -36.957, 1.357,
0.17, 47.407, -59.598, 66.125, 10.97, 6.33, -38.837, 1.868, 38.169,
-46.662, -32.255, 25.816, 14.432, -18.57, -0.456, -0.638, 31.07,
72.794, 52.957, 13.858, -18.885, 0, -13.488, 11.689, 1.618, 19.373,
-57.526, 0, -0.655, 36.308, 50.231, 0.048, -80.157, 0, -64.805,
-70.864, 0.813, 52.143, -4.989, 42.166, 7.397, 87.437, -17.897,
-0.877, 68.363, 47.315, -2.181, 2.699, 36.278, 0, -2.924, 71.56,
74.406, -46.071, 56.158, 1.44, 0, 0, 0, -3.233, 37.084, -85.189,
0, -16.137, -84.499, -12.67, -14.117, 0, 23.757, -58.299, -34.956,
0.402, 0, -67.585, -14.314, -73.426, 23.158, 1.782, 0, 4.399,
18.871, -6.929)
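- Is there a way to normalize this data?
- Since the data ranges from about -90 to 90, the normalized data should stay in a similar range and should not change much, i.e. the range should not become -1 to 1 or -20 to 20, etc.
I have tried atan(x), 1/x, log(x), and many other transformations, but they all tend to increase the skewness. Is there a way to normalize the data without distorting it?
I believe there must be a simple solution.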
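It may not be what you want, but you can almost always normalize a distribution perfectly with a normal scores transformation (provided there are no ties):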
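# Normal scores: map the ranks of x to normal quantiles with the mean and sd of x
xq <- qnorm(rank(x)/(length(x)+1), mean=mean(x), sd=sd(x))
plot(sort(x), sort(xq))  # original vs. transformed values
hist(xq)                 # histogram of the transformed values
qqnorm(xq)               # normal Q-Q plot of the transformed values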
The new range is (-99.2, 99.6) (the old range was +/- 88).
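To check what the transformation does to the kurtosis, the following minimal sketch may help. It uses base R only. The helper name `kurt` and the simulated vector `v` are assumptions made for illustration; they are not the data or code from the question. `kurt` computes the plain moment-based kurtosis, for which a normal distribution gives about 3.

```r
# Hypothetical helper: moment-based (non-excess) kurtosis; about 3 for normal data
kurt <- function(v) {
  m <- mean(v)
  mean((v - m)^4) / mean((v - m)^2)^2
}

# Illustrative heavy-tailed data (an assumption, not the vector from the question)
set.seed(1)
v <- 30 * rt(500, df = 4)

# Same normal scores transformation as above, applied to v
vq <- qnorm(rank(v) / (length(v) + 1), mean = mean(v), sd = sd(v))

kurt(v)    # kurtosis before the transformation
kurt(vq)   # kurtosis after the transformation
range(v)   # range before
range(vq)  # range after
```

Note that `rank()` uses `ties.method = "average"` by default, so tied values (such as the repeated zeros in the question's data) all receive the same score and the result is only approximately normal. `ties.method = "random"` breaks ties at random if that is acceptable for the analysis.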
If you need to change the range, you can do so as follows:
newmin + (newmax-newmin)*scale(xq, center=min(xq), scale=diff(range(xq)))
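The same linear rescaling can be wrapped in a small function. This is only a sketch: the name `rescale_to` and the example values are hypothetical. Unlike `scale()`, it returns a plain numeric vector rather than a one-column matrix.

```r
# Hypothetical helper: linearly map v onto the interval [newmin, newmax]
rescale_to <- function(v, newmin, newmax) {
  newmin + (newmax - newmin) * (v - min(v)) / diff(range(v))
}

# Illustrative input, not the data from the question
v <- c(-99.2, -10, 0, 25, 99.6)
vr <- rescale_to(v, -90, 90)
range(vr)  # endpoints now sit at newmin and newmax
```

Because the mapping is linear with a positive slope, it changes only location and scale, so the skewness and kurtosis of the transformed values stay the same.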
But as suggested in the comments, this may not actually be the right way to solve the broader problem.