skflow / pandas: averaging every 2 rows of a dataset
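I have a dataset X with, say, 2000 rows. I want to average every 2 consecutive rows together. The result should be a dataset with 1000 rows (the number of columns should stay the same).
I have already done this in MATLAB: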
% MATLAB function
function [ divisibleMatrix ] = meanDown( matrix )
%MEANDOWN takes a matrix and means every 2 lines (makes it half the size)
newSize = (floor(size(matrix, 1)/2)*2); %make it divisible
divisibleMatrix = matrix(1:newSize, 1:end);
D = divisibleMatrix;
m=size(divisibleMatrix, 1);
n=size(divisibleMatrix, 2);
% compute mean for two neighboring rows
D=reshape(D, 2, m/2*n);
D=(D(1,:)+D(2,:))/2;
D=reshape(D, m/2, n);
divisibleMatrix = D;
end
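This is a one-liner. The key is that groupby can take an arbitrary index -> group mapping. df.index // 2 (integer division; on Python 2, plain df.index / 2 behaved the same way for an integer index)
gives you what you want.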
In [1]: pd.options.display.max_rows=10
In [2]: df = pd.DataFrame(np.random.randn(2000,10))
In [3]: df
Out[3]:
0 1 2 3 4 5 6 7 8 9
0 1.424278 1.341120 -1.926183 2.277194 0.257652 -1.837933 0.548063 -1.554667 0.485864 0.939497
1 -0.389531 -0.122452 0.514899 0.112404 -1.137853 0.814050 0.464444 0.180946 -0.873092 1.376984
2 1.244440 -1.358285 -1.167748 -1.103943 0.268973 0.954938 -1.041816 -0.549772 0.639713 -0.064106
3 0.907945 -0.705092 -2.251826 0.032511 0.132661 0.101646 -0.385823 -0.197524 0.726309 -0.044143
4 0.045390 -1.476742 0.511301 0.259116 0.255900 -1.621707 1.592440 -1.792673 -0.256589 -1.626885
... ... ... ... ... ... ... ... ... ... ...
1995 0.452570 0.097372 0.055521 0.387842 0.188056 2.392688 0.292957 -1.141517 -0.420548 -1.357877
1996 -2.155074 0.411274 0.357251 -0.326192 -0.493771 0.805255 0.156565 0.439860 -0.149214 0.329143
1997 1.141906 -0.595052 1.054630 0.705025 0.527523 -1.328829 0.726637 -0.889798 0.672279 -1.699829
1998 1.210885 0.550444 0.903205 1.240884 0.634060 0.595759 0.155567 -0.865876 0.197398 0.194864
1999 -0.273097 0.234418 1.172747 1.993209 0.271385 0.449079 -1.029834 -0.246728 -0.110820 -1.588270
[2000 rows x 10 columns]
In [4]: df.groupby(df.index // 2).mean()
Out[4]:
0 1 2 3 4 5 6 7 8 9
0 0.517374 0.609334 -0.705642 1.194799 -0.440100 -0.511942 0.506254 -0.686860 -0.193614 1.158240
1 1.076193 -1.031689 -1.709787 -0.535716 0.200817 0.528292 -0.713820 -0.373648 0.683011 -0.054125
2 -0.189650 -0.652461 -0.496076 -0.129063 0.209076 -1.463476 0.549773 -1.228766 0.255020 -0.231682
3 -0.804283 0.985501 0.321846 -0.570661 0.023639 0.473073 1.636425 -0.336158 0.427294 -0.063739
4 0.982331 0.088111 1.601761 -0.193683 -0.488863 1.113968 1.099340 -0.785286 0.370041 -0.095078
.. ... ... ... ... ... ... ... ... ... ...
995 0.244260 -0.754283 -1.318084 -1.157576 -0.159194 -0.245290 0.230198 -0.996492 -0.520177 0.125455
996 -0.604840 -0.628592 0.952476 1.049358 -0.392648 -0.121538 0.544432 0.309035 0.254711 -0.664254
997 -0.006366 -0.511019 -0.855803 0.103337 -1.131138 1.942504 -0.418524 -0.132304 0.266050 -0.055807
998 -0.506584 -0.091889 0.705941 0.189417 0.016876 -0.261787 0.441601 -0.224969 0.261533 -0.685343
999 0.468894 0.392431 1.037976 1.617047 0.452722 0.522419 -0.437133 -0.556302 0.043289 -0.696703
[1000 rows x 10 columns]
In [5]: df.index // 2
Out[5]:
Int64Index([ 0, 0, 1, 1, 2, 2, 3, 3, 4, 4,
...
995, 995, 996, 996, 997, 997, 998, 998, 999, 999], dtype='int64', length=2000)
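For reference, the same idea can be written as a standalone script. This is only a sketch under a few assumptions: the helper names `mean_down` and `mean_down_numpy` are made up for illustration, the grouping uses row positions (`np.arange(...) // k`) rather than `df.index` so that it also works when the index is not 0..N-1, and a trailing odd row is dropped to mirror the `floor` step in the MATLAB function.

```python
import numpy as np
import pandas as pd


def mean_down(df: pd.DataFrame, k: int = 2) -> pd.DataFrame:
    """Average every k consecutive rows (hypothetical helper).

    Leftover rows at the end are dropped, like the MATLAB version does.
    """
    n_keep = (len(df) // k) * k           # make the row count divisible by k
    trimmed = df.iloc[:n_keep]
    # Group by row position, not by index label, so any index works.
    groups = np.arange(n_keep) // k
    return trimmed.groupby(groups).mean()


def mean_down_numpy(values: np.ndarray, k: int = 2) -> np.ndarray:
    """Same computation on a plain 2-D array, using reshape (hypothetical helper)."""
    n_keep = (values.shape[0] // k) * k
    n_cols = values.shape[1]
    # NumPy is row-major: view the data as (blocks, k, columns), average over axis 1.
    return values[:n_keep].reshape(n_keep // k, k, n_cols).mean(axis=1)


# Small example: 5 rows, so the last row is dropped and 2 rows remain.
df = pd.DataFrame(np.arange(10, dtype=float).reshape(5, 2), columns=["a", "b"])
result = mean_down(df)
print(result)
```

The reshape variant avoids the groupby machinery and may be preferable when the data is already a NumPy array; the groupby variant keeps the column labels of the DataFrame.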