Python leave-one-out estimation
From a vector x = (x_1, x_2, ..., x_I), I would like to obtain a matrix in which each row i corresponds to x(i) := (x_1, ..., x_{i-1}, x_{i+1}, ..., x_I).
I know that
# Legacy API: the sklearn.cross_validation module was removed in scikit-learn 0.20
from sklearn.cross_validation import LeaveOneOut
I = 30
myrowiterator = LeaveOneOut(I)
for eachrow, _ in myrowiterator:
    print(eachrow)  # prints [1,2,...,29]
                    # [0,2,...,29] and so on ...
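provides a routine for obtaining each row of the matrix above. However, I would rather get the whole matrix in one step and operate on it directly, instead of looping over its rows. That would save me some computation time.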
The following will do it:
In [31]: np.array([row for row, _ in LeaveOneOut(I)])
Out[31]:
array([[ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
[ 0, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
[ 0, 1, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
[ 0, 1, 2, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
...
[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28]])
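The snippets above rely on the legacy `sklearn.cross_validation` module, which current scikit-learn releases no longer ship. As a rough sketch, assuming a scikit-learn version that provides `sklearn.model_selection`, the same matrix of training indices could be built like this (the name `train_rows` is only illustrative):

```python
import numpy as np
from sklearn.model_selection import LeaveOneOut

I = 30

# In the newer API, LeaveOneOut takes no size argument; the number of
# samples is inferred from the array passed to split().
splitter = LeaveOneOut()

# Each split yields (train_indices, test_indices); keep only the train part.
train_rows = np.array([train for train, _ in splitter.split(np.arange(I))])

print(train_rows.shape)  # (30, 29)
```

This still loops in Python internally, so it mainly helps with compatibility rather than speed.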
Since you have the numpy tag, the following works:
>>> import numpy as np
>>> N = 5
>>> idx = np.arange(N)
>>> idx = idx[1:] - (idx[:, None] >= idx[1:])
>>> idx
array([[1, 2, 3, 4],
[0, 2, 3, 4],
[0, 1, 3, 4],
[0, 1, 2, 4],
[0, 1, 2, 3]])
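To make the broadcasting trick easier to follow, here is one way it might be unpacked step by step. The function name `leave_one_out_indices` and the intermediate variable names are hypothetical, introduced only for this sketch:

```python
import numpy as np

def leave_one_out_indices(n):
    """Return an (n, n-1) integer array whose row i lists 0..n-1 without i."""
    cols = np.arange(1, n)         # candidate indices 1..n-1, shape (n-1,)
    rows = np.arange(n)[:, None]   # row numbers as a column, shape (n, 1)
    # True wherever the candidate index is <= the row number, i.e. the
    # entries that must be shifted down by one so that index i is skipped.
    shift = rows >= cols           # broadcasts to shape (n, n-1)
    # Cast explicitly so the subtraction is plainly integer arithmetic.
    return cols - shift.astype(cols.dtype)

idx = leave_one_out_indices(5)
print(idx)
```

Row i starts from 1..n-1 and subtracts one from every entry that is less than or equal to i, which turns the leading entries into 0..i-1 and leaves i+1..n-1 untouched.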
You can now use it to index any other array:
>>> a = np.array(['a', 'b', 'c', 'd', 'e'])
>>> a[idx]
array([['b', 'c', 'd', 'e'],
['a', 'c', 'd', 'e'],
['a', 'b', 'd', 'e'],
['a', 'b', 'c', 'e'],
['a', 'b', 'c', 'd']],
dtype='|S1')
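Since the question is about leave-one-out estimation, the index matrix can also be applied to numeric data. The following is a small sketch with made-up sample values, computing leave-one-out means without a Python loop and checking them against the closed form (sum - x_i) / (n - 1):

```python
import numpy as np

x = np.array([2.0, 4.0, 6.0, 8.0, 10.0])  # hypothetical sample values
n = x.size

# Same index construction as above: row i holds every index except i.
idx = np.arange(1, n) - (np.arange(n)[:, None] >= np.arange(1, n))

loo = x[idx]                  # shape (n, n-1): row i is x with x[i] removed
loo_means = loo.mean(axis=1)  # one leave-one-out mean per row

# Sanity check against the closed-form leave-one-out mean.
closed_form = (x.sum() - x) / (n - 1)
assert np.allclose(loo_means, closed_form)
print(loo_means)
```

For a statistic as simple as the mean, the closed form is cheaper. The index matrix is more useful when the estimator has no such shortcut.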
EDIT: As @user3820991 suggested, this can be written more compactly as:
>>> N = 5
>>> idx = np.arange(1, N) - np.tri(N, N-1, k=-1, dtype=bool)
>>> idx
array([[1, 2, 3, 4],
[0, 2, 3, 4],
[0, 1, 3, 4],
[0, 1, 2, 4],
[0, 1, 2, 3]])
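The function np.tri is in fact a highly optimized version of the magic comparison in the first version of this answer: it uses the smallest int type that can hold the size of the array, and because comparisons in numpy are vectorized using SIMD, the smaller the type, the faster the operation.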
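A quick way to convince yourself that the two formulations agree, and to time them on your own machine, might look like the sketch below. The function names are hypothetical, and no timings are quoted here because they depend on the NumPy version and hardware:

```python
import timeit

import numpy as np

def loo_idx_compare(n):
    """Index matrix via the explicit broadcasting comparison."""
    idx = np.arange(n)
    return idx[1:] - (idx[:, None] >= idx[1:])

def loo_idx_tri(n):
    """Index matrix via np.tri, which builds the same lower-triangular mask."""
    return np.arange(1, n) - np.tri(n, n - 1, k=-1, dtype=bool)

# The two constructions should match for any size n >= 2.
for n in (2, 5, 30):
    assert np.array_equal(loo_idx_compare(n), loo_idx_tri(n))

# Rough timing comparison; results vary from machine to machine.
t_compare = timeit.timeit(lambda: loo_idx_compare(1000), number=20)
t_tri = timeit.timeit(lambda: loo_idx_tri(1000), number=20)
print(f"comparison: {t_compare:.4f}s  np.tri: {t_tri:.4f}s")
```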