Randomly subsetting a data frame from a larger data frame

Question

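import numpy as np
import pandas as pd

height = 10
width = 10
n = height * width  # 100 cells in total
column = [1,2,3,4,5,6,7,8,9,10]
indices = [1,2,3,4,5,6,7,8,9,10]

Rack2 = pd.DataFrame(np.random.choice(np.arange(n), size=(height, width), replace=False), index=list(indices), columns=list(column))
Rack = Rack2.sort_index(ascending=False)
a = np.repeat([True, False], Rack.size // 2)
np.random.shuffle(a)  # shuffles in place and returns None, so the result is not assigned
a = a.reshape(Rack.shape)

SI = Rack.mask(a)   # storage items: cells where a is False
RI = Rack.where(a)  # retrieval items: cells where a is True

StorageSet = SI.stack().dropna()  # dropna() keeps only the selected cells on any pandas version
ss = StorageSet.index  # assuming the undefined dfStorage refers to StorageSet

RetrievalSet = RI.stack().dropna()
tt = RetrievalSet.index  # assuming the undefined D3 refers to RetrievalSet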

In the Python code above, there is a 10x10 rack. Half of the rack (50 items) consists of storage items, and the other half consists of retrieval items.

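I don't want to split the rack in half. Instead, given a 10x10 rack, I want, for example, 30 of its items to be storage items and 30 of the remaining 70 to be retrieval items. How can I do that?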

Answer 1

You can do this with a few changes to your code. First, change how a is initialized:

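samp_size = 30  # number of storage items, and also the number of retrieval items
# 0 marks storage, 1 marks retrieval, NaN marks the remaining cells
a = np.hstack([np.repeat(0, samp_size), np.repeat(1, samp_size), np.repeat(np.nan, n - (2 * samp_size))])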

After shuffling and reshaping a as before, you can get SI and RI as:

SI = Rack.where(a==0)
RI = Rack.where(a==1)
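
As a minimal sketch of why comparing against the label array works, the example below applies where to a tiny 2x3 frame. The names demo and labels and the values in them are made up for illustration and are not part of the original code.

```python
import numpy as np
import pandas as pd

# Tiny 2x3 frame standing in for the rack (values are arbitrary).
demo = pd.DataFrame([[10, 11, 12], [13, 14, 15]])

# Label array with the same shape: 0 = storage, 1 = retrieval, NaN = neither.
labels = np.array([[0, 1, np.nan], [np.nan, 0, 1]])

# where() keeps a cell when the condition is True and puts NaN elsewhere.
storage = demo.where(labels == 0)    # keeps 10 and 14
retrieval = demo.where(labels == 1)  # keeps 11 and 15

# stack() flattens to a Series indexed by (row, column); dropna() removes
# the unselected cells explicitly, whatever the pandas version does by default.
storage_items = storage.stack().dropna()
retrieval_items = retrieval.stack().dropna()
```

Cells labelled NaN fail both comparisons, so they end up in neither set.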

The rest of your code should work the same way.
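
Putting the pieces together, a self-contained sketch of the whole approach might look like the following. It swaps in numpy.random.default_rng for the np.random calls used in the thread, and the seed and the lowercase variable names are assumptions chosen for the example, not something the answer prescribes.

```python
import numpy as np
import pandas as pd

height, width = 10, 10
n = height * width
samp_size = 30  # number of storage items, and also of retrieval items

rng = np.random.default_rng(0)  # seed is an assumption, used only for reproducibility

# Rack holding the values 0..n-1 in random order, rows labelled 10 down to 1.
rack = pd.DataFrame(
    rng.permutation(n).reshape(height, width),
    index=range(1, height + 1),
    columns=range(1, width + 1),
).sort_index(ascending=False)

# 0 = storage, 1 = retrieval, NaN = unassigned.
labels = np.hstack([
    np.repeat(0.0, samp_size),
    np.repeat(1.0, samp_size),
    np.repeat(np.nan, n - 2 * samp_size),
])
rng.shuffle(labels)                  # shuffle in place
labels = labels.reshape(rack.shape)  # match the rack's 10x10 shape

# Keep only the cells carrying each label; dropna() removes the rest.
storage_set = rack.where(labels == 0).stack().dropna()
retrieval_set = rack.where(labels == 1).stack().dropna()

print(len(storage_set), len(retrieval_set))  # 30 30
```

Because each cell receives exactly one label, the two sets cannot overlap, and the remaining 40 cells belong to neither.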

python

random

numpy

dataframe

pandas