DataFrame order shuffling itself

I am trying to work out why my DataFrame changes its order after being converted to an array. Here is my code:

import pandas as pd

header_list = ["output", "1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13", "14", "15", "16", "17", "18", "19", "20",
               "21", "22", "23", "24", "25", "26", "27", "28", "29", "30"]
df = pd.read_csv("data.csv", names=header_list)

# Take a random 70% of the rows as the training set (the remaining 30% is for testing)
trainingdata = df.sample(frac=0.7)

# Y is the first column ("output"); X is every remaining column
X = trainingdata.iloc[:, 1:].to_numpy()
Y = trainingdata.iloc[:, 0].to_numpy().reshape(-1, 1)
print(trainingdata)

Output:

   output         1         2  ...        28        29        30
12        0  0.267358  0.373690  ...  0.379725  0.130298  0.195592
27        1  0.313739  0.506595  ...  0.456701  0.375517  0.157156
450       0  0.181693  0.490362  ...  0.112165  0.294500  0.139184
440       0  0.033603  0.531958  ...  0.171821  0.241474  0.338187
54        0  0.197312  0.113967  ...  0.189210  0.255076  0.083169
..      ...       ...       ...  ...       ...       ...       ...
20        1  0.519144  0.348326  ...  0.407216  0.653854  0.039814
231       1  0.428274  0.196145  ...  0.680756  0.286615  0.237439
55        0  0.291968  0.190396  ...  0.334089  0.450227  0.205234
159       1  0.410762  0.456206  ...  0.846048  0.337473  0.307359
117       0  0.232335  0.292188  ...  0.391065  0.361128  0.187656

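You can see that my index column is in random order, whereas my original DataFrame was in numerical order. Is this caused by a syntax error on my part?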

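This comes from the sample operation in pandas, not from the conversion to an array and not from a syntax error. By default, sample selects rows from your DataFrame at random (or columns, if you pass axis=1), so the result comes back in shuffled order while each row keeps its original index label.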
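To make this concrete, here is a minimal sketch using a small made-up DataFrame in place of data.csv (the column names and values are hypothetical, not taken from the question). It suggests that the shuffling already happens in sample, that to_numpy keeps whatever order it is given, and that sort_index is one way to get the sampled rows back into their original order.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the CSV in the question: 10 rows, 2 columns
df = pd.DataFrame({
    "output": [0, 1, 0, 1, 0, 1, 0, 1, 0, 1],
    "1": np.linspace(0.0, 0.9, 10),
})

# sample() draws rows at random, so the index labels come back shuffled
sampled = df.sample(frac=0.7)
print(sampled.index.tolist())

# to_numpy() does not reorder anything: it follows the order of `sampled`
as_array = sampled.to_numpy()

# sort_index() puts the sampled rows back in their original index order
restored = sampled.sort_index()
print(restored.index.tolist())
```

If the order of the training rows does not matter for your model, you can skip the sort_index step entirely.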

See the pandas documentation for DataFrame.sample for details.

If you want the same rows to be selected every time you run the code (reproducibility), you can pass the random_state option.
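As a rough illustration of random_state, the sketch below again uses a small made-up DataFrame (hypothetical column names and values) and an arbitrary seed of 42. It also shows one possible way to build the 30% test set mentioned in the question's comment, by dropping the sampled index labels; that part is an assumption about what the asker intended, not something shown in the original code.

```python
import pandas as pd

# Hypothetical stand-in for the data in the question
df = pd.DataFrame({
    "output": [i % 2 for i in range(10)],
    "feature": [i / 10 for i in range(10)],
})

# The same seed gives the same rows in the same order on every run
train_a = df.sample(frac=0.7, random_state=42)
train_b = df.sample(frac=0.7, random_state=42)
print(train_a.equals(train_b))

# One way to get the remaining 30%: drop the rows that were sampled
test = df.drop(train_a.index)
print(len(train_a), len(test))
```

The rows are still shuffled relative to the original DataFrame; random_state only makes the shuffle repeatable.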