DataFrame order shuffling itself
I am trying to work out why my DataFrame changes its order after I convert it to an array. Here is my code:
import pandas as pd

# Column names: "output" followed by "1" through "30"
header_list = ["output"] + [str(i) for i in range(1, 31)]
df = pd.read_csv("data.csv", names=header_list)
# Split the data 70/30 into training and testing sets
trainingdata = df.sample(frac=0.7)
# Y is the first column; X is the remaining columns
X = trainingdata.iloc[:, 1:].to_numpy()
Y = trainingdata.iloc[:, 0].to_numpy().reshape(-1, 1)
print(trainingdata)
Output:
output 1 2 ... 28 29 30
12 0 0.267358 0.373690 ... 0.379725 0.130298 0.195592
27 1 0.313739 0.506595 ... 0.456701 0.375517 0.157156
450 0 0.181693 0.490362 ... 0.112165 0.294500 0.139184
440 0 0.033603 0.531958 ... 0.171821 0.241474 0.338187
54 0 0.197312 0.113967 ... 0.189210 0.255076 0.083169
.. ... ... ... ... ... ... ...
20 1 0.519144 0.348326 ... 0.407216 0.653854 0.039814
231 1 0.428274 0.196145 ... 0.680756 0.286615 0.237439
55 0 0.291968 0.190396 ... 0.334089 0.450227 0.205234
159 1 0.410762 0.456206 ... 0.846048 0.337473 0.307359
117 0 0.232335 0.292188 ... 0.391065 0.361128 0.187656
As you can see, my index column is in random order, whereas my original DataFrame was in numerical order. Is this caused by a syntax error on my part?
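This comes from the sample operation in pandas. It is not a syntax error: by default, sample randomly selects rows from your DataFrame (or columns, with axis=1) and returns them in the order drawn, keeping their original index labels.
See the pandas documentation for DataFrame.sample for details.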
If you want the same selection every time you run the code (reproducibility), you can pass the random_state option.
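As a rough sketch of both points, the snippet below fixes the seed with random_state and then uses sort_index to put the sampled rows back in their original order. The small random DataFrame is a hypothetical stand-in for data.csv, and the seed value 42 and the use of drop to build the test set are assumptions for illustration, not part of the original question.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for data.csv: 10 rows, a label column plus 3 features.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((10, 4)), columns=["output", "1", "2", "3"])

# random_state fixes the seed, so the same rows are drawn on every run.
train_a = df.sample(frac=0.7, random_state=42)
train_b = df.sample(frac=0.7, random_state=42)
same_draw = train_a.index.equals(train_b.index)

# sample() returns rows in the order drawn; sort_index() restores index order.
train_sorted = train_a.sort_index()

# The rows that were not sampled can serve as the 30% test set.
test = df.drop(train_a.index)

X = train_sorted.iloc[:, 1:].to_numpy()
Y = train_sorted.iloc[:, 0].to_numpy().reshape(-1, 1)
```

Whether to sort is a judgment call: for training a model the shuffled order is usually harmless, so sort_index is only needed if later code relies on the original row order.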