How indexing of a sliced DataFrame works in pandas

Question

How do I get the correct row from a sliced DataFrame?

To illustrate what I mean, please look at the following code example:

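import lightgbm as lgb
import numpy as np
import pandas as pd  # needed for pd.DataFrame below
from sklearn.model_selection import train_test_split

# Toy dataset: three correlated columns and a boolean label
data = pd.DataFrame()
data['one'] = range(0, 1000)
data['p1'] = data['one'] + 1
data['p2'] = data['one'] + 2
label = data['p1'] % 2 == 0

# The split shuffles rows, so X_test keeps a non-contiguous subset of the original index
X_train, X_test, y_train, y_test = train_test_split(data, label, test_size=0.2, random_state=100)

lgb_model = lgb.LGBMClassifier(objective='binary')
lgb_fitted = lgb_model.fit(X_train, y_train, verbose=False)

# predict_proba returns a NumPy array, so y_prob gets a fresh 0..n-1 index
y_prob = lgb_fitted.predict_proba(X_test)
y_prob = pd.DataFrame(y_prob, columns=['No', 'Yes'])

# Keep only the rows where the model is close to undecided
model_uncertain = y_prob.loc[(y_prob['Yes'] >= .5) & (y_prob['Yes'] <= .52)]
model_uncertain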

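My question:

How can I get the row in the X_test dataframe that corresponds to the first row in the model_uncertain dataframe?

To make sure that I am getting the right row, I pass the same row to predict_proba using the following code, which should give me the same result: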

y_prob_3=lgb_fitted.predict_proba([X_test.iloc[3]])
y_prob_3

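But the result is not the same.

I think I am not sending the correct row to predict_proba, since it should return the same values for the same row.

What is the correct way to take the n-th row in model_uncertain and find the corresponding row in the X_test dataframe?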

Answer 1

How can I get the row in the X_test dataframe which is related to the first row in the model_uncertain data frame?

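You're on the right track:

>>> pos_of_first_uncertain_row = model_uncertain.index[0]
>>> row_in_test_data = X_test.iloc[pos_of_first_uncertain_row]

Yes, the index is preserved between the original dataframe and its slices (unless you reset the index somewhere in between), so model_uncertain keeps the row labels of y_prob. Note that model_uncertain.iloc[0].index would return the column names ('No', 'Yes') rather than a row label, which is why model_uncertain.index[0] is used here. Also, y_prob was built from the NumPy array returned by predict_proba, so it carries a fresh 0..n-1 index: its labels are row positions in X_test, not X_test's own labels, and iloc is the accessor to use.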
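As a small illustration of the index behaviour, here is a sketch that needs no model. The feature values, the index labels and the probabilities are all made up; X_test and proba merely stand in for the real test frame and the predict_proba output.

```python
import numpy as np
import pandas as pd

# Toy stand-in for X_test: labels are deliberately not 0..n-1,
# as they would be after a shuffled train/test split.
X_test = pd.DataFrame({"one": [10, 20, 30, 40]}, index=[507, 12, 981, 64])

# Toy stand-in for the predict_proba output: a plain NumPy array, no index.
proba = np.array([[0.90, 0.10], [0.49, 0.51], [0.20, 0.80], [0.485, 0.515]])

# Wrapping the array creates a fresh RangeIndex (0, 1, 2, ...).
y_prob = pd.DataFrame(proba, columns=["No", "Yes"])

# Boolean slicing keeps the labels of y_prob.
model_uncertain = y_prob.loc[(y_prob["Yes"] >= 0.5) & (y_prob["Yes"] <= 0.52)]
positions = list(model_uncertain.index)

# Those labels are row positions in X_test, so iloc is the matching accessor.
first_row = X_test.iloc[positions[0]]

# Alternative: reuse X_test's index so the labels line up and loc works.
y_prob_labeled = pd.DataFrame(proba, columns=["No", "Yes"], index=X_test.index)
uncertain_labeled = y_prob_labeled.loc[
    (y_prob_labeled["Yes"] >= 0.5) & (y_prob_labeled["Yes"] <= 0.52)
]
same_row = X_test.loc[uncertain_labeled.index[0]]
```

Passing index=X_test.index when building the probability frame is probably the less error-prone option, since every later slice can then be looked up with loc directly.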

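To make sure that I am getting the right row, I test it by passing the same row to predict_proba using the following code, as I should get the same result (...) But the result is not the same.

Why do you think they are different? The dataframe image does not show all the decimal places. A better way to confirm that they are the same (well, really close) is to use something like np.isclose to compare model_uncertain.iloc[0] (the first row of the dataframe) with the probabilities predicted for X_test.iloc[[3]] (the row at position 3):

>>> np.isclose(model_uncertain.iloc[0].values, lgb_fitted.predict_proba(X_test.iloc[[3]])[0])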
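To see why a tolerance-based comparison is safer than an exact one, here is a minimal sketch with made-up numbers. The arrays a and b are illustrative only and are not taken from the model above.

```python
import numpy as np

# Two arrays that are mathematically equal but differ in the last bits,
# because 0.1 + 0.2 is not exactly 0.3 in floating point.
a = np.array([0.1 + 0.2, 0.7])
b = np.array([0.3, 0.7])

exact = a == b            # element-wise exact comparison
close = np.isclose(a, b)  # element-wise comparison within a small tolerance

all_exact = bool(exact.all())
all_close = bool(close.all())
```

Here all_exact is False while all_close is True, which is the same situation as two probability rows that only differ beyond the displayed decimals.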

python

pandas

lightgbm