I'm trying to predict probabilities for X_test and get 2 values per row in an array. I need to compare those 2 values and turn them into 1

Question

I'm trying to predict probabilities for X_test and get 2 values per row in an array. I need to compare those 2 values and turn them into 1.

When I run

y_pred = classifier.predict_proba(X_test)
y_pred

it gives output like

array([[0.5, 0.5],
       [0.6, 0.4],
       [0.7, 0.3],
       ...,
       [0.5, 0.5],
       [0.4, 0.6],
       [0.3, 0.7]])

We know that a value >= 0.5 means 1 and a value below 0.5 means 0.

I used the code below to convert the array above to a pandas DataFrame:

import pandas as pd

proba = pd.DataFrame(y_pred)
# A flat list of names; a nested list would create a MultiIndex instead.
proba.columns = ['pred_0', 'pred_1']
proba.head()

The output is

    pred_0  pred_1
0   0.5     0.5
1   0.6     0.4
2   0.7     0.3
3   0.4     0.6
4   0.3     0.7

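How can I iterate over the rows above and apply a condition: if the value in column 1 is greater than or equal to 0.5 when compared with the value in column 2, the result is 1; if the value in column 1 is less than 0.5, compare it with the value in column 2?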

For example, looking at the DataFrame above, the output must be

  output
0 0
1 1
2 1
3 1
4 1

Answer 1

Compare the two columns to create a boolean mask, then convert it to int with astype:

Option 1:

df['output'] = (df['pred_0'] >= df['pred_1']).astype(int)

Option 2:

df['output'] = df['pred_0'].ge(df['pred_1']).astype(int)

Or via np.where:

Option 3:

df['output'] = np.where(df['pred_0'] >= df['pred_1'], 1, 0)

Option 4:

df['output'] = np.where(df['pred_0'].ge(df['pred_1']), 1, 0)

   pred_0  pred_1  output
0     0.5     0.5       1
1     0.6     0.4       1
2     0.7     0.3       1
3     0.4     0.6       0
4     0.3     0.7       0
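
To check the column comparison end to end, here is a minimal self-contained sketch. It assumes the five sample rows from the question are loaded into a DataFrame named df; the variable names mask, out_astype, and out_where are illustrative and not part of the original answer.

```python
import numpy as np
import pandas as pd

# Sample probabilities copied from the question's proba.head() output.
df = pd.DataFrame({
    "pred_0": [0.5, 0.6, 0.7, 0.4, 0.3],
    "pred_1": [0.5, 0.4, 0.3, 0.6, 0.7],
})

# Boolean mask: True where pred_0 is at least as large as pred_1.
mask = df["pred_0"] >= df["pred_1"]

# The astype route and the np.where route, side by side.
out_astype = mask.astype(int)
out_where = np.where(mask, 1, 0)

df["output"] = out_astype
print(df)
```

Both routes yield the same 0/1 column, so the choice between them is mostly a matter of taste. np.where becomes more useful if the two outcomes should be something other than 1 and 0.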

Answer 2

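You can just map over your initial array, without converting it to a pandas DataFrame, so that it returns True when the first value of each sub-array is >= 0.5 and False otherwise. Finally, convert the result to int: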

>>> import numpy as np
>>> a = np.array([[0.5, 0.5], [0.6, 0.4], [0.3, 0.7]])
>>> a
array([[0.5, 0.5],
       [0.6, 0.4],
       [0.3, 0.7]])
>>> result = map(lambda x:int(x[0] >= 0.5), a)
>>> print(list(result))
[1, 1, 0]
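
The same thresholding can also be done without map or a lambda. The following is a sketch of a vectorized NumPy variant, offered as an alternative and not as part of the original answer; it assumes the same three-row sample array a.

```python
import numpy as np

a = np.array([[0.5, 0.5], [0.6, 0.4], [0.3, 0.7]])

# Compare the first column against 0.5 for all rows at once,
# then turn the resulting booleans into 0/1 integers.
result = (a[:, 0] >= 0.5).astype(int)
print(result.tolist())  # [1, 1, 0]
```

Slicing with a[:, 0] selects the first column, so the comparison runs over the whole array in one step instead of calling a Python function per row.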


python

loops

dataframe

pandas

sklearn-pandas