Creating a new column in a dataframe with the values from another column in the same dataframe
I am a researcher and a beginner in Python.
I am trying to create a new column in the following dataframe:
x y z bat gradient
date
2022-04-15 10:17:14.721 0.125 0.016 1.032 NaN 0.0320
2022-04-15 10:17:39.721 0.125 -0.016 1.032 NaN 0.0000
2022-04-15 10:18:04.721 0.125 0.016 1.032 NaN 0.0000
2022-04-15 10:18:29.721 0.125 -0.016 1.032 NaN 0.0000
2022-04-15 10:18:54.721 0.125 0.016 1.032 NaN 0.0160
... ... ... ... ...
2022-05-02 17:03:04.721 -0.750 -0.016 0.710 NaN 0.7855
2022-05-02 17:03:29.721 -0.750 -0.016 0.710 NaN 1.4420
2022-05-02 17:03:54.721 0.719 -0.302 -0.419 NaN 0.8690
2022-05-02 17:04:19.721 -0.625 -0.048 -0.871 NaN 1.1965
2022-05-02 17:04:44.721 -0.969 0.016 -0.032 NaN 1.2470
I also have certain limits/intervals (taken from the whiskers of a boxplot):
limit_start_A = 0.15
limit_end_A = 0.20
limit_start_B = 0.20
limit_end_B = 0.40
limit_start_C = 0.40
limit_end_C = 0.90
limit_start_D = 0.90
limit_end_D = 1.1
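I would like to create a new column called "result" based on the values in the "gradient" column. For example, when the gradient value falls in the interval between "limit_start_B" and "limit_end_B", the row should get the letter "B" in the new "result" column.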
Don't use so many variables; use lists and pandas.cut instead:
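import pandas as pd

limits = [0.15, 0.20, 0.40, 0.90, 1.1]
labels = ['A', 'B', 'C', 'D']
# Bins are right-inclusive by default: (0.15, 0.20] -> 'A', ..., (0.90, 1.1] -> 'D'
df['result'] = pd.cut(df['gradient'], bins=limits, labels=labels)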
Output:
x y z bat gradient result
date
2022-04-15 10:17:14.721 0.125 0.016 1.032 NaN 0.0320 NaN
2022-04-15 10:17:39.721 0.125 -0.016 1.032 NaN 0.0000 NaN
2022-04-15 10:18:04.721 0.125 0.016 1.032 NaN 0.0000 NaN
2022-04-15 10:18:29.721 0.125 -0.016 1.032 NaN 0.0000 NaN
2022-04-15 10:18:54.721 0.125 0.016 1.032 NaN 0.0160 NaN
2022-05-02 17:03:04.721 -0.750 -0.016 0.710 NaN 0.7855 C
2022-05-02 17:03:29.721 -0.750 -0.016 0.710 NaN 1.4420 NaN
2022-05-02 17:03:54.721 0.719 -0.302 -0.419 NaN 0.8690 C
2022-05-02 17:04:19.721 -0.625 -0.048 -0.871 NaN 1.1965 NaN
2022-05-02 17:04:44.721 -0.969 0.016 -0.032 NaN 1.2470 NaN
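One thing worth checking is what happens to values that sit exactly on a limit. The sketch below is only an illustration: the small dataframe is made up (it is not the data from the question), and the column name result_left is a hypothetical name chosen for the comparison. It shows the default right-closed bins next to right=False, which makes the bins left-closed instead.

```python
import pandas as pd

# Hypothetical sample values: some below, on, inside, and above the limits.
df = pd.DataFrame({"gradient": [0.032, 0.15, 0.20, 0.35, 0.7855, 1.0, 1.442]})

limits = [0.15, 0.20, 0.40, 0.90, 1.1]
labels = ["A", "B", "C", "D"]

# Default (right=True): intervals are (0.15, 0.20], (0.20, 0.40], ...
# so 0.15 itself is outside every bin and 0.20 belongs to 'A'.
df["result"] = pd.cut(df["gradient"], bins=limits, labels=labels)

# right=False: intervals are [0.15, 0.20), [0.20, 0.40), ...
# so 0.15 belongs to 'A' and 0.20 belongs to 'B'.
df["result_left"] = pd.cut(df["gradient"], bins=limits, labels=labels, right=False)

print(df)
```

In both variants, values outside the overall range (here 0.032 and 1.442) end up as NaN, which matches the NaN rows in the output above. If the lowest limit should be included while keeping right-closed bins, pd.cut also accepts include_lowest=True.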