Create conditional column in pandas
I am trying to create a conditional column in pandas. This is what the DataFrame looks like:
import pandas as pd

data = [{"owner" : "john", "dog" : 'magie', "dog_is_fluffy" : 1},
{"owner" : "john", "dog" : 'stellar', "dog_is_fluffy" : 0},
{"owner" : "lisa", "dog" : 'mollie' , "dog_is_fluffy" : 0},
{"owner" : "lisa", "dog" : 'rex', "dog_is_fluffy" : 0},
{"owner" : "john", "dog" : 'luns', "dog_is_fluffy" : 1}]
df = pd.DataFrame(data)
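As you can see, my data lists dogs and their owners. We also know whether each dog is fluffy. I want to create two columns, fluffy_dogs_owned and owner_has_fluffy_dog.
The result I am looking for is: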
data_result = [{"owner" : "john", "dog" : 'magie', "dog_is_fluffy" : 1, "fluffy_dogs_owned" : 2, "owner_has_fluffy_dog" : 1},
{"owner" : "john", "dog" : 'stellar', "dog_is_fluffy" : 0, "fluffy_dogs_owned" : 2, "owner_has_fluffy_dog" : 1},
{"owner" : "lisa", "dog" : 'mollie' , "dog_is_fluffy" : 0, "fluffy_dogs_owned" : 0, "owner_has_fluffy_dog" : 0},
{"owner" : "lisa", "dog" : 'rex', "dog_is_fluffy" : 0, "fluffy_dogs_owned" : 0, "owner_has_fluffy_dog" : 0},
{"owner" : "john", "dog" : 'luns', "dog_is_fluffy" : 1, "fluffy_dogs_owned" : 2, "owner_has_fluffy_dog" : 1}]
df_result = pd.DataFrame(data_result)
I have considered using df.groupby() and np.where, but so far I have not been able to get it to work. Any ideas?
Use GroupBy.transform with sum to return a Series the same size as the original DataFrame, then test the new column for inequality with 0 using Series.ne and convert the result to integers:
df['fluffy_dogs_owned'] = df.groupby('owner')['dog_is_fluffy'].transform('sum')
df['owner_has_fluffy_dog'] = df['fluffy_dogs_owned'].ne(0).astype(int)
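To illustrate why transform is used here rather than a plain aggregation, here is a minimal sketch on a cut-down copy of the question's data. The names per_owner and per_row are made up for this illustration only.

```python
import pandas as pd

# Cut-down copy of the question's data (the dog names are not needed here).
df = pd.DataFrame({
    "owner": ["john", "john", "lisa", "lisa", "john"],
    "dog_is_fluffy": [1, 0, 0, 0, 1],
})

# Plain aggregation: one value per owner, indexed by owner.
per_owner = df.groupby("owner")["dog_is_fluffy"].sum()

# transform: the per-owner total repeated on every original row,
# so it lines up with df's index and can be assigned as a new column.
per_row = df.groupby("owner")["dog_is_fluffy"].transform("sum")

print(len(per_owner), len(per_row))
```

Because the transformed Series shares the original index, it can be assigned straight to a column without a merge.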
Alternatively, use Series.clip to cap the count at 1:
df['owner_has_fluffy_dog'] = df['fluffy_dogs_owned'].clip(upper=1)
print(df)
       dog  dog_is_fluffy owner  fluffy_dogs_owned  owner_has_fluffy_dog
0    magie              1  john                  2                     1
1  stellar              0  john                  2                     1
2   mollie              0  lisa                  0                     0
3      rex              0  lisa                  0                     0
4     luns              1  john                  2                     1
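Since the question mentions np.where, the flag column could also be derived that way. The following is a self-contained sketch rather than part of the original answer; it assumes dog_is_fluffy only ever holds 0 or 1, so the per-owner sum is a count.

```python
import numpy as np
import pandas as pd

data = [
    {"owner": "john", "dog": "magie", "dog_is_fluffy": 1},
    {"owner": "john", "dog": "stellar", "dog_is_fluffy": 0},
    {"owner": "lisa", "dog": "mollie", "dog_is_fluffy": 0},
    {"owner": "lisa", "dog": "rex", "dog_is_fluffy": 0},
    {"owner": "john", "dog": "luns", "dog_is_fluffy": 1},
]
df = pd.DataFrame(data)

# Per-owner count of fluffy dogs, repeated on each of that owner's rows.
df["fluffy_dogs_owned"] = df.groupby("owner")["dog_is_fluffy"].transform("sum")

# np.where picks 1 where the owner has at least one fluffy dog, else 0.
df["owner_has_fluffy_dog"] = np.where(df["fluffy_dogs_owned"] > 0, 1, 0)

print(df)
```

On recent pandas versions the columns should print in insertion order (owner first), so the layout may differ from the output shown above even though the values match.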