Adding a new column based on other columns and rows
I have a large DataFrame. Here is a small sample DataFrame to illustrate my problem.
A B C
car red 15
car blue 20
car grey 14
bike red 6
bike blue 8
phone red 9
phone blue 11
phone grey 10
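Suppose column C holds prices. I want to add a column named "D" that answers, for each row, "Is the red car more expensive than the average price of all cars?", and the same question for every other value of A. That is basically my whole problem. This is the result I want: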
A B C D
car red 15 cheap
car blue 20 expensive
car grey 14 cheap
bike red 6 cheap
bike blue 8 expensive
phone red 9 cheap
phone blue 11 expensive
phone grey 10 cheap
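I have tried many ways to do this. In the end I thought the code below would solve my problem, but it does not. I tried the same thing with a while loop, but I keep getting KeyError: 0. What should I do? This is the code I tried: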
df["D"] = "cheap"
A.values = df.A.unique()
for b in A.values:
    for i in range(len(df.loc[data.A== b])):
        if df.loc[df.A== b, "C"][i] >= df.loc[df.A== b, "C"].mean():
            df.loc[df.A== b, "D"][i] = "expensive"
Use transform with 'mean' to get the group mean for every row, then np.where:
s = df.groupby('A').C.transform('mean')
df['D'] = np.where(df.C>s, 'expensive', 'cheap')
df
Out[158]:
A B C D
0 car red 15 cheap
1 car blue 20 expensive
2 car grey 14 cheap
3 bike red 6 cheap
4 bike blue 8 expensive
5 phone red 9 cheap
6 phone blue 11 expensive
7 phone grey 10 cheap
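For anyone who wants to run this end to end, here is a minimal self-contained sketch. It assumes the sample data from the question; the variable name group_mean is only illustrative and is not part of the answer above.

```python
import numpy as np
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "A": ["car", "car", "car", "bike", "bike", "phone", "phone", "phone"],
    "B": ["red", "blue", "grey", "red", "blue", "red", "blue", "grey"],
    "C": [15, 20, 14, 6, 8, 9, 11, 10],
})

# Mean of C within each A group, repeated so there is one value per row.
group_mean = df.groupby("A")["C"].transform("mean")

# Strictly above the group mean -> "expensive", otherwise "cheap".
df["D"] = np.where(df["C"] > group_mean, "expensive", "cheap")
print(df)
```

Because transform returns a Series with the same index as df, the comparison is row by row and no explicit loop over groups is needed.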
df['D'] = np.where(df.groupby('A')['C'].transform('mean') >= df['C'], 'cheap', 'expensive')
A B C D
0 car red 15 cheap
1 car blue 20 expensive
2 car grey 14 cheap
3 bike red 6 cheap
4 bike blue 8 expensive
5 phone red 9 cheap
6 phone blue 11 expensive
7 phone grey 10 cheap
How it works
np.where(condition, value_if_true, value_if_false)
# The condition is a boolean mask: True where the group mean is greater than or equal to the price.
# transform returns its result in the original row order, so the mask lines up with df row by row.
condition = df.groupby('A')['C'].transform('mean') >= df['C']
value_if_true = 'cheap'
value_if_false = 'expensive'
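To make the three arguments of np.where concrete, the sketch below prints the intermediate values next to the data. It assumes the sample prices from the question and builds the condition with transform; the names group_mean, condition and labels are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "A": ["car", "car", "car", "bike", "bike", "phone", "phone", "phone"],
    "C": [15, 20, 14, 6, 8, 9, 11, 10],
})

# condition: True where the group mean is at least the row's price.
group_mean = df.groupby("A")["C"].transform("mean")
condition = group_mean >= df["C"]

# np.where pairs condition and rows by position, so the mask must be in
# the same order as df; transform guarantees this by keeping df's index.
assert condition.index.equals(df.index)

# First value where condition is True, second value where it is False.
labels = np.where(condition, "cheap", "expensive")
df["D"] = labels
print(df.assign(group_mean=group_mean, condition=condition))
```

A condition that is computed per group and then concatenated in group order (bike, car, phone) would be paired with the wrong rows, since np.where ignores index labels.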