Conditional operation on a DataFrame
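My goal is to add a column to a DataFrame based on a condition that involves the values of other columns.
I created a simple example that produces the same error: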
import pandas as pd

numbers = {'A': [1, 2, 3, 4, 5], 'B': [2, 4, 3, 3, 2]}
df = pd.DataFrame(numbers)
if df.A - df.B > 0:
    df["C"] = df.B * 5
else:
    df["C"] = 0
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
I'm sure the solution is simple, but I'm a beginner. Thanks for the help.
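For context on the error itself, here is a minimal sketch using the same sample data. It shows that the comparison produces one boolean per row, which is why a plain `if` cannot decide which branch to take. The variable name `condition` is only for illustration.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3, 4, 5], "B": [2, 4, 3, 3, 2]})

# The comparison is evaluated row by row and yields a boolean Series,
# not a single True/False value.
condition = df["A"] - df["B"] > 0
print(condition.tolist())  # [False, False, False, True, True]

# `if` needs exactly one boolean; passing the Series raises ValueError.
try:
    if condition:
        pass
except ValueError as exc:
    print(type(exc).__name__)  # ValueError

# Reducing the Series gives a single boolean, but that answers a different
# question ("is any / every row positive?") than the per-row assignment
# the question is after.
print(condition.any())  # True
print(condition.all())  # False
```

So `.any()` and `.all()` silence the error but do not give a per-row result; the column has to be built element-wise.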
c = []
# Walk the rows one by one and compute the value for column C
for lab, row in df.iterrows():
    curr = 0
    if row['A'] > row['B']:
        curr = row['B'] * 5
    c.append(curr)
df['C'] = c
You can do it like this:
df["C"] = df.A - df.B
# First turn the non-positive values into 0
df["C"] = df["C"].mask(df["C"] <= 0, 0)
# Then replace the remaining positive values with B * 5
df["C"] = df["C"].mask(df["C"] > 0, df["B"] * 5)
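The same result can probably be had in a single step with `Series.where`, which keeps the values where the condition is True and substitutes the second argument elsewhere. This is only a sketch of an alternative, using the sample data from the question:

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3, 4, 5], "B": [2, 4, 3, 3, 2]})

# Start from B * 5 and keep it only in rows where A > B;
# every other row is replaced by 0.
df["C"] = (df["B"] * 5).where(df["A"] > df["B"], 0)

print(df["C"].tolist())  # [0, 0, 0, 15, 10]
```

`where` is the mirror image of `mask`: `mask` replaces values where the condition holds, `where` replaces them where it does not.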
# Import the pandas module
import pandas as pd
# Lists of data
list_A = [1, 2, 3, 4, 5]
list_B = [2, 4, 3, 3, 2]
# Define a dictionary containing the lists of data
dictionary = {'A': list_A,
              'B': list_B}
# Convert the dictionary into a DataFrame
data = pd.DataFrame(dictionary)
data
# New list: B * 5 where A - B > 0, otherwise 0
data_diff = data.A - data.B
new_list = []
for diff, b in zip(data_diff, data.B):
    if diff > 0:
        new_list.append(b * 5)
    else:
        new_list.append(0)
# New DataFrame
new_dictionary = {'A': list_A,
                  'B': list_B,
                  'C': new_list}
new_data = pd.DataFrame(new_dictionary)
new_data
Note: this is my very simple version. There are, of course, many other smarter and more "pythonic" versions. Finally, I think this tutorial website may help you.
You can use numpy's where:
import numpy as np
df["C"] = np.where(df["A"] > df["B"], df["B"] * 5, 0)
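For completeness, here is a self-contained sketch of the `np.where` approach run against the sample data from the question, so the resulting column is visible. Nothing here goes beyond the one-liner above except the setup and the print.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3, 4, 5], "B": [2, 4, 3, 3, 2]})

# Element-wise choice: B * 5 where A > B, otherwise 0.
df["C"] = np.where(df["A"] > df["B"], df["B"] * 5, 0)

print(df["C"].tolist())  # [0, 0, 0, 15, 10]
```

Only the last two rows satisfy A > B, so those are the only rows that get B * 5.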