Create new column in pandas dataframe based on if/elif/and functions
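I have searched for my exact problem and did not find it. These two threads, Creating a new column based on if-elif-else condition and
guided my code, although my code does not execute.
Problem: I have a dataframe, and I have reproduced a sample below. The region attribute has only two values, a or b (or possibly more), and the same goes for year, although region a could have both years, etc. What I want to do is create a new column, "dollars", look up the region value, and if it is region "a" and the year is, e.g., 2006, take the sales in that row, multiply it by the rate for that year, and put the value in the new column, dollars. I am a beginner, and below is what I have so far, via a function. Executing the .apply function returns a ValueError: ('The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().', 'occurred at index 0'). I am particularly interested in a more efficient implementation, as the dataframe is rather large and I would like to optimize computational efficiency.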
import pandas as pd
rate_2006, rate_2007 = 100, 200
c = {
    'region': ["a", "a", "a", "a", "a", "b", "b", "b", "b", "a", "b"],
    'year': [2006, 2007, 2007, 2006, 2006, 2006, 2007, 2007, 2007, 2006, 2007],
    'sales': [500, 100, 2990, 15, 5000, 2000, 150, 300, 250, 1005, 600]
}
df1 = pd.DataFrame(c)
df1
def new_col(row):
    if df1["region"] == "a" and df1["year"] == 2006:
        nc = row["sales"] * rate_2006
    elif df1["region"] == "a" and df1["year"] == 2007:
        nc = row["sales"] * rate_2007
    elif df1["region"] == "b" and df1["year"] == 2006:
        nc = row["sales"] * rate_2006
    else:
        nc = row["sales"] * rate_2007
    return nc
df1["Dollars"] = df1.apply(new_col, axis=1)  # this call raises the ValueError
df1
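The problem is most likely in how the function uses the data: inside new_col the conditions test df1["region"] and df1["year"], which are whole columns, so each comparison returns a Series, and `and` cannot reduce a Series to a single True or False. I am not sure how much this helps with efficiency, but I have rewritten the code, to the best of my knowledge, so that it works by looking up one row at a time.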
import pandas as pd
rate_2006, rate_2007 = 100, 200
c = {
    'region': ["a", "a", "a", "a", "a", "b", "b", "b", "b", "a", "b"],
    'year': [2006, 2007, 2007, 2006, 2006, 2006, 2007, 2007, 2007, 2006, 2007],
    'sales': [500, 100, 2990, 15, 5000, 2000, 150, 300, 250, 1005, 600]
}
df1 = pd.DataFrame(c)
print(df1)
# value is a row label; df1.loc[value, col] returns a single scalar,
# so the comparisons below are plain True/False and `and` works.
def new_col(value):
    if df1.loc[value,"region"] == "a" and df1.loc[value,"year"] == 2006:
        df1.loc[value,"Dollars"] = df1.loc[value,"sales"] * rate_2006
    elif df1.loc[value,"region"] == "a" and df1.loc[value,"year"] == 2007:
        df1.loc[value,"Dollars"] = df1.loc[value,"sales"] * rate_2007
    elif df1.loc[value,"region"] == "b" and df1.loc[value,"year"] == 2006:
        df1.loc[value,"Dollars"] = df1.loc[value,"sales"] * rate_2006
    else:
        df1.loc[value,"Dollars"] = df1.loc[value,"sales"] * rate_2007
# Relies on the default 0..n-1 index, so positions double as row labels.
for value in range(len(df1)):
    new_col(value)
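Since the question asks about efficiency on a large dataframe, here is a rough sketch of two alternatives to the row-by-row loop, not a definitive implementation. The first keeps .apply but tests the scalar values in row instead of the whole columns. The second builds the conditions on whole columns with & and lets numpy.select pick the rate, which avoids a Python-level loop. The names new_col_fixed, dollars_apply, conditions, rates and rate are made up for illustration, and the example assumes the same sample data and rates as above.

```python
import numpy as np
import pandas as pd

rate_2006, rate_2007 = 100, 200
df1 = pd.DataFrame({
    "region": ["a", "a", "a", "a", "a", "b", "b", "b", "b", "a", "b"],
    "year": [2006, 2007, 2007, 2006, 2006, 2006, 2007, 2007, 2007, 2006, 2007],
    "sales": [500, 100, 2990, 15, 5000, 2000, 150, 300, 250, 1005, 600],
})

# Alternative 1: row-wise apply. `row` holds one row, so row["region"]
# and row["year"] are scalars and `and` behaves as expected.
def new_col_fixed(row):
    if row["region"] == "a" and row["year"] == 2006:
        return row["sales"] * rate_2006
    elif row["region"] == "a" and row["year"] == 2007:
        return row["sales"] * rate_2007
    elif row["region"] == "b" and row["year"] == 2006:
        return row["sales"] * rate_2006
    return row["sales"] * rate_2007

dollars_apply = df1.apply(new_col_fixed, axis=1)

# Alternative 2: vectorized. Each condition is a boolean Series covering
# every row; `&` combines them element-wise (the parentheses are required).
conditions = [
    (df1["region"] == "a") & (df1["year"] == 2006),
    (df1["region"] == "a") & (df1["year"] == 2007),
    (df1["region"] == "b") & (df1["year"] == 2006),
]
rates = [rate_2006, rate_2007, rate_2006]

# np.select takes the rate of the first matching condition per row and
# falls back to `default` when none match (the `else` branch above).
rate = np.select(conditions, rates, default=rate_2007)
df1["Dollars"] = df1["sales"] * rate

print(df1)
```

Both alternatives should give the same Dollars values on this sample. In this particular data the rate depends only on the year, so mapping the year column through a dict such as {2006: rate_2006, 2007: rate_2007} with Series.map would probably be simpler still; the np.select form is shown because it keeps the region conditions from the question.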