Create New Columns Using Multiple Conditions and Time Difference
I have a tricky problem with the following DataFrame:
df = pd.DataFrame({'weight': [[200, 190, 188, 180, 170],
                              [181, 175, 172, 165, 150]],
                   'days_since_gym': [[0, 87, 174, 205, 279],
                                      [43, 171, 241, 273, 300]]})
print(df)
                      weight            days_since_gym
0  [200, 190, 188, 180, 170]    [0, 91, 174, 205, 279]
1  [181, 175, 172, 165, 150]  [93, 171, 241, 273, 300]
I need to create 4 columns (0-90 days, 91-180 days, 181-270 days, 271-360 days) based on the following conditions:
1) If there are multiple weights in a specific time duration, get the maximum weight in that time duration column.
2) If no weight is present in that time duration, the value for that duration would be 0.
Desired output:
                      weight            days_since_gym  0-90  91-180  181-270  271-360
0  [200, 190, 188, 180, 170]    [0, 87, 174, 205, 279]   200     188      180      170
1  [181, 175, 172, 165, 150]  [93, 171, 241, 273, 300]     0     181      172      165
What is the most sensible way to do this? Any suggestions would be greatly appreciated. Thanks!
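You can write a custom function that takes the list of weights, the list of days, a start day, and an end day, and then apply it row by row with the pandas apply function to create each new column. If you haven't used apply before, the basic structure is: df.apply(lambda x: custom_function(...), axis=1). The argument axis=1 ensures that your custom function is applied to each row.
Because the names of the new columns are built from the start and end days, you can loop over those start and end day ranges.
I also noticed that the DataFrame you created in your question does not quite match the desired output, so I used the DataFrame shown in the desired output.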
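To make the row-wise apply pattern concrete, here is a minimal sketch on a toy frame. The frame `toy`, its columns `a` and `b`, and the helper `add_values` are hypothetical names used only for illustration; they are not part of the question's data.

```python
import pandas as pd

# Hypothetical toy data, unrelated to the question's DataFrame
toy = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

def add_values(a, b):
    # Hypothetical helper: combines two values taken from the same row
    return a + b

# With axis=1 the lambda receives one row at a time as a Series,
# so row["a"] and row["b"] are the scalar values of that row.
toy["total"] = toy.apply(lambda row: add_values(row["a"], row["b"]), axis=1)
print(toy)
```

The same shape carries over to list-valued cells: each row's cell is handed to the helper as an ordinary Python list.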
import numpy as np
import pandas as pd

df = pd.DataFrame({'weight': [[200, 190, 188, 180, 170],
                              [181, 175, 172, 165, 150]],
                   'days_since_gym': [[0, 87, 174, 205, 279],
                                      [93, 171, 241, 273, 300]]})

def return_max_weight(weights, days, start_day, end_day):
    # get the indices of the days that fall between start_day and end_day (inclusive)
    days = np.array(days)
    weights_idx = list(np.where((days >= start_day) & (days <= end_day))[0])
    if len(weights_idx) == 0:
        # no weight was recorded in this period
        return 0
    else:
        weight_between_start_and_end = [weights[idx] for idx in weights_idx]
        return max(weight_between_start_and_end)

# one new column per (start_day, end_day) range, named "start-end"
for start_day, end_day in zip([0, 91, 181, 271], [90, 180, 270, 360]):
    col_name = f"{start_day}-{end_day}"
    df[col_name] = df[['weight', 'days_since_gym']].apply(
        lambda x: return_max_weight(x['weight'], x['days_since_gym'], start_day, end_day),
        axis=1
    )
Output:
>>> df
                      weight            days_since_gym  0-90  91-180  181-270  271-360
0  [200, 190, 188, 180, 170]    [0, 87, 174, 205, 279]   200     188      180      170
1  [181, 175, 172, 165, 150]  [93, 171, 241, 273, 300]     0     181      172      165
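As a possible alternative to the row-wise apply, the same result can be sketched by reshaping to one row per measurement and binning the days. This is not the approach from the answer above, only a sketch under some assumptions: pandas 1.3 or newer (for multi-column explode), every row holding two non-empty lists of equal length, and integer day values. The names `labels`, `edges`, `long`, `wide`, and `result` are introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "weight": [[200, 190, 188, 180, 170], [181, 175, 172, 165, 150]],
    "days_since_gym": [[0, 87, 174, 205, 279], [93, 171, 241, 273, 300]],
})

labels = ["0-90", "91-180", "181-270", "271-360"]
edges = [0, 90, 180, 270, 360]  # bins are [0, 90], (90, 180], (180, 270], (270, 360]

# One row per (original row, measurement); the original index is kept as "row"
long = (
    df[["weight", "days_since_gym"]]
    .explode(["weight", "days_since_gym"])
    .astype(int)
    .rename_axis("row")
    .reset_index()
)

# Assign each day to its period; days outside 0-360 become the string "nan"
long["bucket"] = pd.cut(
    long["days_since_gym"], bins=edges, labels=labels, include_lowest=True
).astype(str)

# Maximum weight per row and period, spread into one column per period;
# periods with no measurement are filled with 0
wide = (
    long.groupby(["row", "bucket"])["weight"].max()
    .unstack("bucket")
    .reindex(columns=labels)
    .fillna(0)
    .astype(int)
)

result = df.join(wide)
print(result)
```

For integer days, the half-open bins `(90, 180]` and so on match the inclusive ranges 91-180, 181-270, and 271-360 from the question. Whether this is faster than apply depends on the data size, so it is worth timing both on real data.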