Data manipulation in pandas on monthly, quarterly and annual level on multiple columns
I need to create a function that takes a dictionary as input and updates column values in a DataFrame. My data looks like this:
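Date | Col_1 | Col_2 | Col_3 | Col_4 | Col_5 |
---|---|---|---|---|---|
01/01/2021 | 10 | 20 | 10 | 20 | 10 |
02/01/2021 | 10 | 20 | 10 | 20 | 10 |
03/01/2021 | 10 | 20 | 10 | 20 | 10 |
04/01/2021 | 10 | 20 | 10 | 20 | 10 |
05/01/2021 | 10 | 20 | 10 | 20 | 10 |
06/01/2021 | 10 | 20 | 10 | 20 | 10 |
07/01/2021 | 10 | 20 | 10 | 20 | 10 |
08/01/2021 | 10 | 20 | 10 | 20 | 10 |
09/01/2021 | 10 | 20 | 10 | 20 | 10 |
10/01/2021 | 10 | 20 | 10 | 20 | 10 |
11/01/2021 | 10 | 20 | 10 | 20 | 10 |
12/01/2021 | 10 | 20 | 10 | 20 | 10 |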
Now suppose I pass monthly percentage updates for 'Col_1' and 'Col_2', like this:
{'Date': ['01/01/2021','02/01/2021','03/01/2021','04/01/2021','05/01/2021','06/01/2021',
'07/01/2021','08/01/2021','09/01/2021','10/01/2021','11/01/2021','12/01/2021'],
'Col_1': [20,20,20,20,30,30,40,40,20,20,20,20],
'Col_2': [0,0,0,0,0,0,0,0,0,0,10,10]}
After this operation, the monthly result I want looks like this:
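Date | Col_1 | Col_2 | Col_3 | Col_4 | Col_5 |
---|---|---|---|---|---|
01/01/2021 | 12 | 20 | 10 | 20 | 10 |
02/01/2021 | 12 | 20 | 10 | 20 | 10 |
03/01/2021 | 12 | 20 | 10 | 20 | 10 |
04/01/2021 | 12 | 20 | 10 | 20 | 10 |
05/01/2021 | 13 | 20 | 10 | 20 | 10 |
06/01/2021 | 13 | 20 | 10 | 20 | 10 |
07/01/2021 | 14 | 20 | 10 | 20 | 10 |
08/01/2021 | 14 | 20 | 10 | 20 | 10 |
09/01/2021 | 12 | 20 | 10 | 20 | 10 |
10/01/2021 | 12 | 20 | 10 | 20 | 10 |
11/01/2021 | 12 | 24 | 10 | 20 | 10 |
12/01/2021 | 12 | 24 | 10 | 20 | 10 |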
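Similarly, I also want to update the data at the quarterly and annual level. I was able to do the annual update, and my code is below. Please help me do the monthly and quarterly updates based on the input.
Thanks!!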
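dic = {'col_1': 10, 'col_2': -5}
year = 2021
def update_df(dic, df, year):
    # Keep only the rows of the requested year
    df = df[df['date'].dt.year == year]
    # Add the given percentage to each numeric column named in dic
    df = (df + df.select_dtypes(include='number').mul(pd.Series(dic) / 100)).combine_first(df)[df.columns]
    return df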
This is what I am trying:
def update_df(dic, df, year, choice):
    if choice == 'annual':
        df = df[df['date'].dt.year == year]
        df = (df + df.select_dtypes(include='number')
              .mul(pd.Series(dic) / 100)).combine_first(df)[df.columns]
    elif choice == 'quarterly':
        df["quarter"] = df.date.dt.quarter
        df = (df + df.select_dtypes(include='number')
              .mul(pd.Series(dic) / 100)).combine_first(df)[df.columns]
    elif choice == 'monthly':
        df["month"] = df.date.dt.month
        df = (df + df.select_dtypes(include='number')
              .mul(pd.Series(dic) / 100)).combine_first(df)[df.columns]
    return df
There may well be a more concise approach, but the following works and provides a function for making annual, quarterly or monthly updates:
import pandas as pd
from collections import namedtuple

# Control tuple defining the date parameters for changing the dataframe
DateControl = namedtuple('DateControl', ['Year', 'Quarter', 'Month'])

def updateFrame(df: pd.DataFrame, pcnt_val: float, **args) -> pd.DataFrame:
    # Update the selected year, quarter or month by pcnt_val (a fraction, e.g. 0.25 for 25%).
    # Assumes df[dtecol] holds datetime values (e.g. after pd.to_datetime); df is modified in place.
    dtecol = args.pop('DTECOL', None)
    colList = args.pop('Columns', [])
    control = DateControl(args.pop('Year', None),
                          args.pop('Quarter', None),
                          args.pop('Month', None))

    def EvalDate(ds: pd.Series, row: int, selection: DateControl) -> bool:
        # A criterion left as None matches every row; otherwise it must equal the row's value
        yr = selection.Year is None or ds.iloc[row].year == selection.Year
        qtr = selection.Quarter is None or ds.iloc[row].quarter == selection.Quarter
        mnth = selection.Month is None or ds.iloc[row].month == selection.Month
        return yr and qtr and mnth

    # Use control to update all cols named in colList
    mask = list(EvalDate(df[dtecol], x, control) for x in range(len(df[dtecol])))
    mod = list((1.0 + pcnt_val) if x else 1.0 for x in mask)
    # Debug output: which rows are selected and the multiplier applied to each
    print(mask)
    print(mod)
    for c in colList:
        df[c] = list(df.iloc[x][c] * mod[x] for x in range(len(df[c])))
    return df
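The updateFrame function takes two positional arguments:
- df - the DataFrame to update
- pcnt_val - the percentage to add to the current value, given as a fraction (e.g. 0.25 for 25%)
The function also accepts the following keyword arguments:
- DTECOL - the name of the df column that contains the dates
- Columns - a list of the column headers in df to change
- Year - the year value, or None to change all years
- Quarter - a specific quarter as an integer from 1 to 4 (inclusive), or None
- Month - the specific month to change, or None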
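The row selection that EvalDate performs row by row can also be written with the `.dt` accessor, which compares the whole date column at once. The following is only a sketch under that assumption; `build_mask` and the sample frame are hypothetical names introduced for illustration and are not part of the answer's code.

```python
import pandas as pd

def build_mask(dates: pd.Series, year=None, quarter=None, month=None) -> pd.Series:
    """Return a boolean Series that is True where every given criterion matches.

    A criterion left as None is ignored, mirroring the None handling in updateFrame.
    """
    mask = pd.Series(True, index=dates.index)
    if year is not None:
        mask &= dates.dt.year == year
    if quarter is not None:
        mask &= dates.dt.quarter == quarter
    if month is not None:
        mask &= dates.dt.month == month
    return mask

# Hypothetical sample data shaped like the question's table
df = pd.DataFrame({
    "Date": pd.date_range("2021-01-01", periods=12, freq="MS"),
    "Col_1": [10] * 12,
    "Col_2": [20] * 12,
})

cols = ["Col_1", "Col_2"]
mask = build_mask(df["Date"], year=2021, quarter=3)

# Convert to float first so the scaled values fit the column dtype
df[cols] = df[cols].astype(float)
df.loc[mask, cols] = df.loc[mask, cols] * 1.25
```

Because the mask is an ordinary boolean Series, it can be reused for any number of columns without looping over rows.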
Apply this function to your DataFrame df as follows:
dg = updateFrame(df, .25, DTECOL='Date', Columns=['Col_1', 'Col_2'], Year=2021, Quarter=3)
This yields:
Date Col_1 Col_2 Col_3 Col_4 Col_5
0 2021-01-01 10.0 20.0 10 20 10
1 2021-02-01 10.0 20.0 10 20 10
2 2021-03-01 10.0 20.0 10 20 10
3 2021-04-01 10.0 20.0 10 20 10
4 2021-05-01 10.0 20.0 10 20 10
5 2021-06-01 10.0 20.0 10 20 10
6 2021-07-01 12.5 25.0 10 20 10
7 2021-08-01 12.5 25.0 10 20 10
8 2021-09-01 12.5 25.0 10 20 10
9 2021-10-01 10.0 20.0 10 20 10
10 2021-11-01 10.0 20.0 10 20 10
11 2021-12-01 10.0 20.0 10 20 10
Given that you want to supply the updates for all 4 quarters in a single call, I would do the following.
Add a new function:
def updateByQuarter(df: pd.DataFrame, changes: list, **args) -> pd.DataFrame:
    # Given a quarterly change list of tuples (qtrid, chgval), update the dataframe
    for chg in changes:
        args['Quarter'] = chg[0]
        df = updateFrame(df, chg[1], **args)
    return df
Then create a list of changes by quarter:
# List of tuples defining the quarter and percent change
qtrChg = [(1, 0.02),(2, 0.035),(3, -0.018),(4, 0.125)]
Usage:
df = updateByQuarter(df, [(1, 0.02), (2, 0.04), (3, -0.02), (4, 0.15)], DTECOL='Date', Columns=['Col_1', 'Col_2'])
This yields:
Date Col_1 Col_2 Col_3 Col_4 Col_5
0 2021-01-01 10.2 20.4 10 20 10
1 2021-02-01 10.2 20.4 10 20 10
2 2021-03-01 10.2 20.4 10 20 10
3 2021-04-01 10.4 20.8 10 20 10
4 2021-05-01 10.4 20.8 10 20 10
5 2021-06-01 10.4 20.8 10 20 10
6 2021-07-01 9.8 19.6 10 20 10
7 2021-08-01 9.8 19.6 10 20 10
8 2021-09-01 9.8 19.6 10 20 10
9 2021-10-01 11.5 23.0 10 20 10
10 2021-11-01 11.5 23.0 10 20 10
11 2021-12-01 11.5 23.0 10 20 10
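The monthly case from the question, where the input dictionary carries one percentage per date and per column, can be handled by aligning the percentages to the rows by date. This is a minimal sketch, assuming the dates are in month/day/year order and that each percentage is added to the current value. `update_monthly` and `updates` are hypothetical names, not part of the answer above.

```python
import pandas as pd

def update_monthly(df: pd.DataFrame, updates: dict, date_col: str = "Date") -> pd.DataFrame:
    """Return a copy of df with each listed column raised by its per-date percentage."""
    pct = pd.DataFrame(updates)
    pct[date_col] = pd.to_datetime(pct[date_col], format="%m/%d/%Y")
    pct = pct.set_index(date_col)

    out = df.copy()
    # Align the percentages to the rows of df by date; dates without an entry get 0 %
    aligned = pct.reindex(out[date_col]).fillna(0)
    for col in pct.columns:
        out[col] = out[col] * (1 + aligned[col].to_numpy() / 100)
    return out

# Sample data shaped like the question's table (assumed to be month-start dates)
df = pd.DataFrame({
    "Date": pd.date_range("2021-01-01", periods=12, freq="MS"),
    "Col_1": [10] * 12,
    "Col_2": [20] * 12,
    "Col_3": [10] * 12,
})

updates = {
    "Date": [f"{m:02d}/01/2021" for m in range(1, 13)],
    "Col_1": [20, 20, 20, 20, 30, 30, 40, 40, 20, 20, 20, 20],
    "Col_2": [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 10, 10],
}

result = update_monthly(df, updates)
```

Under this reading, a 10% increase on 20 gives 22 for Col_2 in November and December, whereas the expected table in the question shows 24. Columns that are not named in the dictionary, such as Col_3, are left unchanged.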
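pandas captures 4 general time-related concepts:
- Date times: a specific date and time with timezone support. Similar to datetime.datetime from the standard library.
- Time deltas: an absolute time duration. Similar to datetime.timedelta from the standard library.
- Time spans: a span of time defined by a point in time and its associated frequency.
- Date offsets: a relative time duration that respects calendar arithmetic. Similar to dateutil.relativedelta.relativedelta from the dateutil package.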
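As a small illustration of these four concepts, the sketch below builds one object of each kind with pandas' scalar types. The specific dates are arbitrary examples.

```python
import pandas as pd

# Date time: a specific, optionally timezone-aware, point in time
ts = pd.Timestamp("2021-07-15 09:30", tz="UTC")

# Time delta: an absolute duration
delta = pd.Timedelta(days=45)
later = ts + delta

# Time span: a period defined by a point in time and a frequency (here a quarter)
span = pd.Period("2021Q3")

# Date offset: a relative duration that follows calendar rules
offset = pd.DateOffset(months=1)
next_month = pd.Timestamp("2021-01-31") + offset  # clipped to the last day of February
```

The difference between the last two additions is the point: the Timedelta always moves by exactly 45 days, while the DateOffset moves by one calendar month and adjusts the day when the target month is shorter.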