Changing Shape and Structure of Data
I have the following dataset:
Office | Employee ID | Joining Date | Attrition Date |
---|---|---|---|
AA | 700237 | 27-11-2017 | |
AA | 700238 | 11-01-2018 | |
AA | 700252 | 14-02-2018 | 08-04-2018 |
AB | 700287 | 18-01-2014 | |
AB | 700449 | 28-02-2014 | 17-04-2014 |
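The idea is that the Active column goes up by one when someone joins and down by one when someone resigns in a given month. I would like to use Python to reshape the data into the following format: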
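Office | Month & Year | Active |
---|---|---|
AA | Jan-17 | 0 |
AA | Feb-17 | 0 |
AA | Mar-17 | 0 |
AA | Apr-17 | 0 |
AA | May-17 | 0 |
AA | Jun-17 | 0 |
AA | Jul-17 | 0 |
AA | Aug-17 | 0 |
AA | Sep-17 | 0 |
AA | Oct-17 | 0 |
AA | Nov-17 | 1 |
AA | Dec-17 | 1 |
AA | Jan-18 | 2 |
AA | Feb-18 | 3 |
AA | Mar-18 | 3 |
AA | Apr-18 | 2 |
AB | Jan-14 | 1 |
AB | Feb-14 | 2 |
AB | Mar-14 | 2 |
AB | Apr-14 | 1 |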
Please help.
Use:
import pandas as pd

# df is assumed to hold the table from the question
# convert columns to datetimes
df['Joining Date'] = pd.to_datetime(df['Joining Date'], dayfirst=True)
df['Attrition Date'] = pd.to_datetime(df['Attrition Date'], dayfirst=True)
# add one helper row per office, dated 1 January of its earliest joining year
df1 = df.groupby('Office')['Joining Date'].min() - pd.offsets.DateOffset(month=1, day=1)
# DataFrame.append was removed in pandas 2.0, so pd.concat is used instead
df = (pd.concat([df, df1.reset_index()], ignore_index=True)
        .sort_values(['Office', 'Joining Date']))
# replace missing Attrition Date by the office's latest attrition date plus one month
# if an office has no attrition dates at all, fall back to its latest joining date plus one month
next_month = (df.groupby('Office')['Attrition Date'].transform('max') +
              pd.offsets.DateOffset(months=1))
next_month1 = (df.groupby('Office')['Joining Date'].transform('max') +
               pd.offsets.DateOffset(months=1))
df['Attrition Date'] = df['Attrition Date'].fillna(next_month).fillna(next_month1)
# list the month ends between start and end, formatted as month-year labels
# (a MonthEnd offset is passed because the 'M' alias is deprecated in recent pandas)
f = lambda x: pd.date_range(x['Joining Date'],
                            x['Attrition Date'],
                            freq=pd.offsets.MonthEnd()).strftime('%b-%y')
df['Month & Year'] = df.apply(f, axis=1)
# one row per month, then count Employee ID (missing IDs of the helper rows are not counted)
df = (df.explode('Month & Year')
        .groupby(['Office', 'Month & Year'], sort=False)['Employee ID']
        .count()
        .reset_index(name='Active'))
print(df)
   Office Month & Year  Active
0      AA       Jan-17       0
1      AA       Feb-17       0
2      AA       Mar-17       0
3      AA       Apr-17       0
4      AA       May-17       0
5      AA       Jun-17       0
6      AA       Jul-17       0
7      AA       Aug-17       0
8      AA       Sep-17       0
9      AA       Oct-17       0
10     AA       Nov-17       1
11     AA       Dec-17       1
12     AA       Jan-18       2
13     AA       Feb-18       3
14     AA       Mar-18       3
15     AA       Apr-18       2
16     AB       Jan-14       1
17     AB       Feb-14       2
18     AB       Mar-14       2
19     AB       Apr-14       1
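The rule from the question (add one when someone joins, subtract one in the month they leave) can also be written more directly. This is only a sketch and not part of the answer above. It assumes each office's calendar should run from January of its first joining year to the month of its last joining or attrition event. The helper name `monthly_active` is made up for illustration, and the sample frame is rebuilt from the question's table so the block runs on its own.

```python
import pandas as pd

# Sample data copied from the question; blank attrition dates become NaT.
df = pd.DataFrame({
    'Office': ['AA', 'AA', 'AA', 'AB', 'AB'],
    'Employee ID': [700237, 700238, 700252, 700287, 700449],
    'Joining Date': ['27-11-2017', '11-01-2018', '14-02-2018',
                     '18-01-2014', '28-02-2014'],
    'Attrition Date': [None, None, '08-04-2018', None, '17-04-2014'],
})
for col in ['Joining Date', 'Attrition Date']:
    df[col] = pd.to_datetime(df[col], format='%d-%m-%Y')


def monthly_active(group):
    """Hypothetical helper: month-by-month headcount for a single office."""
    joined = group['Joining Date'].dt.to_period('M')
    left = group['Attrition Date'].dropna().dt.to_period('M')
    # Assumption: start at January of the first joining year,
    # stop at the month of the last joining or attrition event.
    start = pd.Period(year=joined.min().year, month=1, freq='M')
    end = pd.concat([joined, left]).max()
    months = pd.period_range(start, end, freq='M')
    # Headcount = joins up to and including the month
    #             minus departures up to and including the month.
    active = [int((joined <= m).sum() - (left <= m).sum()) for m in months]
    return pd.DataFrame({'Month & Year': months.strftime('%b-%y'),
                         'Active': active})


result = pd.concat(
    [monthly_active(g).assign(Office=office)
     for office, g in df.groupby('Office')],
    ignore_index=True,
)[['Office', 'Month & Year', 'Active']]
print(result)
```

On the sample data this gives the same 20 rows as the table requested in the question. Someone who leaves in a month is not counted for that month, which is what the expected `Apr-18` and `Apr-14` values imply.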