How to split year and month column in python?
How can I split the data as shown below?
Sample data:
  EmployeeId       City  join_month  Year
0        001     Mumbai           1  2018
1        001  Bangalore           3  2018
2        002       Pune           2  2019
3        002     Mumbai           6  2017
4        003      Delhi           9  2018
5        003     Mumbai          12  2019
6        004  Bangalore          11  2017
7        004       Pune          10  2018
8        005     Mumbai           5  2017
The required output should be:
  EmployeeId       City  join_month  Year  2018_jan_count  2018_feb_count  2018_march_count
0        001     Mumbai           1  2018
1        001  Bangalore           3  2018
2        002       Pune           2  2019
3        002     Mumbai           6  2017
4        003      Delhi           9  2018
5        003     Mumbai          12  2019
6        004  Bangalore          11  2017
7        004       Pune          10  2018
8        005     Mumbai           5  2017
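The expected output above leaves the count cells empty, so exactly what each `<year>_<mon>_count` column should hold is not specified. A minimal sketch, assuming each such column is a per-row 0/1 indicator (so every column sums to the number of employees who joined in that month), builds a year-month key and expands it with `pd.get_dummies`. The `MONTH_ABBR` table and the lowercase `jan`/`feb`/`mar` naming are assumptions for illustration; the question itself writes `2018_march_count`.

```python
import pandas as pd

# Sample data from the question (EmployeeId kept as strings to preserve leading zeros)
df = pd.DataFrame({
    "EmployeeId": ["001", "001", "002", "002", "003", "003", "004", "004", "005"],
    "City": ["Mumbai", "Bangalore", "Pune", "Mumbai", "Delhi",
             "Mumbai", "Bangalore", "Pune", "Mumbai"],
    "join_month": [1, 3, 2, 6, 9, 12, 11, 10, 5],
    "Year": [2018, 2018, 2019, 2017, 2018, 2019, 2017, 2018, 2017],
})

# Hypothetical month-number -> lowercase abbreviation table (naming is an assumption)
MONTH_ABBR = dict(enumerate(
    ["jan", "feb", "mar", "apr", "may", "jun",
     "jul", "aug", "sep", "oct", "nov", "dec"], start=1))

# "<year>_<mon>" key per row, e.g. "2018_jan"
key = df["Year"].astype(str) + "_" + df["join_month"].map(MONTH_ABBR)

# One 0/1 indicator column per year-month present in the data, named "<year>_<mon>_count"
dummies = pd.get_dummies(key, dtype=int).add_suffix("_count")

# Keep the original columns and append the indicator columns
out = df.join(dummies)
print(out)

# Column sums give the number of joins in each year-month
print(dummies.sum())
```

If a single count per month is all that is needed, `dummies.sum()` already gives it; the joined frame only matters if the per-row layout from the question is required.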
You can use df.apply:
df.apply(pd.Series.value_counts)
This applies an aggregation function to each column separately (in this case, the date columns).
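A short sketch of that call on the question's sample data, restricted to the two date columns. It uses `pd.Series.value_counts`, because the top-level `pd.value_counts` is deprecated in recent pandas releases. Each column is counted independently and the per-column results are aligned on the union of their values, so cells that do not apply to a column come out as `NaN`.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "EmployeeId": ["001", "001", "002", "002", "003", "003", "004", "004", "005"],
    "City": ["Mumbai", "Bangalore", "Pune", "Mumbai", "Delhi",
             "Mumbai", "Bangalore", "Pune", "Mumbai"],
    "join_month": [1, 3, 2, 6, 9, 12, 11, 10, 5],
    "Year": [2018, 2018, 2019, 2017, 2018, 2019, 2017, 2018, 2017],
})

# Count the distinct values of each date column on its own; the two result
# Series are combined on the union of their index values (months and years)
counts = df[["join_month", "Year"]].apply(pd.Series.value_counts)
print(counts)
```

Note that this counts months and years separately, so it does not by itself produce a per-year-month breakdown.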
I build a year-month value and then use it as the columns of a pivot table. What I count is employee IDs, aggregated by city and year-month:
import pandas as pd

# Month number -> abbreviation used in the column labels
months = {1: 'Jan', 2: 'Feb', 3: 'Mar', 4: 'Apr', 5: 'May', 6: 'Jun',
          7: 'Jul', 8: 'Aug', 9: 'Sept', 10: 'Oct', 11: 'Nov', 12: 'Dec'}

employeeId = ['001', '001', '002', '002', '003', '003', '004', '004', '005']
city = ['Mumbai', 'Bangalore', 'Pune', 'Mumbai', 'Delhi', 'Mumbai', 'Bangalore', 'Pune', 'Mumbai']
join_month = [1, 3, 2, 6, 9, 12, 11, 10, 1]
year = [2018, 2018, 2019, 2017, 2018, 2017, 2017, 2018, 2018]

# Build a "<year>_<month>" label for every row, e.g. "2018_Jan"
char_yearmonth = [f"{y}_{months[m]}" for y, m in zip(year, join_month)]

df = pd.DataFrame({'EmployeeId': employeeId, 'City': city, 'YearMonth': char_yearmonth})

# Count EmployeeIds per City (rows) and YearMonth (columns); missing combinations become 0
fp = df.pivot_table(index=['City'], columns=['YearMonth'], aggfunc='count').fillna(0)
print(fp)
          EmployeeId
YearMonth   2017_Dec 2017_Jun 2017_Nov 2018_Jan 2018_Mar 2018_Oct 2018_Sept 2019_Feb
City
Bangalore      0.0      0.0      1.0      0.0      1.0      0.0       0.0      0.0
Delhi          0.0      0.0      0.0      0.0      0.0      0.0       1.0      0.0
Mumbai         1.0      1.0      0.0      2.0      0.0      0.0       0.0      0.0
Pune           0.0      0.0      0.0      0.0      0.0      1.0       0.0      1.0
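If a plain integer table without the extra `EmployeeId` column level is preferred, `pd.crosstab` should give the same counts directly: it tallies the rows for each (City, YearMonth) pair and fills absent pairs with 0, so no `fillna` is needed. This is a swapped-in alternative, not the answer's own method; the data below mirrors the lists in the answer's code.

```python
import pandas as pd

# Same data as the pivot-table answer (it differs slightly from the question's sample)
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sept", "Oct", "Nov", "Dec"]
df = pd.DataFrame({
    "EmployeeId": ["001", "001", "002", "002", "003", "003", "004", "004", "005"],
    "City": ["Mumbai", "Bangalore", "Pune", "Mumbai", "Delhi",
             "Mumbai", "Bangalore", "Pune", "Mumbai"],
    "join_month": [1, 3, 2, 6, 9, 12, 11, 10, 1],
    "Year": [2018, 2018, 2019, 2017, 2018, 2017, 2017, 2018, 2018],
})

# "<year>_<month>" label per row, e.g. "2018_Jan"
df["YearMonth"] = df["Year"].astype(str) + "_" + df["join_month"].map(lambda m: months[m - 1])

# Count rows per (City, YearMonth) pair; combinations that never occur are 0
ct = pd.crosstab(df["City"], df["YearMonth"])
print(ct)
```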