Pandas ranking based on multiple columns
I am trying to rank data in ascending order based on multiple columns.
Here is the data frame I am working with:
{'FACILITY': ['AAA', 'AAA', 'AAA', 'AAA', 'AAA'],
'IN_DATE':
['2015-08-30 05:49:05',
'2015-08-30 05:49:05',
'2015-08-30 05:49:05',
'2015-08-30 05:49:05',
'2015-09-02 20:56:59'],
'LOT':
['N123456', 'N654321', 'N654321', 'N123456', 'N123456'],
'OPERATION':
['100', '100', '100', '100', '100'],
'TXN_DATE':
['2015-08-30 06:04:03',
'2015-08-30 05:59:57',
'2015-08-30 06:37:32',
'2015-08-30 06:30:01',
'2015-09-02 21:39:44']}
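I am trying to create a new column "ORDER" that gives the sequence within each LOT and OPERATION, in ascending order of TXN_DATE.
You can use the rank method to get the sort order: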
In [11]: df
Out[11]:
FACILITY IN_DATE LOT OPERATION TXN_DATE
0 AAA 2015-08-30 05:49:05 N123456 100 2015-08-30 06:04:03
1 AAA 2015-08-30 05:49:05 N123456 100 2015-08-30 05:59:57
2 AAA 2015-08-30 05:49:05 N123456 100 2015-08-30 06:37:32
3 AAA 2015-08-30 05:49:05 N123456 100 2015-08-30 06:30:01
4 AAA 2015-09-02 20:56:59 N123456 100 2015-09-02 21:39:44
In [12]: df["TXN_DATE"].rank()
Out[12]:
0 2
1 1
2 4
3 3
4 5
Name: TXN_DATE, dtype: float64
As a column:
In [13]: df["ORDER"] = df["TXN_DATE"].rank()
In [14]: df
Out[14]:
FACILITY IN_DATE LOT OPERATION TXN_DATE ORDER
0 AAA 2015-08-30 05:49:05 N123456 100 2015-08-30 06:04:03 2
1 AAA 2015-08-30 05:49:05 N123456 100 2015-08-30 05:59:57 1
2 AAA 2015-08-30 05:49:05 N123456 100 2015-08-30 06:37:32 4
3 AAA 2015-08-30 05:49:05 N123456 100 2015-08-30 06:30:01 3
4 AAA 2015-09-02 20:56:59 N123456 100 2015-09-02 21:39:44 5
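As the dtype in Out[12] shows, rank returns floats, because the default method="average" can produce fractional ranks for ties. The following is a minimal sketch of one way to get integer ranks instead, assuming the TXN_DATE values from the question; converting with pd.to_datetime and choosing method="first" are additions for illustration, not part of the original answer.

```python
import pandas as pd

# TXN_DATE values copied from the question.
df = pd.DataFrame({
    "TXN_DATE": [
        "2015-08-30 06:04:03",
        "2015-08-30 05:59:57",
        "2015-08-30 06:37:32",
        "2015-08-30 06:30:01",
        "2015-09-02 21:39:44",
    ],
})

# Assumption: parse the strings so ranking compares real timestamps.
df["TXN_DATE"] = pd.to_datetime(df["TXN_DATE"])

# method="first" breaks ties by order of appearance, so every rank is a
# whole number and the cast to int loses nothing.
df["ORDER"] = df["TXN_DATE"].rank(method="first").astype(int)
print(df)
```

The cast only works while the column has no missing values; a missing TXN_DATE gets a NaN rank, which cannot be converted to int.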
rank is also available as a method on a grouped Series:
In [15]: df.groupby(["LOT", "OPERATION"])["TXN_DATE"].rank()
Out[15]:
0 2
1 1
2 4
3 3
4 5
Name: (N123456, 100), dtype: float64
Note: in this small example the name comes from the only group; usually the result would not have a name.
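The frame printed above contains a single LOT, whereas the dictionary in the question lists two (N123456 and N654321). The sketch below applies the same grouped rank to the question's values so the per-group restart is visible. The pd.to_datetime conversion, method="first" and the integer cast are assumptions added for illustration.

```python
import pandas as pd

# Columns copied from the dictionary in the question.
df = pd.DataFrame({
    "LOT": ["N123456", "N654321", "N654321", "N123456", "N123456"],
    "OPERATION": ["100", "100", "100", "100", "100"],
    "TXN_DATE": [
        "2015-08-30 06:04:03",
        "2015-08-30 05:59:57",
        "2015-08-30 06:37:32",
        "2015-08-30 06:30:01",
        "2015-09-02 21:39:44",
    ],
})
df["TXN_DATE"] = pd.to_datetime(df["TXN_DATE"])

# Rank TXN_DATE separately inside each (LOT, OPERATION) group.
# The result is aligned to the original index, so it can be assigned directly.
df["ORDER"] = (
    df.groupby(["LOT", "OPERATION"])["TXN_DATE"]
      .rank(method="first")
      .astype(int)
)
print(df[["LOT", "OPERATION", "TXN_DATE", "ORDER"]])
```

With this data the numbering restarts for each lot: rows 0, 3 and 4 (N123456) get 1, 2, 3, and rows 1 and 2 (N654321) get 1, 2.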