Displaying the next value in a column, considering groups, in a Pandas DataFrame
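I have this sample dataframe, and I need to display the next delivery date for each Region/Client group.
The dates can be encoded as strings or as datetimes; I am using strings in this example.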
# Import pandas library
import pandas as pd
import numpy as np
data = [['NY', 'A','2020-01-01', 10], ['NY', 'A','2020-02-03', 20], ['NY', 'A','2020-04-05', 30], ['NY', 'A','2020-05-05', 25],
['NY', 'B','2020-01-01', 15], ['NY', 'B','2020-02-02', 10], ['NY', 'B','2020-02-10', 20],
['FL', 'A','2020-01-01', 15], ['FL', 'A','2020-02-01', 10], ['FL', 'A','2020-03-01', 12], ['FL', 'A','2020-04-01', 25], ['FL', 'A','2020-05-01', 20]
]
# Create the pandas DataFrame
df = pd.DataFrame(data, columns = ['Region', 'Client', 'deliveryDate', 'price'])
# print dataframe.
df
Region Client deliveryDate price
0 NY A 2020-01-01 10
1 NY A 2020-02-03 20
2 NY A 2020-04-05 30
3 NY A 2020-05-05 25
4 NY B 2020-01-01 15
5 NY B 2020-02-02 10
6 NY B 2020-02-10 20
7 FL A 2020-01-01 15
8 FL A 2020-02-01 10
9 FL A 2020-03-01 12
10 FL A 2020-04-01 25
11 FL A 2020-05-01 20
Desired output:
data2 = [['NY', 'A','2020-01-01', '2020-02-03', 10], ['NY', 'A','2020-02-03', '2020-04-05', 20], ['NY', 'A','2020-04-05', '2020-05-05', 30], ['NY', 'A','2020-05-05', float('nan'), 25],
['NY', 'B','2020-01-01', '2020-02-02', 15], ['NY', 'B','2020-02-02','2020-02-10', 10], ['NY', 'B','2020-02-10', float('nan'), 20],
['FL', 'A','2020-01-01', '2020-02-01', 15], ['FL', 'A','2020-02-01', '2020-03-01', 10], ['FL', 'A','2020-03-01', '2020-04-01', 12], ['FL', 'A','2020-04-01', '2020-05-01', 25], ['FL', 'A','2020-05-01', float('nan'), 20]
]
# Create the pandas DataFrame
df2 = pd.DataFrame(data2, columns = ['Region', 'Client', 'deliveryDate', 'nextDelivery', 'price'])
Region Client deliveryDate nextDelivery price
0 NY A 2020-01-01 2020-02-03 10
1 NY A 2020-02-03 2020-04-05 20
2 NY A 2020-04-05 2020-05-05 30
3 NY A 2020-05-05 NaN 25
4 NY B 2020-01-01 2020-02-02 15
5 NY B 2020-02-02 2020-02-10 10
6 NY B 2020-02-10 NaN 20
7 FL A 2020-01-01 2020-02-01 15
8 FL A 2020-02-01 2020-03-01 10
9 FL A 2020-03-01 2020-04-01 12
10 FL A 2020-04-01 2020-05-01 25
11 FL A 2020-05-01 NaN 20
Thanks in advance.
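Assuming the delivery dates are already sorted, how about grouping by Region and Client and then applying shift? With shift(-1), each row picks up the value from the next row in its group, so the last row of every group gets NaN.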
df['nextDelivery'] = df.groupby(['Region', 'Client'])['deliveryDate'].shift(-1)
Output:
Region Client deliveryDate price nextDelivery
0 NY A 2020-01-01 10 2020-02-03
1 NY A 2020-02-03 20 2020-04-05
2 NY A 2020-04-05 30 2020-05-05
3 NY A 2020-05-05 25 NaN
4 NY B 2020-01-01 15 2020-02-02
5 NY B 2020-02-02 10 2020-02-10
6 NY B 2020-02-10 20 NaN
7 FL A 2020-01-01 15 2020-02-01
8 FL A 2020-02-01 10 2020-03-01
9 FL A 2020-03-01 12 2020-04-01
10 FL A 2020-04-01 25 2020-05-01
11 FL A 2020-05-01 20 NaN
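The answer above relies on the rows already being in date order within each group. If that is not guaranteed, one possible approach is to sort before shifting, as in the minimal sketch below. The shuffled sample rows are made up for illustration, and converting to datetime is optional here (ISO-formatted strings already sort correctly), but it makes the missing value NaT and allows date arithmetic later.

```python
import pandas as pd

# Same columns as in the question, but with rows deliberately out of order
# (illustrative data, not the original sample).
df = pd.DataFrame(
    [
        ['NY', 'A', '2020-04-05', 30],
        ['NY', 'A', '2020-01-01', 10],
        ['FL', 'A', '2020-02-01', 10],
        ['NY', 'A', '2020-02-03', 20],
        ['FL', 'A', '2020-01-01', 15],
    ],
    columns=['Region', 'Client', 'deliveryDate', 'price'],
)

# Optional: parse the strings into real datetimes.
df['deliveryDate'] = pd.to_datetime(df['deliveryDate'])

# Sort by group and date, then shift within each group.
# The result keeps the original index labels, so the assignment
# aligns each value back to its original row.
df['nextDelivery'] = (
    df.sort_values(['Region', 'Client', 'deliveryDate'])
      .groupby(['Region', 'Client'])['deliveryDate']
      .shift(-1)
)

print(df)
```

Because the assignment aligns on the index, the original row order of df is left untouched; only the new column is computed from the sorted view.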