Calculating the range of column value datewise through Python
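I want to calculate the maximum difference of product_mrp for each date.
To do that I tried grouping by date, but I could not work out what to do after that.
Input: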
+-------------+--------------------+
| product_mrp | order_date         |
+-------------+--------------------+
| 142         | 01-12-2019         |
| 20          | 01-12-2019         |
| 20          | 01-12-2019         |
| 120         | 01-12-2019         |
| 30          | 03-12-2019         |
| 20          | 03-12-2019         |
| 45          | 03-12-2019         |
| 215         | 03-12-2019         |
| 15          | 03-12-2019         |
| 25          | 07-12-2019         |
| 5           | 07-12-2019         |
+-------------+--------------------+
Expected output:
+-------------+--------------------+
| product_mrp | order_date         |
+-------------+--------------------+
| 122         | 01-12-2019         |
| 200         | 03-12-2019         |
| 20          | 07-12-2019         |
+-------------+--------------------+
Load the data with pandas, then use groupby to group on the shared index:
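import pandas as pd
dates = ['01-12-2019']*4 + ['03-12-2019']*5 + ['07-12-2019']*2
data = [142,20,20,120,30,20,45,215,15,25,5]
# Name the column so the result is labelled product_mrp instead of 0
df = pd.DataFrame(data, columns=['product_mrp'])
# Note: these strings are parsed month-first by default;
# use pd.to_datetime(dates, dayfirst=True) if they are DD-MM-YYYY
df.index = pd.DatetimeIndex(dates)
# Range per date: largest value minus smallest value in each group
grouped = df.groupby(df.index).apply(lambda x: x.max()-x.min())
print(grouped)
Output:
            product_mrp
2019-01-12          122
2019-03-12          200
2019-07-12           20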
As you said, you can use groupby together with max, min and reset_index, for example:
gr = df.groupby('order_date')['product_mrp']
df_ = (gr.max()-gr.min()).reset_index()
print(df_)
   order_date  product_mrp
0  01-12-2019          122
1  03-12-2019          200
2  07-12-2019           20
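The snippet above assumes df already holds the order_date and product_mrp columns from the question. For anyone who wants to run it end to end, here is a minimal self-contained sketch of the same idea. The variable name result, the single agg call with a lambda, and sort=False are my own choices for illustration and are not part of the original answers.

```python
import pandas as pd

# Data copied from the question; order_date is kept as plain strings.
df = pd.DataFrame({
    "product_mrp": [142, 20, 20, 120, 30, 20, 45, 215, 15, 25, 5],
    "order_date": ["01-12-2019"] * 4 + ["03-12-2019"] * 5 + ["07-12-2019"] * 2,
})

# Range (max - min) of product_mrp within each date.
# sort=False keeps the dates in order of first appearance, because
# date strings do not sort chronologically in general.
result = (
    df.groupby("order_date", sort=False)["product_mrp"]
      .agg(lambda s: s.max() - s.min())
      .reset_index()
)

# Put the columns in the same order as the expected output in the question.
result = result[["product_mrp", "order_date"]]
print(result)
```

If you need real dates rather than strings (for sorting or resampling), converting order_date with pd.to_datetime before grouping should work the same way, as long as you state the day/month order explicitly.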