Sorting my data frame by date (d/m/y + hour:min:sec)
I'm trying to sort the values of a column by date (d/m/y + hour:min:sec). Here is an example of the format of the given data:
Initiator | Price | Date |
---|---|---|
XXX | 560 | 13/05/2020 11:05:35 |
Glovoapp | 250 | 12/05/2020 13:07:15 |
Glovoapp | 250 | 13/04/2020 12:09:25 |
Expected output:
If the user selects the date range from 10/04/2020 00:00:00 to 15/05/2020 00:00:00:
Glovoapp: 500
XXX: 560
If the user selects the date range from 10/04/2020 00:00:00 to 01/05/2020 00:00:00:
Glovoapp: 250
So far, I have been able to compute the sum of the prices per initiator without any date filtering. Any suggestions on what I should do?
def sum_method(self):
    montant_init = self.data.groupby("Initiateur")["Montant (centimes)"].sum()
    print(montant_init)
    return montant_init
^ This is the method I use for the calculation. I hope I have been clear enough, thanks.
I tried the answer; please correct me:
class evaluation():
    def __init__(self, df):
        self.df = df

    # Will receive 'actual' datetime from df, and user defined 'start' and 'stop' datetimes.
    def in_range(actual, start, stop):
        return start <= actual <= stop

    def evaluate(self):
        user_start = input("Enter your start date (dd.mm.yyyy hour:min:second): ")
        user_stop = input("Enter your end date (dd.mm.yyyy hour:min:second): ")
        # creates series of True or False selecting proper rows.
        mask = self.df['Date'].apply(self.in_range, args=(user_start, user_stop))
        # Do the groupby and sum on only those rows.
        montant_init = self.df.loc[mask].groupby("Initiateur")["Montant (centimes)"].sum()
        print(montant_init)
Output when printing self.df.loc[mask]:
Empty DataFrame
Columns: [Opération, Initiateur, Montant (centimes), Monnaie, Date, Résultat, Compte marchand, Adresse IP Acheteur, Marque de carte]
Index: []
The following works. There are two steps:
- make a mask to select the right rows
- then do the groupby and sum on only those rows
Mask function:
# Will receive 'actual' datetime from df, and user defined 'start' and 'stop' datetimes.
def in_range(actual, start, stop):
    return start <= actual <= stop
Then apply the mask and perform the groupby:
# Creates a boolean Series (True/False) that selects the matching rows.
mask = df['Date'].apply(in_range, args=(user_start, user_stop))
# Do the groupby and sum the prices of only those rows.
df2 = df.loc[mask].groupby('Initiator')['Price'].sum()
Note that user_start and user_stop should be the user-defined start and stop datetimes.
And you're done!
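The two steps above can also be written without `apply`. The following is only a minimal sketch, assuming the `Date` column already holds datetimes and the columns are named `Initiator`, `Price` and `Date` as in the sample data. `Series.between` builds the same inclusive mask in one vectorized call; the helper name `sum_in_range` is hypothetical and not part of the original answer.

```python
import pandas as pd

# Sample data from the question, with the day-first strings parsed explicitly.
df = pd.DataFrame({
    "Initiator": ["XXX", "Glovoapp", "Glovoapp"],
    "Price": [560, 250, 250],
    "Date": pd.to_datetime(
        ["13/05/2020 11:05:35", "12/05/2020 13:07:15", "13/04/2020 12:09:25"],
        format="%d/%m/%Y %H:%M:%S",
    ),
})


def sum_in_range(frame, start, stop):
    """Sum 'Price' per 'Initiator' for rows whose 'Date' lies in [start, stop]."""
    # between() is inclusive on both ends by default, like start <= x <= stop.
    mask = frame["Date"].between(start, stop)
    return frame.loc[mask].groupby("Initiator")["Price"].sum()


# First range from the question: 10/04/2020 to 15/05/2020.
totals = sum_in_range(df, pd.Timestamp(2020, 4, 10), pd.Timestamp(2020, 5, 15))
print(totals)
```

With the sample rows this should give 500 for Glovoapp and 560 for XXX, matching the expected output in the question.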
Update: including the method as part of the class:
import pandas as pd  # needed for pd.to_datetime below


class evaluation():
    def __init__(self, df):
        self.df = df

    # Will receive 'actual' datetime from df, and user defined 'start' and 'stop' datetimes. Add 'self' as arg in method.
    def in_range(self, actual, start, stop):
        return start <= actual <= stop

    def evaluate(self):
        # Parse the user input into datetimes so it can be compared with the 'Date' column.
        user_start = pd.to_datetime(input("Enter your start date (yyyy.mm.dd hour:min:second): "))
        user_stop = pd.to_datetime(input("Enter your end date (yyyy.mm.dd hour:min:second): "))
        # creates series of True or False selecting proper rows.
        mask = self.df['Date'].apply(self.in_range, args=(user_start, user_stop))
        # Do the groupby and sum on only those rows.
        amount_init = self.df.loc[mask].groupby("Initiator")["Price"].sum()
        print(amount_init)
Then instantiate a new object of the class:
import pandas as pd
import dateutil.parser as dtp
import evaluation as eval  # the module containing the class we just made

data = {
    'Initiator': ['XXX', 'Glovoapp', 'Glovoapp'],
    'Price': [560, 250, 250],
    # dayfirst=True: without it, '12/05/2020' would be parsed as December 5.
    'Date': [
        dtp.parse('13/05/2020 11:05:35', dayfirst=True),
        dtp.parse('12/05/2020 13:07:15', dayfirst=True),
        dtp.parse('13/04/2020 12:09:25', dayfirst=True),
    ],
}
df = pd.DataFrame(data)
eval_obj = eval.evaluation(df)
eval_obj.evaluate()
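Because the data in the question is day-first (d/m/y), it may be worth parsing both the column and the user input with an explicit format rather than relying on automatic inference. The sketch below illustrates the pitfall; the constant `DATE_FORMAT` and the helper `parse_day_first` are hypothetical names, and the format string is an assumption based on the sample rows.

```python
import pandas as pd

# Assumed layout of the dates in the sample data: day/month/year hour:min:sec.
DATE_FORMAT = "%d/%m/%Y %H:%M:%S"


def parse_day_first(text):
    """Parse a 'dd/mm/yyyy HH:MM:SS' string into a pandas Timestamp."""
    return pd.to_datetime(text, format=DATE_FORMAT)


ambiguous = "12/05/2020 13:07:15"

# Default inference treats the first field as the month when it can.
month_first = pd.to_datetime(ambiguous)
# An explicit format removes the ambiguity.
day_first = parse_day_first(ambiguous)

print(month_first.month, day_first.month)  # prints "12 5"
```

The same helper could be applied to the strings returned by `input()` if the prompt asks for dd/mm/yyyy rather than yyyy.mm.dd.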