Pandas: How to extract and calculate the number of hours per row in a DataFrame
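I have a DataFrame representing the weekly schedules of some restaurants.
- What I want to do is add a column week_hours to my initial DataFrame df, representing the total number of hours each restaurant is open per week.
Data (generated with df.head(20).to_dict('split')).
The days of the week in this example are in French.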
{'index': [0,
1,
2,
3,
4,
5,
6,
7,
8,
9,
10,
11,
12,
13,
14,
15,
16,
17,
18,
19],
'columns': ['restaurant_id',
'lundi',
'mardi',
'mercredi',
'jeudi',
'vendredi',
'samedi',
'dimanche'],
'data': [['lCwqJWMxvIUQt1Re_tDn4w',
'0:0-0:0',
'0:0-0:0',
'0:0-0:0',
'0:0-0:0',
'0:0-0:0',
'0:0-0:0',
'0:0-0:0'],
['pd0v6sOqpLhFJ7mkpIaixw',
'11:0-20:0',
'11:0-20:0',
'11:0-20:0',
'11:0-20:0',
'11:0-22:0',
'11:0-22:0',
'11:0-17:0'],
['0vhi__HtC2L4-vScgDFdFw',
'11:30-22:0',
'11:30-22:0',
'11:30-22:0',
'11:30-22:0',
'11:30-22:0',
'12:0-22:0',
'16:30-21:30'],
['t65yfB9v9fqlhAkLnnUXdg',
'11:30-21:0',
'11:30-21:0',
'11:30-21:0',
'11:30-21:0',
'11:30-21:0',
nan,
'11:30-21:0'],
['i7_JPit-2kAbtRTLkic2jA',
'11:30-22:0',
'11:30-22:0',
'11:30-23:0',
'11:30-23:0',
'11:30-23:0',
nan,
nan],
['vMh4madPU3qhNX7P7d8WGA', nan, nan, nan, nan, nan, nan, nan],
['BsvCTCVG7lrzXZ68VyyIcg',
'0:0-0:0',
'11:0-2:30',
'11:0-2:30',
'11:0-2:30',
'11:0-2:30',
'11:0-2:30',
'11:0-2:30'],
['es3Fq9KNp6Ry994x4T4ZYg',
'6:30-16:0',
'6:30-16:0',
'6:30-16:0',
'6:30-16:0',
'6:30-16:0',
'7:0-14:0',
nan],
['Xb7jOAa17xtT_uA4sCCAsg', nan, nan, nan, nan, nan, nan, nan],
['1vrrpIhpK628PUA0XWWd8g',
'9:0-18:0',
'9:0-18:0',
'9:0-18:0',
'9:0-18:0',
'9:0-18:0',
'9:0-18:0',
'9:0-18:0'],
['NYKxikYKbkacWumJ82TxzA',
'11:0-2:0',
'11:0-2:0',
'11:0-2:0',
'11:0-2:0',
'11:0-2:0',
'11:0-2:0',
'11:0-2:0'],
['4sRJvmKh43AqMRrjdwEdwA',
'11:0-22:0',
'11:0-22:0',
'11:0-22:0',
'11:0-22:0',
'11:0-23:0',
'11:0-23:0',
'11:0-23:0'],
['laac2uH1lQVzBjKFUjuA1Q',
'7:0-14:0',
'7:0-14:0',
'7:0-14:0',
'7:0-14:0',
'7:0-14:0',
'7:0-14:0',
'8:0-14:0'],
['vVOoL5H8Fr-qlQv-_DdoMA',
'10:0-22:0',
'10:0-22:0',
'10:0-22:0',
'10:0-22:0',
'10:0-23:0',
'10:0-23:0',
'10:0-22:0'],
['k1c4gg8Ri5dre6ruPUKxJQ',
'9:0-21:30',
'9:0-21:30',
'9:0-21:30',
'9:0-21:30',
'9:0-22:30',
'9:0-22:30',
'12:0-21:0'],
['x9f9NBMweyyjCQHuc9K4sw',
'11:0-17:0',
'10:0-17:0',
'10:0-17:0',
'10:0-17:0',
'10:0-18:0',
'10:0-18:0',
nan],
['KWfLQddMBZNoh1bVcgASfA',
'12:0-23:0',
'12:0-23:0',
'12:0-23:0',
'12:0-23:0',
'12:0-23:0',
'12:0-23:0',
'12:0-23:0'],
['4ScLXRii_WwBn5PbGBI-eg', nan, nan, nan, nan, nan, nan, nan],
['LAswzVTnT3uCvnKr-SwxEg', nan, nan, nan, nan, nan, nan, nan],
['G_wqVaqV3TBsZPAIIRCU-Q',
'5:0-0:0',
'5:0-0:0',
'5:0-0:0',
'5:0-0:0',
'5:0-0:0',
'5:0-0:0',
'5:0-0:0']]}
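What syntax would let me extract the hours from each column (which here represent days) and compute the total weekly opening hours in a new column?
Please ask if you need any clarification or a simpler example.
Edit - I tried the solution listed below, but the result is incorrect (for example in the first row?)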
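I think this should do what you need. It stores the new data as a float (in hours). With datetime (https://docs.python.org/3/library/datetime.html) you could also do the time arithmetic, if you prefer.
This applies the function calculate_hours to the 7 given columns (days):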
def calculate_hours(value) -> float:
    """Return the opening duration in hours for one "H:M-H:M" cell."""
    try:
        # split the given times into start and end time
        opening_time, closing_time = value.split("-")
        # split hours and minutes
        start_hour, start_minute = opening_time.split(":")
        end_hour, end_minute = closing_time.split(":")
        # calculate start time (in hours)
        start_time = float(start_hour) + float(start_minute) / 60
        # calculate end time (in hours)
        end_time = float(end_hour) + float(end_minute) / 60
        # handle overnight and 24h openings ("0:0-0:0" counts as 24 hours)
        if start_time >= end_time:
            end_time += 24
        # return the duration from start time to end time
        return end_time - start_time
    # missing values (NaN/None) have no .split; malformed strings fail to unpack or convert
    # check your own data for anything else that could go wrong
    except (AttributeError, ValueError):
        return 0.0

# Save the result to the new column "open":
# sum up the values for each day
df["open"] = (
    df["lundi"].apply(calculate_hours)
    + df["mardi"].apply(calculate_hours)
    + df["mercredi"].apply(calculate_hours)
    + df["jeudi"].apply(calculate_hours)
    + df["vendredi"].apply(calculate_hours)
    + df["samedi"].apply(calculate_hours)
    + df["dimanche"].apply(calculate_hours)
)
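The asker's edit reports an unexpected value for the first row, whose cells are all '0:0-0:0'. With the function above that row sums to 168 hours, because equal start and end times are treated as a full day. Below is a minimal sketch that assumes such cells should count as closed instead (an assumption about the data, not something the question confirms) and writes to the week_hours column the question asked for. The names parse_hours, DAYS, sample, and the zero_span_hours parameter are hypothetical, introduced only for this illustration.

```python
import pandas as pd

DAYS = ["lundi", "mardi", "mercredi", "jeudi", "vendredi", "samedi", "dimanche"]


def parse_hours(value, zero_span_hours: float = 0.0) -> float:
    """Duration in hours of one "H:M-H:M" cell; missing cells count as 0."""
    if not isinstance(value, str):
        return 0.0  # NaN / None: no schedule given
    opening, closing = value.split("-")
    start_h, start_m = opening.split(":")
    end_h, end_m = closing.split(":")
    start = int(start_h) + int(start_m) / 60
    end = int(end_h) + int(end_m) / 60
    if start == end:
        # Assumption: identical times such as "0:0-0:0" mean "closed".
        # Pass zero_span_hours=24.0 to treat them as open all day instead.
        return zero_span_hours
    if end < start:
        end += 24  # closing time falls on the next day
    return end - start


# Small stand-in frame built from three rows of the question's data
sample = pd.DataFrame(
    [
        ["lCwqJWMxvIUQt1Re_tDn4w"] + ["0:0-0:0"] * 7,
        ["pd0v6sOqpLhFJ7mkpIaixw"] + ["11:0-20:0"] * 4 + ["11:0-22:0"] * 2 + ["11:0-17:0"],
        ["BsvCTCVG7lrzXZ68VyyIcg", "0:0-0:0"] + ["11:0-2:30"] * 6,
    ],
    columns=["restaurant_id"] + DAYS,
)

# Parse every day cell column by column, then sum across each row
sample["week_hours"] = sample[DAYS].apply(lambda col: col.map(parse_hours)).sum(axis=1)
```

Selecting the day columns by list keeps the sum in one expression, and the explicit isinstance check replaces the exception handler for missing values, so a genuinely malformed string raises an error instead of silently becoming 0.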
PS: I used this to "import" your data. It doesn't look perfect, but I didn't know a better way to work with your data:
import pandas as pd
import datetime
all = {'index': [0,
1,
2,
3,
4,
5,
6,
7,
8,
9,
10,
11,
12,
13,
14,
15,
16,
17,
18,
19],
'columns': ['restaurant_id',
'lundi',
'mardi',
'mercredi',
'jeudi',
'vendredi',
'samedi',
'dimanche'],
'data': [['lCwqJWMxvIUQt1Re_tDn4w',
'0:0-0:0',
'0:0-0:0',
'0:0-0:0',
'0:0-0:0',
'0:0-0:0',
'0:0-0:0',
'0:0-0:0'],
['pd0v6sOqpLhFJ7mkpIaixw',
'11:0-20:0',
'11:0-20:0',
'11:0-20:0',
'11:0-20:0',
'11:0-22:0',
'11:0-22:0',
'11:0-17:0'],
['0vhi__HtC2L4-vScgDFdFw',
'11:30-22:0',
'11:30-22:0',
'11:30-22:0',
'11:30-22:0',
'11:30-22:0',
'12:0-22:0',
'16:30-21:30'],
['t65yfB9v9fqlhAkLnnUXdg',
'11:30-21:0',
'11:30-21:0',
'11:30-21:0',
'11:30-21:0',
'11:30-21:0',
None,
'11:30-21:0'],
['i7_JPit-2kAbtRTLkic2jA',
'11:30-22:0',
'11:30-22:0',
'11:30-23:0',
'11:30-23:0',
'11:30-23:0',
None,
None],
['vMh4madPU3qhNX7P7d8WGA', None, None, None, None, None, None, None],
['BsvCTCVG7lrzXZ68VyyIcg',
'0:0-0:0',
'11:0-2:30',
'11:0-2:30',
'11:0-2:30',
'11:0-2:30',
'11:0-2:30',
'11:0-2:30'],
['es3Fq9KNp6Ry994x4T4ZYg',
'6:30-16:0',
'6:30-16:0',
'6:30-16:0',
'6:30-16:0',
'6:30-16:0',
'7:0-14:0',
None],
['Xb7jOAa17xtT_uA4sCCAsg', None, None, None, None, None, None, None],
['1vrrpIhpK628PUA0XWWd8g',
'9:0-18:0',
'9:0-18:0',
'9:0-18:0',
'9:0-18:0',
'9:0-18:0',
'9:0-18:0',
'9:0-18:0'],
['NYKxikYKbkacWumJ82TxzA',
'11:0-2:0',
'11:0-2:0',
'11:0-2:0',
'11:0-2:0',
'11:0-2:0',
'11:0-2:0',
'11:0-2:0'],
['4sRJvmKh43AqMRrjdwEdwA',
'11:0-22:0',
'11:0-22:0',
'11:0-22:0',
'11:0-22:0',
'11:0-23:0',
'11:0-23:0',
'11:0-23:0'],
['laac2uH1lQVzBjKFUjuA1Q',
'7:0-14:0',
'7:0-14:0',
'7:0-14:0',
'7:0-14:0',
'7:0-14:0',
'7:0-14:0',
'8:0-14:0'],
['vVOoL5H8Fr-qlQv-_DdoMA',
'10:0-22:0',
'10:0-22:0',
'10:0-22:0',
'10:0-22:0',
'10:0-23:0',
'10:0-23:0',
'10:0-22:0'],
['k1c4gg8Ri5dre6ruPUKxJQ',
'9:0-21:30',
'9:0-21:30',
'9:0-21:30',
'9:0-21:30',
'9:0-22:30',
'9:0-22:30',
'12:0-21:0'],
['x9f9NBMweyyjCQHuc9K4sw',
'11:0-17:0',
'10:0-17:0',
'10:0-17:0',
'10:0-17:0',
'10:0-18:0',
'10:0-18:0',
None],
['KWfLQddMBZNoh1bVcgASfA',
'12:0-23:0',
'12:0-23:0',
'12:0-23:0',
'12:0-23:0',
'12:0-23:0',
'12:0-23:0',
'12:0-23:0'],
['4ScLXRii_WwBn5PbGBI-eg', None, None, None, None, None, None, None],
['LAswzVTnT3uCvnKr-SwxEg', None, None, None, None, None, None, None],
['G_wqVaqV3TBsZPAIIRCU-Q',
'5:0-0:0',
'5:0-0:0',
'5:0-0:0',
'5:0-0:0',
'5:0-0:0',
'5:0-0:0',
'5:0-0:0']]}
data = all["data"]
df = pd.DataFrame(data)
df.columns = all["columns"]
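Since the dictionary came from to_dict('split'), its three keys line up with the data, index, and columns arguments of the pd.DataFrame constructor, so the frame can probably be rebuilt in a single call. A small sketch with a shortened stand-in dictionary follows; split_dict and rebuilt are hypothetical names, and defining nan up front is one way to paste the original output without replacing nan by None.

```python
import pandas as pd

nan = float("nan")  # lets pasted to_dict('split') output containing nan be used unchanged

# Shortened stand-in for the dictionary shown in the question
split_dict = {
    "index": [0, 1],
    "columns": ["restaurant_id", "lundi", "mardi"],
    "data": [
        ["lCwqJWMxvIUQt1Re_tDn4w", "0:0-0:0", "0:0-0:0"],
        ["t65yfB9v9fqlhAkLnnUXdg", "11:30-21:0", nan],
    ],
}

# The 'split' keys map directly onto the constructor's keyword arguments
rebuilt = pd.DataFrame(
    data=split_dict["data"],
    index=split_dict["index"],
    columns=split_dict["columns"],
)
```

Using a name other than all also avoids shadowing Python's built-in all function.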