Pandas: How to extract and calculate the number of "hours" per row in a DataFrame

Question

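I have a DataFrame that represents the opening schedules of some restaurants over one week.

What I want to do is add a column week_hours to my initial DataFrame df, representing the total number of hours each restaurant is open per week.

Data (generated with df.head(20).to_dict('split'))

The days of the week in this example are in French (lundi = Monday, mardi = Tuesday, mercredi = Wednesday, jeudi = Thursday, vendredi = Friday, samedi = Saturday, dimanche = Sunday).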

{'index': [0,
  1,
  2,
  3,
  4,
  5,
  6,
  7,
  8,
  9,
  10,
  11,
  12,
  13,
  14,
  15,
  16,
  17,
  18,
  19],
 'columns': ['restaurant_id',
  'lundi',
  'mardi',
  'mercredi',
  'jeudi',
  'vendredi',
  'samedi',
  'dimanche'],
 'data': [['lCwqJWMxvIUQt1Re_tDn4w',
   '0:0-0:0',
   '0:0-0:0',
   '0:0-0:0',
   '0:0-0:0',
   '0:0-0:0',
   '0:0-0:0',
   '0:0-0:0'],
  ['pd0v6sOqpLhFJ7mkpIaixw',
   '11:0-20:0',
   '11:0-20:0',
   '11:0-20:0',
   '11:0-20:0',
   '11:0-22:0',
   '11:0-22:0',
   '11:0-17:0'],
  ['0vhi__HtC2L4-vScgDFdFw',
   '11:30-22:0',
   '11:30-22:0',
   '11:30-22:0',
   '11:30-22:0',
   '11:30-22:0',
   '12:0-22:0',
   '16:30-21:30'],
  ['t65yfB9v9fqlhAkLnnUXdg',
   '11:30-21:0',
   '11:30-21:0',
   '11:30-21:0',
   '11:30-21:0',
   '11:30-21:0',
   nan,
   '11:30-21:0'],
  ['i7_JPit-2kAbtRTLkic2jA',
   '11:30-22:0',
   '11:30-22:0',
   '11:30-23:0',
   '11:30-23:0',
   '11:30-23:0',
   nan,
   nan],
  ['vMh4madPU3qhNX7P7d8WGA', nan, nan, nan, nan, nan, nan, nan],
  ['BsvCTCVG7lrzXZ68VyyIcg',
   '0:0-0:0',
   '11:0-2:30',
   '11:0-2:30',
   '11:0-2:30',
   '11:0-2:30',
   '11:0-2:30',
   '11:0-2:30'],
  ['es3Fq9KNp6Ry994x4T4ZYg',
   '6:30-16:0',
   '6:30-16:0',
   '6:30-16:0',
   '6:30-16:0',
   '6:30-16:0',
   '7:0-14:0',
   nan],
  ['Xb7jOAa17xtT_uA4sCCAsg', nan, nan, nan, nan, nan, nan, nan],
  ['1vrrpIhpK628PUA0XWWd8g',
   '9:0-18:0',
   '9:0-18:0',
   '9:0-18:0',
   '9:0-18:0',
   '9:0-18:0',
   '9:0-18:0',
   '9:0-18:0'],
  ['NYKxikYKbkacWumJ82TxzA',
   '11:0-2:0',
   '11:0-2:0',
   '11:0-2:0',
   '11:0-2:0',
   '11:0-2:0',
   '11:0-2:0',
   '11:0-2:0'],
  ['4sRJvmKh43AqMRrjdwEdwA',
   '11:0-22:0',
   '11:0-22:0',
   '11:0-22:0',
   '11:0-22:0',
   '11:0-23:0',
   '11:0-23:0',
   '11:0-23:0'],
  ['laac2uH1lQVzBjKFUjuA1Q',
   '7:0-14:0',
   '7:0-14:0',
   '7:0-14:0',
   '7:0-14:0',
   '7:0-14:0',
   '7:0-14:0',
   '8:0-14:0'],
  ['vVOoL5H8Fr-qlQv-_DdoMA',
   '10:0-22:0',
   '10:0-22:0',
   '10:0-22:0',
   '10:0-22:0',
   '10:0-23:0',
   '10:0-23:0',
   '10:0-22:0'],
  ['k1c4gg8Ri5dre6ruPUKxJQ',
   '9:0-21:30',
   '9:0-21:30',
   '9:0-21:30',
   '9:0-21:30',
   '9:0-22:30',
   '9:0-22:30',
   '12:0-21:0'],
  ['x9f9NBMweyyjCQHuc9K4sw',
   '11:0-17:0',
   '10:0-17:0',
   '10:0-17:0',
   '10:0-17:0',
   '10:0-18:0',
   '10:0-18:0',
   nan],
  ['KWfLQddMBZNoh1bVcgASfA',
   '12:0-23:0',
   '12:0-23:0',
   '12:0-23:0',
   '12:0-23:0',
   '12:0-23:0',
   '12:0-23:0',
   '12:0-23:0'],
  ['4ScLXRii_WwBn5PbGBI-eg', nan, nan, nan, nan, nan, nan, nan],
  ['LAswzVTnT3uCvnKr-SwxEg', nan, nan, nan, nan, nan, nan, nan],
  ['G_wqVaqV3TBsZPAIIRCU-Q',
   '5:0-0:0',
   '5:0-0:0',
   '5:0-0:0',
   '5:0-0:0',
   '5:0-0:0',
   '5:0-0:0',
   '5:0-0:0']]}

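What syntax lets me extract the hours from each column (each column is one day here) so that I can compute the weekly opening hours in a new column?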
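To make the goal concrete, here is a worked example of the expected total for a single row, restaurant 'pd0v6sOqpLhFJ7mkpIaixw'. The helper interval_hours is hypothetical and assumes the closing time falls on the same day, so it does not handle overnight strings such as '11:0-2:30'.

```python
# Hypothetical helper for illustration: opening hours of one "H:M-H:M" string,
# assuming the closing time is on the same day (no overnight shifts).
def interval_hours(interval: str) -> float:
    start, end = interval.split("-")
    start_h, start_m = (int(x) for x in start.split(":"))
    end_h, end_m = (int(x) for x in end.split(":"))
    # difference in minutes, converted to hours
    return ((end_h * 60 + end_m) - (start_h * 60 + start_m)) / 60


# Schedule of restaurant 'pd0v6sOqpLhFJ7mkpIaixw' from the data above (lundi .. dimanche)
week = ["11:0-20:0", "11:0-20:0", "11:0-20:0", "11:0-20:0",
        "11:0-22:0", "11:0-22:0", "11:0-17:0"]

week_hours = sum(interval_hours(day) for day in week)
print(week_hours)  # 64.0  (4 * 9 + 2 * 11 + 6)
```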

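If any clarification or a simpler example is needed, please ask.

Edit: I tried the solution listed below, but the results are not correct (e.g. in the first row?)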

Answer 1

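I think this should do what you need. It stores the new values as floats (in hours). With datetime (https://docs.python.org/3/library/datetime.html) you could also do the time arithmetic, if you prefer. This applies the function calculate_hours to the 7 given columns (days):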
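For the datetime route mentioned above, a minimal sketch could look like the following. The function name duration_hours is hypothetical; it assumes the same convention as calculate_hours below, namely that a closing time at or before the opening time means the restaurant closes the next day (so '0:0-0:0' counts as 24 hours).

```python
from datetime import datetime, timedelta


def duration_hours(interval: str) -> float:
    """Duration of an 'H:M-H:M' interval in hours, using datetime arithmetic."""
    start_str, end_str = interval.split("-")
    # %H and %M also accept non-zero-padded values such as "0:0" or "2:30"
    start = datetime.strptime(start_str, "%H:%M")
    end = datetime.strptime(end_str, "%H:%M")
    # closing at or before opening: assume the restaurant closes the next day
    if end <= start:
        end += timedelta(days=1)
    # dividing two timedeltas gives a float number of hours
    return (end - start) / timedelta(hours=1)


print(duration_hours("11:0-2:30"))  # 15.5
```

Missing days (None or NaN) would still need to be filtered out before calling it, for example by returning 0.0 for non-string values as calculate_hours does.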

def calculate_hours(cell) -> float:
    """Return the opening duration (in hours) of one 'H:M-H:M' string."""
    # missing days (None / NaN) count as closed
    if not isinstance(cell, str):
        return 0.0
    try:
        # split the given times into opening and closing time
        opening_time, closing_time = cell.split("-")
        # split hours and minutes
        start_hour, start_minute = opening_time.split(":")
        end_hour, end_minute = closing_time.split(":")
        # calculate start time (in hours)
        start_time = float(start_hour) + float(start_minute) / 60
        # calculate end time (in hours)
        end_time = float(end_hour) + float(end_minute) / 60
    except ValueError:
        # malformed string: count as closed (check your data for such cases)
        return 0.0
    # handle overnight and 24h openings
    if start_time >= end_time:
        end_time += 24
    # return the duration from start time to end time
    return end_time - start_time


days = ["lundi", "mardi", "mercredi", "jeudi", "vendredi", "samedi", "dimanche"]

# Save the result to the new column "open":
# apply calculate_hours to each day column and sum the values for each row
df["open"] = sum(df[day].apply(calculate_hours) for day in days)
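If the table is large, a vectorized variant with Series.str.extract avoids calling a Python function per cell. This is a sketch, not part of the original answer: the names weekly_hours, DAYS and PATTERN are hypothetical, and the zero_means_closed flag is an assumption added because of the question's edit about the first row. With the default settings, '0:0-0:0' counts as a 24h opening (7 * 24 = 168 hours for the first row), exactly like calculate_hours; whether the data actually means "closed" there is unknown.

```python
import pandas as pd

DAYS = ["lundi", "mardi", "mercredi", "jeudi", "vendredi", "samedi", "dimanche"]
PATTERN = r"^(\d{1,2}):(\d{1,2})-(\d{1,2}):(\d{1,2})$"


def weekly_hours(df: pd.DataFrame, days=DAYS, zero_means_closed: bool = False) -> pd.Series:
    """Total opening hours per row, summed over the given day columns."""
    total = pd.Series(0.0, index=df.index)
    for day in days:
        # "string" dtype turns None/NaN into <NA>, so .str also works on all-missing columns
        parts = df[day].astype("string").str.extract(PATTERN).astype(float)
        start = parts[0] + parts[1] / 60
        end = parts[2] + parts[3] / 60
        # closing at or before opening: overnight or 24h opening, so add a day
        end = end.where(end > start, end + 24)
        # missing or malformed cells give NaN and count as closed
        hours = (end - start).fillna(0.0)
        if zero_means_closed:
            # assumption: "0:0-0:0" marks a closed day rather than a 24h opening
            hours = hours.mask(df[day].eq("0:0-0:0"), 0.0)
        total += hours
    return total
```

Usage with the column name from the question would be df["week_hours"] = weekly_hours(df), or weekly_hours(df, zero_means_closed=True) under the "closed" interpretation.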

PS: I used this to "import" your data. It doesn't look perfect, but I didn't know a better way to use your data:

import pandas as pd

all = {'index': [0,
                  1,
                  2,
                  3,
                  4,
                  5,
                  6,
                  7,
                  8,
                  9,
                  10,
                  11,
                  12,
                  13,
                  14,
                  15,
                  16,
                  17,
                  18,
                  19],
        'columns': ['restaurant_id',
                    'lundi',
                    'mardi',
                    'mercredi',
                    'jeudi',
                    'vendredi',
                    'samedi',
                    'dimanche'],
        'data': [['lCwqJWMxvIUQt1Re_tDn4w',
                  '0:0-0:0',
                  '0:0-0:0',
                  '0:0-0:0',
                  '0:0-0:0',
                  '0:0-0:0',
                  '0:0-0:0',
                  '0:0-0:0'],
                 ['pd0v6sOqpLhFJ7mkpIaixw',
                  '11:0-20:0',
                  '11:0-20:0',
                  '11:0-20:0',
                  '11:0-20:0',
                  '11:0-22:0',
                  '11:0-22:0',
                  '11:0-17:0'],
                 ['0vhi__HtC2L4-vScgDFdFw',
                  '11:30-22:0',
                  '11:30-22:0',
                  '11:30-22:0',
                  '11:30-22:0',
                  '11:30-22:0',
                  '12:0-22:0',
                  '16:30-21:30'],
                 ['t65yfB9v9fqlhAkLnnUXdg',
                  '11:30-21:0',
                  '11:30-21:0',
                  '11:30-21:0',
                  '11:30-21:0',
                  '11:30-21:0',
                  None,
                  '11:30-21:0'],
                 ['i7_JPit-2kAbtRTLkic2jA',
                  '11:30-22:0',
                  '11:30-22:0',
                  '11:30-23:0',
                  '11:30-23:0',
                  '11:30-23:0',
                  None,
                  None],
                 ['vMh4madPU3qhNX7P7d8WGA', None, None, None, None, None, None, None],
                 ['BsvCTCVG7lrzXZ68VyyIcg',
                  '0:0-0:0',
                  '11:0-2:30',
                  '11:0-2:30',
                  '11:0-2:30',
                  '11:0-2:30',
                  '11:0-2:30',
                  '11:0-2:30'],
                 ['es3Fq9KNp6Ry994x4T4ZYg',
                  '6:30-16:0',
                  '6:30-16:0',
                  '6:30-16:0',
                  '6:30-16:0',
                  '6:30-16:0',
                  '7:0-14:0',
                  None],
                 ['Xb7jOAa17xtT_uA4sCCAsg', None, None, None, None, None, None, None],
                 ['1vrrpIhpK628PUA0XWWd8g',
                  '9:0-18:0',
                  '9:0-18:0',
                  '9:0-18:0',
                  '9:0-18:0',
                  '9:0-18:0',
                  '9:0-18:0',
                  '9:0-18:0'],
                 ['NYKxikYKbkacWumJ82TxzA',
                  '11:0-2:0',
                  '11:0-2:0',
                  '11:0-2:0',
                  '11:0-2:0',
                  '11:0-2:0',
                  '11:0-2:0',
                  '11:0-2:0'],
                 ['4sRJvmKh43AqMRrjdwEdwA',
                  '11:0-22:0',
                  '11:0-22:0',
                  '11:0-22:0',
                  '11:0-22:0',
                  '11:0-23:0',
                  '11:0-23:0',
                  '11:0-23:0'],
                 ['laac2uH1lQVzBjKFUjuA1Q',
                  '7:0-14:0',
                  '7:0-14:0',
                  '7:0-14:0',
                  '7:0-14:0',
                  '7:0-14:0',
                  '7:0-14:0',
                  '8:0-14:0'],
                 ['vVOoL5H8Fr-qlQv-_DdoMA',
                  '10:0-22:0',
                  '10:0-22:0',
                  '10:0-22:0',
                  '10:0-22:0',
                  '10:0-23:0',
                  '10:0-23:0',
                  '10:0-22:0'],
                 ['k1c4gg8Ri5dre6ruPUKxJQ',
                  '9:0-21:30',
                  '9:0-21:30',
                  '9:0-21:30',
                  '9:0-21:30',
                  '9:0-22:30',
                  '9:0-22:30',
                  '12:0-21:0'],
                 ['x9f9NBMweyyjCQHuc9K4sw',
                  '11:0-17:0',
                  '10:0-17:0',
                  '10:0-17:0',
                  '10:0-17:0',
                  '10:0-18:0',
                  '10:0-18:0',
                  None],
                 ['KWfLQddMBZNoh1bVcgASfA',
                  '12:0-23:0',
                  '12:0-23:0',
                  '12:0-23:0',
                  '12:0-23:0',
                  '12:0-23:0',
                  '12:0-23:0',
                  '12:0-23:0'],
                 ['4ScLXRii_WwBn5PbGBI-eg', None, None, None, None, None, None, None],
                 ['LAswzVTnT3uCvnKr-SwxEg', None, None, None, None, None, None, None],
                 ['G_wqVaqV3TBsZPAIIRCU-Q',
                  '5:0-0:0',
                  '5:0-0:0',
                  '5:0-0:0',
                  '5:0-0:0',
                  '5:0-0:0',
                  '5:0-0:0',
                  '5:0-0:0']]}


data = all["data"]
df = pd.DataFrame(data)
df.columns = all["columns"]
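A possibly simpler way to load the dump: the keys produced by to_dict('split') ('index', 'columns', 'data') match the data, index and columns parameters of the DataFrame constructor, so the dict can be unpacked directly. Because the question's dump prints missing values as nan, that name has to exist before pasting it, e.g. via from numpy import nan. The shortened split_dict below is illustrative only, built from two rows of the question.

```python
import pandas as pd
from numpy import nan  # lets the question's dump (which prints nan) be pasted as-is

# Shortened "split" dict with the same keys that df.to_dict('split') produces
split_dict = {
    "index": [0, 1],
    "columns": ["restaurant_id", "lundi", "mardi"],
    "data": [["lCwqJWMxvIUQt1Re_tDn4w", "0:0-0:0", "0:0-0:0"],
             ["vMh4madPU3qhNX7P7d8WGA", nan, nan]],
}

# Equivalent to pd.DataFrame(data=..., index=..., columns=...)
df = pd.DataFrame(**split_dict)
print(df.shape)  # (2, 3)
```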
