How to divide data into night-time and day-time data in pandas

Hi all,

I need help splitting this pandas DataFrame into night-time and day-time data. Let's assume night is after 17:00 and before 08:30, and day is between 08:30 and 17:00.

          Date   Time     Open     High      Low    Close  Vol
7   2019-09-02  05:00  11919.9  11929.7  11917.7  11918.9  240
8   2019-09-02  06:00  11920.7  11940.4  11917.7  11927.9  240
9   2019-09-02  07:00  11927.4  11966.2  11927.2  11936.4  240
10  2019-09-02  08:00  11936.9  11955.9  11928.1  11951.4  240
11  2019-09-02  09:00  11951.4  11960.2  11939.4  11954.4  240
12  2019-09-02  10:00  11953.9  11995.9  11951.4  11976.9  240
13  2019-09-02  11:00  11976.7  11979.4  11956.2  11965.9  240
14  2019-09-02  12:00  11966.2  11971.4  11956.4  11965.4  240
15  2019-09-02  13:00  11965.7  11969.7  11943.4  11947.7  240
16  2019-09-02  14:00  11947.4  11962.4  11943.9  11960.7  240
17  2019-09-02  15:00  11960.9  11964.2  11901.2  11934.9  240
18  2019-09-02  16:00  11934.9  11939.7  11921.4  11929.7  240
19  2019-09-02  17:00  11929.9  11940.4  11928.4  11938.2  236
20  2019-09-02  18:00  11937.9  11938.2  11934.7  11938.2  176
21  2019-09-02  19:00  11937.9  11948.7  11937.7  11943.2  196
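The rule itself fits in a small plain-Python check. This is a minimal sketch that assumes both boundaries (08:30 and 17:00) count as day time, which matches "after 17:00" and "before 08:30" being night; the helper name `is_day` is hypothetical.

```python
from datetime import time

DAY_START = time(8, 30)  # 08:30, assumed to be the first day-time minute (inclusive)
DAY_END = time(17, 0)    # 17:00, assumed to be the last day-time minute (inclusive)

def is_day(t: time) -> bool:
    """Hypothetical helper: True if t falls in the day window [08:30, 17:00]."""
    return DAY_START <= t <= DAY_END

print(is_day(time(8, 0)), is_day(time(9, 0)), is_day(time(17, 0)), is_day(time(18, 0)))
```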

between_time only selects times within the current day, so on its own it won't work here.
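That limitation may not apply: according to the pandas documentation, when `start_time` is later than `end_time`, `DataFrame.between_time` selects the times outside that interval, so a night window that wraps past midnight can be written directly. A minimal sketch on a hypothetical two-day hourly index:

```python
import pandas as pd

# Hypothetical hourly data spanning midnight (2019-09-02 05:00 to 2019-09-03 07:00)
idx = pd.date_range("2019-09-02 05:00", "2019-09-03 07:00", freq=pd.offsets.Hour())
df = pd.DataFrame({"Close": range(len(idx))}, index=idx)

# start_time later than end_time: the window runs from 17:00 past midnight to 08:30
# (both endpoints are included by default)
night = df.between_time("17:00", "08:30")
print(sorted(set(night.index.hour)))
```

Because 17:00 then falls both in this window and in `between_time("08:30", "17:00")`, the two sets overlap at the boundaries. In pandas 1.4 and later, `inclusive="neither"` should exclude both endpoints from the night window; alternatively, take the complement of the day rows as the answers below do.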

One idea is to convert the Time column to timedeltas and use Series.between:

Filter by boolean mask:
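import pandas as pd

# 'HH:MM' -> 'HH:MM:00' so pd.to_timedelta can parse it as time since midnight
mask = (pd.to_timedelta(df['Time'].astype(str).add(':00'))
          .between(pd.Timedelta('08:30:00'), pd.Timedelta('17:00:00')))
# between is inclusive on both ends, so 08:30 and 17:00 count as day time
df1 = df[mask]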
print (df1)
          Date   Time     Open     High      Low    Close  Vol
11  2019-09-02  09:00  11951.4  11960.2  11939.4  11954.4  240
12  2019-09-02  10:00  11953.9  11995.9  11951.4  11976.9  240
13  2019-09-02  11:00  11976.7  11979.4  11956.2  11965.9  240
14  2019-09-02  12:00  11966.2  11971.4  11956.4  11965.4  240
15  2019-09-02  13:00  11965.7  11969.7  11943.4  11947.7  240
16  2019-09-02  14:00  11947.4  11962.4  11943.9  11960.7  240
17  2019-09-02  15:00  11960.9  11964.2  11901.2  11934.9  240
18  2019-09-02  16:00  11934.9  11939.7  11921.4  11929.7  240
19  2019-09-02  17:00  11929.9  11940.4  11928.4  11938.2  236

df2 = df[~mask]
print (df2)
          Date   Time     Open     High      Low    Close  Vol
7   2019-09-02  05:00  11919.9  11929.7  11917.7  11918.9  240
8   2019-09-02  06:00  11920.7  11940.4  11917.7  11927.9  240
9   2019-09-02  07:00  11927.4  11966.2  11927.2  11936.4  240
10  2019-09-02  08:00  11936.9  11955.9  11928.1  11951.4  240
20  2019-09-02  18:00  11937.9  11938.2  11934.7  11938.2  176
21  2019-09-02  19:00  11937.9  11948.7  11937.7  11943.2  196
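To check the mask approach end to end, here is a self-contained sketch that rebuilds only the Date and Time columns of the sample (the price and volume columns are omitted, which does not affect the split).

```python
import pandas as pd

# Sample rows from the question (only Date and Time kept; other columns omitted)
times = ["05:00", "06:00", "07:00", "08:00", "09:00", "10:00", "11:00", "12:00",
         "13:00", "14:00", "15:00", "16:00", "17:00", "18:00", "19:00"]
df = pd.DataFrame({"Date": "2019-09-02", "Time": times}, index=range(7, 22))

# 'HH:MM' -> Timedelta since midnight; between() includes both ends by default
td = pd.to_timedelta(df["Time"] + ":00")
mask = td.between(pd.Timedelta("08:30:00"), pd.Timedelta("17:00:00"))

day, night = df[mask], df[~mask]
print(day["Time"].tolist())
print(night["Time"].tolist())
```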

Edit:

Another idea is DataFrame.between_time, but it requires a DatetimeIndex:

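# join date and time with a space, e.g. '2019-09-02 05:00', then parse
df['Datetime'] = pd.to_datetime(df['Date'].astype(str) + ' ' + df['Time'].astype(str))
df = df.set_index('Datetime')

# between_time matches on the time of day, on every date in the index
day = df.between_time('08:30','17:00')
night = df[~df.index.isin(day.index)]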
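Put together, and assuming Date and Time are strings as in the sample, the between_time variant could look like the sketch below. The second day of data is hypothetical and only there to show that the selection is applied to every date in the index.

```python
import pandas as pd

# Hypothetical data: two days, three readings each
df = pd.DataFrame({
    "Date": ["2019-09-02"] * 3 + ["2019-09-03"] * 3,
    "Time": ["07:00", "12:00", "18:00"] * 2,
})

# Combine with a space so pd.to_datetime parses e.g. "2019-09-02 07:00"
df["Datetime"] = pd.to_datetime(df["Date"] + " " + df["Time"])
df = df.set_index("Datetime")

# Day rows by time of day; night is everything that is not a day row
day = df.between_time("08:30", "17:00")
night = df[~df.index.isin(day.index)]
print(day["Time"].tolist())
print(night["Time"].tolist())
```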

I would try something like this; obviously, change the times to the ones you need! But this is the general idea.

In [58]: df = pd.DataFrame({"Time":[
    ...: "05:00",
    ...: "06:00",
    ...: "07:00",
    ...: "08:00",
    ...: "09:00",
    ...: "10:00",
    ...: "11:00",
    ...: "12:00",
    ...: "13:00",
    ...: "14:00",
    ...: "15:00",
    ...: "16:00",
    ...: "17:00",
    ...: "18:00",
    ...: "19:00"]})

In [59]: df = df.set_index(pd.to_datetime(df["Time"]))

In [60]: df
Out[60]:
                      Time
Time
2019-09-15 05:00:00  05:00
2019-09-15 06:00:00  06:00
2019-09-15 07:00:00  07:00
2019-09-15 08:00:00  08:00
2019-09-15 09:00:00  09:00
2019-09-15 10:00:00  10:00
2019-09-15 11:00:00  11:00
2019-09-15 12:00:00  12:00
2019-09-15 13:00:00  13:00
2019-09-15 14:00:00  14:00
2019-09-15 15:00:00  15:00
2019-09-15 16:00:00  16:00
2019-09-15 17:00:00  17:00
2019-09-15 18:00:00  18:00
2019-09-15 19:00:00  19:00

In [61]: df["time_desc"] = "night"

In [62]: df
Out[62]:
                      Time time_desc
Time
2019-09-15 05:00:00  05:00     night
2019-09-15 06:00:00  06:00     night
2019-09-15 07:00:00  07:00     night
2019-09-15 08:00:00  08:00     night
2019-09-15 09:00:00  09:00     night
2019-09-15 10:00:00  10:00     night
2019-09-15 11:00:00  11:00     night
2019-09-15 12:00:00  12:00     night
2019-09-15 13:00:00  13:00     night
2019-09-15 14:00:00  14:00     night
2019-09-15 15:00:00  15:00     night
2019-09-15 16:00:00  16:00     night
2019-09-15 17:00:00  17:00     night
2019-09-15 18:00:00  18:00     night
2019-09-15 19:00:00  19:00     night

In [63]: df.loc[df.between_time("06:30", "18:00").index, "time_desc"] = "day"

In [64]: df
Out[64]:
                      Time time_desc
Time
2019-09-15 05:00:00  05:00     night
2019-09-15 06:00:00  06:00     night
2019-09-15 07:00:00  07:00       day
2019-09-15 08:00:00  08:00       day
2019-09-15 09:00:00  09:00       day
2019-09-15 10:00:00  10:00       day
2019-09-15 11:00:00  11:00       day
2019-09-15 12:00:00  12:00       day
2019-09-15 13:00:00  13:00       day
2019-09-15 14:00:00  14:00       day
2019-09-15 15:00:00  15:00       day
2019-09-15 16:00:00  16:00       day
2019-09-15 17:00:00  17:00       day
2019-09-15 18:00:00  18:00       day
2019-09-15 19:00:00  19:00     night
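Using the question's 08:30 to 17:00 window instead of the example times above, the same labelling can also be done positionally with `DatetimeIndex.indexer_between_time`, which returns the integer positions of matching rows. This is a sketch on a hypothetical single-day hourly index like the one in the session; `np.where` on a boolean mask would work just as well.

```python
import numpy as np
import pandas as pd

# Hypothetical hourly index 05:00 to 19:00 on one day, as in the session above
idx = pd.date_range("2019-09-15 05:00", "2019-09-15 19:00", freq=pd.offsets.Hour())
df = pd.DataFrame({"Time": idx.strftime("%H:%M")}, index=idx)

# Positions of rows whose time of day lies in [08:30, 17:00] (both ends included by default)
day_pos = df.index.indexer_between_time("08:30", "17:00")

# Start with "night" everywhere, then overwrite the day positions
labels = np.full(len(df), "night", dtype=object)
labels[day_pos] = "day"
df["time_desc"] = labels
print(df["time_desc"].value_counts().to_dict())
```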