Convert pandas dataframe hourly values in column names (H1, H2, ...) to a series in a separate column
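I am trying to convert a dataframe in which the hourly data appears in separate columns, like this:
... to a dataframe with only two columns, ['datetime', 'value'].
For example: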
Datetime | value |
---|---|
2020-01-01 01:00:00 | 0 |
2020-01-01 02:00:00 | 0 |
... | ... |
2020-01-01 09:00:00 | 106 |
2020-01-01 10:00:00 | 2852 |
Is there a solution that does not use a for loop?
Use DataFrame.melt, then convert the Date values to datetimes and add the hours with to_timedelta after stripping the H prefix:
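import pandas as pd
# Wide -> long: one row per (Date, hour column); hour names land in 'variable'
df = df.melt('Date')
# Strip the 'H' prefix, then add the hour number to the date as a timedelta
td = pd.to_timedelta(df.pop('variable').str.strip('H').astype(int), unit='h')
df['Date'] = pd.to_datetime(df['Date']) + td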
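To see the melt approach end to end, here is a minimal sketch on a small made-up frame. The column names `Date` and `H1`..`H3` and all values are assumptions for illustration, since the question's original frame is not shown; the final rename and sort are optional additions to reach the ['datetime', 'value'] layout.

```python
import pandas as pd

# Hypothetical sample data: one row per day, one column per hour (H1..H3)
df = pd.DataFrame({
    "Date": ["2020-01-01", "2020-01-02"],
    "H1": [0, 5],
    "H2": [10, 0],
    "H3": [106, 2852],
})

# Wide -> long: hour column names go to 'variable', readings to 'value'
long_df = df.melt("Date")

# 'H3' -> 3 -> Timedelta of 3 hours
hours = pd.to_timedelta(long_df.pop("variable").str.strip("H").astype(int), unit="h")

# Shift each date by its hour offset, then order chronologically
long_df["Date"] = pd.to_datetime(long_df["Date"]) + hours
result = (
    long_df.rename(columns={"Date": "datetime"})
    .sort_values("datetime", ignore_index=True)
)
print(result)
```

Unlike the apply-based answer, this keeps rows whose value is 0, which matches the example table in the question.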
You can achieve this by applying a few functions to the DataFrame:
from datetime import datetime
import pandas as pd

# Example DataFrame
df = pd.DataFrame({'date': ['1/1/2020', '1/2/2020', '1/3/2020'],
                   'h1': [0, 222, 333],
                   'h2': [44, 0, 0],
                   'h3': [1, 2, 3]})
# For simplicity this example only uses hours 1..3; for a full day change this to 25
# (note that "%H" below only parses hours 0..23)
HOURS_COUNT = 4
# Attach the list of hours and a {hour: value} lookup to every row
df["hours"] = df.apply(lambda row: [h for h in range(1, HOURS_COUNT)], axis=1)
df["hour_values"] = df.apply(lambda row: {h: row[f"h{h}"] for h in range(1, HOURS_COUNT)}, axis=1)
# One row per (date, hour)
df = df.explode("hours")
df["value"] = df.apply(lambda row: row["hour_values"][row["hours"]], axis=1)
# Combine the date and the hour into a single datetime
df["date_full"] = df.apply(lambda row: datetime.strptime(f"{row['date']} {row['hours']}", "%m/%d/%Y %H"), axis=1)
df = df[["date_full", "value"]]
# Keep only non-zero readings (drop this line to keep the zeros)
df = df.loc[df["value"] > 0]
So the initial DataFrame is:
date h1 h2 h3
0 1/1/2020 0 44 1
1 1/2/2020 222 0 2
2 1/3/2020 333 0 3
The resulting DataFrame is:
date_full value
0 2020-01-01 02:00:00 44
0 2020-01-01 03:00:00 1
1 2020-01-02 01:00:00 222
1 2020-01-02 03:00:00 2
2 2020-01-03 01:00:00 333
2 2020-01-03 03:00:00 3
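One caveat when scaling this answer up to 24 hourly columns: `datetime.strptime` with `%H` only accepts hours 0 to 23, so a column named `h24` would raise a ValueError. A possible workaround, sketched below, is to add the hour as a timedelta instead. Whether hour 24 should map to 00:00 of the following day is an assumption about the data's convention, not something the question states.

```python
from datetime import datetime
import pandas as pd

# "%H" only covers 00..23, so hour 24 cannot be parsed directly
try:
    datetime.strptime("1/1/2020 24", "%m/%d/%Y %H")
    parsed_ok = True
except ValueError:
    parsed_ok = False

# Adding the hour as a timedelta sidesteps the limit and rolls over to the next day
rolled = pd.to_datetime("1/1/2020", format="%m/%d/%Y") + pd.to_timedelta(24, unit="h")
print(parsed_ok, rolled)
```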