Converting columns with date in names to separate rows in Python
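I already have an answer to this outside Python and would like to know how to achieve the same thing in Python.
Suppose we have a pandas DataFrame like this: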
import pandas as pd
d = pd.DataFrame({'2019Q1':[1], '2019Q2':[2], '2019Q3':[3]})
which looks like this:
   2019Q1  2019Q2  2019Q3
0       1       2       3
How can I convert it to look like this:
Year  Quarter  Value
2019  1        1
2019  2        2
2019  3        3
Use Index.str.split with expand=True on the column labels to build a MultiIndex, then reshape with DataFrame.unstack, and finally clean up the result with Series.reset_index and Series.rename_axis:
d = pd.DataFrame({'2019Q1':[1], '2019Q2':[2], '2019Q3':[3]})
d.columns = d.columns.str.split('Q', expand=True)
df = (d.unstack(0)
       .reset_index(level=2, drop=True)
       .rename_axis(('Year','Quarter'))
       .reset_index(name='Value'))
print(df)
   Year Quarter  Value
0  2019       1      1
1  2019       2      2
2  2019       3      3
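One detail about the result above: because the labels come from a string split, Year and Quarter hold strings rather than numbers. A minimal sketch, assuming integer columns are wanted (the question does not say either way), is to cast them afterwards with astype:

```python
import pandas as pd

d = pd.DataFrame({'2019Q1': [1], '2019Q2': [2], '2019Q3': [3]})
d.columns = d.columns.str.split('Q', expand=True)

df = (d.unstack(0)
       .reset_index(level=2, drop=True)
       .rename_axis(('Year', 'Quarter'))
       .reset_index(name='Value'))

# The split produces string labels; cast them to integers
# (assumption: numeric Year and Quarter columns are desired).
df = df.astype({'Year': int, 'Quarter': int})
print(df.dtypes)
```

The same cast can be appended to any of the other variants here, since they all derive Year and Quarter from the column label text.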
Thanks to @Jon Clements for another solution:
df = (d.melt()
       .variable
       .str.extract(r'(?P<Year>\d{4})Q(?P<Quarter>\d)')
       .assign(Value=d.T.values.flatten()))
print(df)
   Year Quarter  Value
0  2019       1      1
1  2019       2      2
2  2019       3      3
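The melt-based approach can also reuse the value column that melt already produces instead of flattening d.T separately. The following is only a sketch of that variation, not part of the original answer; the second data row and the name `melted` are made up for illustration, to show that labels and values stay paired when there is more than one row:

```python
import pandas as pd

# Hypothetical two-row frame (the question has a single row).
d = pd.DataFrame({'2019Q1': [1, 10], '2019Q2': [2, 20], '2019Q3': [3, 30]})

# melt() yields one row per (column, row) pair: the old column label goes
# into 'variable' and the cell value into the column named by value_name.
melted = d.melt(value_name='Value')

# Named groups in the pattern become the column names of the extracted frame.
parts = melted['variable'].str.extract(r'(?P<Year>\d{4})Q(?P<Quarter>\d)')

# Both frames share melt's index, so the values line up row by row.
df = parts.assign(Value=melted['Value'])
print(df)
```

Because the labels and the values come from the same melted frame, there is no need to reason about whether a separately flattened array is in the same order.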
An alternative using split:
df = (d.melt()
       .variable
       .str.split('Q', expand=True)
       .rename(columns={0:'Year',1:'Quarter'})
       .assign(Value=d.T.values.flatten()))
print(df)
   Year Quarter  Value
0  2019       1      1
1  2019       2      2
2  2019       3      3
Use DataFrame.stack with DataFrame.pop and Series.str.split:
df = d.stack().reset_index(level=1).rename(columns={0:'Value'})
df[['Year', 'Quarter']] = df.pop('level_1').str.split('Q', expand=True)
   Value  Year Quarter
0      1  2019       1
0      2  2019       2
0      3  2019       3
If you care about the order of the columns, use reindex:
df = df.reindex(['Year', 'Quarter', 'Value'], axis=1)
   Year Quarter  Value
0  2019       1      1
0  2019       2      2
0  2019       3      3
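For comparison, here is a sketch of a different technique from the ones in the answers: letting pandas parse the labels as quarterly periods with pd.PeriodIndex rather than splitting strings. It assumes every column label has the form 'YYYYQn' and that the frame has a single row, as in the question; the name `periods` is only illustrative.

```python
import pandas as pd

d = pd.DataFrame({'2019Q1': [1], '2019Q2': [2], '2019Q3': [3]})

# Parse the column labels as quarterly periods
# (assumption: every label looks like 'YYYYQn').
periods = pd.PeriodIndex(d.columns, freq='Q')

df = pd.DataFrame({
    'Year': periods.year,          # integer year of each period
    'Quarter': periods.quarter,    # integer quarter, 1 to 4
    'Value': d.iloc[0].to_numpy(), # assumption: single-row frame
})
print(df)
```

With this variant Year and Quarter come out as integers rather than strings, and the result has a fresh 0, 1, 2 index.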