Reshaping the dataset in Python
I have this dataset:
Account | lookup | FY11USD | FY12USD | FY11local | FY12local |
---|---|---|---|---|---|
Sales | CA | 1000 | 5000 | 800 | 4800 |
Sales | JP | 5000 | 6500 | 10 | 15 |
I am trying to get the data into this format (the example below has 2 years of data, but the number of years may vary):
Account | lookup | Year | USD | Local |
---|---|---|---|---|
Sales | CA | FY11 | 1000 | 800 |
Sales | CA | FY12 | 5000 | 4800 |
Sales | JP | FY11 | 5000 | 10 |
Sales | JP | FY12 | 6500 | 15 |
I tried the script below, but it does not separate USD and local currency for the same year. How should I go about it?
df.melt(id_vars=["Account", "lookup"],
        var_name="Year",
        value_name="Value")
You can piece it together like this:
dfn = pd.concat(
    [df[["Account", "lookup", "FY11USD", "FY12USD"]]
         .melt(id_vars=["Account", "lookup"], var_name="Year", value_name="USD"),
     df[["Account", "lookup", "FY11local", "FY12local"]]
         .melt(id_vars=["Account", "lookup"], var_name="Year", value_name="Local")[["Local"]]],
    axis=1)
# Keep only the year prefix, e.g. "FY11USD" -> "FY11"
dfn["Year"] = dfn["Year"].str[:4]
Output:
  Account lookup  Year   USD  Local
0   Sales     CA  FY11  1000    800
1   Sales     JP  FY11  5000     10
2   Sales     CA  FY12  5000   4800
3   Sales     JP  FY12  6500     15
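Since the number of years may vary, hard-coding the FY11/FY12 column names may not scale well. As a rough sketch using only pandas (not part of the answer above; the names `melted`, `parts`, and `currency` are made up for illustration), one could melt every non-id column, split the names with a regex, and pivot the currency part back into columns. It assumes every non-id column looks like `FY<digits>` followed by a suffix, and it keeps the suffix spelling from the data, so the column comes out as `local` rather than `Local`:

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Account": ["Sales", "Sales"],
    "lookup": ["CA", "JP"],
    "FY11USD": [1000, 5000],
    "FY12USD": [5000, 6500],
    "FY11local": [800, 10],
    "FY12local": [4800, 15],
})

ids = ["Account", "lookup"]

# Melt every non-id column, however many years there are.
melted = df.melt(id_vars=ids, var_name="name", value_name="value")

# Split names such as "FY11USD" into a year part and a currency part.
# The named groups become the column names of `parts`.
parts = melted["name"].str.extract(r"^(?P<Year>FY\d+)(?P<currency>.+)$")
melted = pd.concat([melted, parts], axis=1)

# Spread the currency part back out into one column per currency.
result = (
    melted.pivot(index=ids + ["Year"], columns="currency", values="value")
    .rename_axis(columns=None)
    .reset_index()[ids + ["Year", "USD", "local"]]
)
print(result)
```

The rows come back sorted by Account, lookup, and Year, which happens to match the layout asked for in the question.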
An efficient option is pivot_longer from pyjanitor, which transforms the data to long form. The .value placeholder in names_to determines which parts of the column names remain as headers:
# pip install pyjanitor
import pandas as pd
import janitor

df.pivot_longer(
    index = ['Account', 'lookup'],
    names_to = ('Year', '.value'),
    names_pattern = r"(FY\d+)(.+)")
  Account lookup  Year   USD  local
0   Sales     CA  FY11  1000    800
1   Sales     JP  FY11  5000     10
2   Sales     CA  FY12  5000   4800
3   Sales     JP  FY12  6500     15
Another option is to use stack:
temp = df.set_index(['Account', 'lookup'])
# Splitting on a captured group keeps the year; the empty leading level is dropped
temp.columns = temp.columns.str.split(r'(FY\d+)', expand = True).droplevel(0)
temp.columns.names = ['Year', None]
temp.stack('Year').reset_index()
  Account lookup  Year   USD  local
0   Sales     CA  FY11  1000    800
1   Sales     CA  FY12  5000   4800
2   Sales     JP  FY11  5000     10
3   Sales     JP  FY12  6500     15
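To see why the stack approach drops level 0, it may help to look at what splitting on a capturing group returns. The sketch below uses the standard-library `re.split` rather than pandas. The assumption is that `.str.split` with a regex pattern behaves the same way on each column name:

```python
import re

columns = ["FY11USD", "FY12USD", "FY11local", "FY12local"]

# Because the pattern is wrapped in a capturing group, re.split keeps the
# matched year in the result instead of discarding it as a separator.
parts = [re.split(r"(FY\d+)", name) for name in columns]

print(parts[0])  # ['', 'FY11', 'USD']
print(parts[3])  # ['', 'FY12', 'local']
```

Each name starts with the year, so the first piece is always an empty string. With expand = True that empty piece becomes the first level of the column MultiIndex, which is the level that droplevel(0) discards.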
You can also achieve it with pd.wide_to_long, after reshaping the columns:
index = ['Account', 'lookup']
temp = df.set_index(index)
# Move the year to the end of each name, e.g. 'FY11USD' -> 'USDFY11'
temp.columns = (temp
                .columns
                .str.split(r'(FY\d+)')
                .str[::-1]
                .str.join('')
               )
(pd.wide_to_long(
    temp.reset_index(),
    stubnames = ['USD', 'local'],
    i = index,
    j = 'Year',
    suffix = '.+')
 .reset_index()
)
  Account lookup  Year   USD  local
0   Sales     CA  FY11  1000    800
1   Sales     CA  FY12  5000   4800
2   Sales     JP  FY11  5000     10
3   Sales     JP  FY12  6500     15
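The column renaming is the less obvious step here: pd.wide_to_long expects each column to start with a stub name and end with the suffix, so the year has to move to the back. A plain-Python sketch of that rename on its own (the helper name `move_year_to_end` is hypothetical, and `re.split` stands in for the `.str.split` call above):

```python
import re

def move_year_to_end(name):
    """Turn 'FY11USD' into 'USDFY11' by reversing the split pieces."""
    pieces = re.split(r"(FY\d+)", name)  # e.g. ['', 'FY11', 'USD']
    return "".join(pieces[::-1])         # e.g. 'USD' + 'FY11' + ''

columns = ["FY11USD", "FY12USD", "FY11local", "FY12local"]
renamed = [move_year_to_end(name) for name in columns]
print(renamed)  # ['USDFY11', 'USDFY12', 'localFY11', 'localFY12']
```

With the names in this shape, the stubs 'USD' and 'local' match the start of each column, and suffix = '.+' captures the remaining 'FY11' or 'FY12' as the Year value.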