How do I melt a pandas DataFrame with custom names?
I have a table like this:
device_type version pool testMean testP50 testP90 testP99 testStd WidgetMean WidgetP50 WidgetP90 WidgetP99 WidgetStd
PNB0Q7 8108162 123 124 136 140.8 141.88 21.35 2.2 0 6.4 9.64 3.92
I want to transform it into this:
device_type version pool Name Mean P50 P90 P99 Std
PNB0Q7 8108162 123 test 124 136 140.8 141.88 21.35
PNB0Q7 8108162 123 Widget 2.2 0 6.4 9.64 3.92
I tried using melt but got:
df.melt(id_vars=["device_type", "version", "pool"], var_name="Name", value_name="Value")
device_type version pool Name Value
PNB0Q7 8108162 test testMean 124.00
PNB0Q7 8108162 test testP50 136.00
PNB0Q7 8108162 test testP90 140.80
PNB0Q7 8108162 test testP99 141.88
PNB0Q7 8108162 test testStd 21.35
Any ideas on how to reach the expected result?
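# Assumes the original 8-column frame (a single metric group); assigning
# 8 names to the 13-column frame would raise a length-mismatch error.
df.columns = ['device_type', 'version', 'pool', 'Mean', 'P50', 'P90', 'P99', 'Std']
df['Name'] = 'test'
# Reorder so that Name follows the id columns.
df = df[['device_type', 'version', 'pool', 'Name', 'Mean', 'P50', 'P90', 'P99', 'Std']]
print(df)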
Output:
device_type version pool Name Mean P50 P90 P99 Std
0 PNB0Q7 8108162 123 test 124 136 140.8 141.88 21.35
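One option is pivot_longer from pyjanitor, which reshapes the frame to long form. The .value placeholder in names_to determines which parts of the column names remain as headers. First, make sure Test is lowercase: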
# pip install pyjanitor
import pandas as pd
import janitor
df.columns = df.columns.str.replace('Test', 'test')
df
device_type version pool testMean testP50 testP90 testP99 Std
0 PNB0Q7 8108162 123 124 136 140.8 141.88 21.35
df.pivot_longer(
column_names = 'test*',
names_to = ('Name', '.value'),
names_pattern = r"(test)(.+)"
)
device_type version pool Std Name Mean P50 P90 P99
0 PNB0Q7 8108162 123 21.35 test 124 136 140.8 141.88
The same concept applies to the updated data, but the columns need to be named consistently first: make Test lowercase and rename Std to testStd:
df.columns = df.columns.str.replace('Test', 'test')
df = df.rename(columns = {'Std': 'testStd'})
df
device_type version pool testMean testP50 testP90 testP99 testStd WidgetMean WidgetP50 WidgetP90 WidgetP99 WidgetStd
0 PNB0Q7 8108162 123 124 136 140.8 141.88 21.35 2.2 0 6.4 9.64 3.92
df.pivot_longer(
column_names = ['test*', 'Widget*'],
names_to = ('Name', '.value'),
names_pattern = r"(test|Widget)(.+)"
)
device_type version pool Name Mean P50 P90 P99 Std
0 PNB0Q7 8108162 123 test 124.0 136 140.8 141.88 21.35
1 PNB0Q7 8108162 123 Widget 2.2 0 6.4 9.64 3.92
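To make the role of names_pattern more concrete, here is a small standard-library sketch of how the two capture groups split each column name. It only illustrates the regular expression, not pyjanitor's internals; the variable names (columns, split, headers) are made up for the illustration.

```python
import re

# Column names from the updated table in the question.
columns = [
    "device_type", "version", "pool",
    "testMean", "testP50", "testP90", "testP99", "testStd",
    "WidgetMean", "WidgetP50", "WidgetP90", "WidgetP99", "WidgetStd",
]

# Same pattern as in the pivot_longer call: group 1 feeds 'Name',
# group 2 feeds '.value' (the part that stays as a column header).
pattern = re.compile(r"(test|Widget)(.+)")

split = {}
for col in columns:
    match = pattern.match(col)
    if match:  # id columns such as 'pool' do not match and are skipped
        split[col] = match.groups()

# Distinct second groups are the headers that survive the reshape.
headers = sorted({stat for _, stat in split.values()})

for col, (name, stat) in split.items():
    print(f"{col:>12} -> Name={name!r}, header={stat!r}")
print(headers)
```

Every column that shares the same second group (for example testMean and WidgetMean) ends up in the same output column, one row per distinct first group.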
You can do this with pd.wide_to_long plus a little column-name cleanup, followed by a reshape:
df = df.rename(columns={'Std':'testStd',
'TestP90':'testP90',
'TestP99':'testP99',
'TestP50':'testP50'})
df_out = pd.wide_to_long(df,
['test','Widget'],
['device_type', 'version', 'pool'],
'Measure', '', '.+' )
df_out = df_out.unstack(-1).stack(0).reset_index()
df_out
Output:
Measure device_type version pool level_3 Mean P50 P90 P99 Std
0 PNB0Q7 8108162 123 Widget 2.2 0.0 6.4 9.64 3.92
1 PNB0Q7 8108162 123 test 124.0 136.0 140.8 141.88 21.35
Update: to rename 'level_3' in the output above:
df = df.rename(columns={'Std':'testStd',
'TestP90':'testP90',
'TestP99':'testP99',
'TestP50':'testP50'})
df_out = pd.wide_to_long(df,
['test','Widget'],
['device_type', 'version', 'pool'],
'Measure', '', '.+' )\
.rename_axis('Instrument', axis=1) #add this line to rename column header axis
df_out = df_out.unstack(-1).stack(0).reset_index()
df_out
Output:
Measure device_type version pool Instrument Mean P50 P90 P99 Std
0 PNB0Q7 8108162 123 Widget 2.2 0.0 6.4 9.64 3.92
1 PNB0Q7 8108162 123 test 124.0 136.0 140.8 141.88 21.35
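For comparison, the melt attempt from the question can also be carried through to the target layout with plain pandas: melt, split the old column names with a regular expression, then pivot the statistic back into columns. This is a rough sketch rather than any answerer's method; it assumes the updated 13-column table, and the intermediate names (column, stat, long_df, result) are made up here.

```python
import pandas as pd

# The updated table from the question, rebuilt by hand.
df = pd.DataFrame({
    "device_type": ["PNB0Q7"], "version": [8108162], "pool": [123],
    "testMean": [124], "testP50": [136], "testP90": [140.8],
    "testP99": [141.88], "testStd": [21.35],
    "WidgetMean": [2.2], "WidgetP50": [0], "WidgetP90": [6.4],
    "WidgetP99": [9.64], "WidgetStd": [3.92],
})

id_cols = ["device_type", "version", "pool"]

# Step 1: the melt from the question, one row per (ids, old column name).
long_df = df.melt(id_vars=id_cols, var_name="column", value_name="value")

# Step 2: split each old column name into the group name and the statistic.
parts = long_df["column"].str.extract(r"^(?P<Name>test|Widget)(?P<stat>.+)$")
long_df = pd.concat([long_df[id_cols], parts, long_df["value"]], axis=1)

# Step 3: pivot the statistic back into columns, one row per Name.
result = (
    long_df.pivot(index=id_cols + ["Name"], columns="stat", values="value")
    .reset_index()
    .rename_axis(columns=None)  # drop the leftover 'stat' label on the columns
)
print(result)
```

Passing a list as the index to pivot needs a reasonably recent pandas (1.1 or later). Rows come out sorted by the index values, so Widget appears before test.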