How to extract column names for the melt function? Python
I have a dataset with the following columns.
data.columns[1:]
Index(['Fraud (i.e. fabricated or falsified results)',
'Pressure to publish for career advancement',
'Insufficient oversight/mentoring by lab principal investigator (e.g. reviewing raw data)',
'Insufficient peer review of research',
'Selective reporting of results',
'Original findings not robust enough because not replicated enough in the lab publishing the work',
'Original findings obtained with low statistical power/poor statistical analysis',
'Mistakes or inadequate expertise in reproduction efforts',
'Raw data not available from original lab',
'Protocols, computer code or reagent information insufficient or not available from original lab',
"Methods need 'green fingers' – particular technical expertise that is difficult for others to reproduce",
'Variability of standard reagents', 'Poor experimental design',
'Bad luck'],
dtype='object')
I want to melt these columns, so I wrote the code below.
data_melt = pd.melt(data, id_vars =['respid'], value_vars =['Fraud (i.e. fabricated or falsified results)',
'Pressure to publish for career advancement',
'Insufficient oversight/mentoring by lab principal investigator (e.g. reviewing raw data)',
'Insufficient peer review of research',
'Selective reporting of results',
'Original findings not robust enough because not replicated enough in the lab publishing the work',
'Original findings obtained with low statistical power/poor statistical analysis',
'Mistakes or inadequate expertise in reproduction efforts',
'Raw data not available from original lab',
'Protocols, computer code or reagent information insufficient or not available from original lab',
"Methods need 'green fingers' – particular technical expertise that is difficult for others to reproduce",
'Variability of standard reagents',
'Poor experimental design','Bad luck'],var_name = 'factor', value_name = 'rate')
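Basically, I just pasted the column names into value_vars.
My question is whether I can write code that achieves the same thing instead.
For example, something like the following. (I know it's wrong.)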
data_melt = pd.melt(data, id_vars =['respid'], value_vars = data.columns(), ,var_name = 'factor', value_name = 'rate')
Thanks!
The solution is as follows:
import pandas as pd

# Create a dummy dataframe with columns similar to yours.
df = pd.DataFrame({"respid": range(5),
"Fraud (i.e. fabricated or falsified results)": range(5,10),
'Pressure to publish for career advancement': range(10, 15),
'Insufficient oversight/mentoring by lab principal investigator (e.g. reviewing raw data)': range(15,20),
'Insufficient peer review of research': range(20,25)
})
# Melt every column except the id column.
# Note: a set is unordered, so the order of the melted columns is not guaranteed.
pd.melt(df, id_vars =['respid'], value_vars=set(df.columns).difference(["respid"]))
The result is:
   respid                                      variable  value
0       0  Fraud (i.e. fabricated or falsified results)      5
1       1  Fraud (i.e. fabricated or falsified results)      6
2       2  Fraud (i.e. fabricated or falsified results)      7
3       3  Fraud (i.e. fabricated or falsified results)      8
4       4  Fraud (i.e. fabricated or falsified results)      9
5       0          Insufficient peer review of research     20
6       1          Insufficient peer review of research     21
7       2          Insufficient peer review of research     22
8       3          Insufficient peer review of research     23
...
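Since a set has no guaranteed order, the melted columns can come out in a different order from the original frame. Here is a minimal sketch of an order-preserving variant, assuming a small dummy frame like the one above (the names `value_cols` and `melted` are just illustrative). It uses `Index.drop` to remove the id column from `df.columns`:

```python
import pandas as pd

# Small dummy frame; the column names are borrowed from the question.
df = pd.DataFrame({
    "respid": range(5),
    "Fraud (i.e. fabricated or falsified results)": range(5, 10),
    "Pressure to publish for career advancement": range(10, 15),
})

# Index.drop returns a new Index without the given label,
# keeping the remaining labels in their original order.
value_cols = df.columns.drop("respid")

melted = pd.melt(df, id_vars=["respid"], value_vars=value_cols,
                 var_name="factor", value_name="rate")
print(melted.head())
```

This avoids relying on column position, so it should keep working if `respid` is not the first column.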
If data.columns[1:] gives the value_vars you need, you can simply pass it as the argument:
data_melt = pd.melt(data, id_vars=['respid'], value_vars=data.columns[1:], var_name='factor', value_name='rate')
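It may also be worth knowing that when `value_vars` is omitted, `pd.melt` unpivots every column that is not listed in `id_vars`. Below is a small sketch comparing the two calls. It assumes a made-up two-row frame standing in for `data`, and the names `explicit` and `implicit` are only for illustration:

```python
import pandas as pd

# Made-up stand-in for the real dataset; only the column names come from the question.
data = pd.DataFrame({
    "respid": [1, 2],
    "Poor experimental design": [3, 4],
    "Bad luck": [5, 6],
})

# Pass the columns explicitly by slicing off the first (id) column.
explicit = pd.melt(data, id_vars=["respid"], value_vars=data.columns[1:],
                   var_name="factor", value_name="rate")

# Leave value_vars out: all columns other than id_vars are melted.
implicit = pd.melt(data, id_vars=["respid"],
                   var_name="factor", value_name="rate")

print(explicit)
print(implicit)
```

If every column other than `respid` should be melted, the shorter call is probably enough. The explicit slice is mainly useful when only a subset of columns is wanted.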