Transpose Dataframe problem: For each df.index and df.column combination create a row in new dataframe
I have a dataframe like the one below. Its index is the "Dates" column.
```
Dates 3M INDIA LTD ALKYL AMINES CHEMICALS LTD AAVAS FINANCIERS LTD ABB INDIA LTD ADITYA BIRLA CAPITAL LTD
01-01-2020 1.738819 -0.054496 -0.600676 -0.535873 -1.837524 0.514004 -0.853701 -0.101420 2.192982
02-01-2020 -1.110939 3.668744 1.371749 1.346907 4.367026 2.930212 3.540222 4.080081 1.185880
03-01-2020 -0.862856 0.008598 2.543608 2.104247 0.795136 -0.290943 -0.726246 -1.021898 1.368421
06-01-2020 -2.135963 -1.952790 -2.201474 -2.643822 -4.166667 -2.250709 -1.815881 -2.933202 0.300000
07-01-2020 1.692019 8.431578 -1.116379 0.674114 0.097800 -3.166751 0.677638 -1.873767 0.837922
```
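I want to create a new dataframe that has one row for each combination of date and company name.
The resulting dataframe should look like this:
```
Dates    Company Name    Value
```
How can I do this transformation with pandas?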
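You can use `pd.melt` from pandas to reshape your dataset. Assuming your dataframe is called `df`, use the following:
```python
import pandas as pd

# 'Dates' is the index, so turn it back into a regular column first;
# melt() then unpivots every remaining column into (variable, value) pairs
df_reshaped = pd.melt(df.reset_index(), id_vars='Dates')
```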
This gives one row per (date, company) pair, with the company names in `variable` and the numbers in `value`. The first rows are:
```
         Dates      variable     value
0   01-01-2020  3M INDIA LTD  1.738819
1   02-01-2020  3M INDIA LTD -1.110939
2   03-01-2020  3M INDIA LTD -0.862856
3   06-01-2020  3M INDIA LTD -2.135963
4   07-01-2020  3M INDIA LTD  1.692019
...
```
Note that you can rename the newly created columns by passing `var_name` and `value_name`:
```python
df_reshaped = pd.melt(df.reset_index(), id_vars='Dates',
                      var_name='Company Name', value_name='Value')
```
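For a self-contained check, here is a minimal sketch of the melt approach on a small stand-in frame. It assumes, as in the question, that `Dates` is the index and holds date strings; `"COMPANY B"` is a hypothetical placeholder name for one of the other company columns, reusing values from the question's second column.

```python
import pandas as pd

# Small stand-in for the question's frame: 'Dates' is the index and each
# column is a company. "COMPANY B" is a hypothetical placeholder name.
df = pd.DataFrame(
    {
        "3M INDIA LTD": [1.738819, -1.110939, -0.862856],
        "COMPANY B": [-0.054496, 3.668744, 0.008598],
    },
    index=pd.Index(["01-01-2020", "02-01-2020", "03-01-2020"], name="Dates"),
)

# Move the 'Dates' index into a column, then unpivot every other column
long_df = pd.melt(
    df.reset_index(),
    id_vars="Dates",
    var_name="Company Name",
    value_name="Value",
)

print(long_df)
```

The result has one row per date for each company (3 dates × 2 companies = 6 rows), with all rows for the first company listed before those for the second.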
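Alternatively, stack the company columns into the index:
```python
# 'Dates' is already the index, so stack() pairs each date with each company
df = df.stack().reset_index()
df.columns = ['Dates', 'Company Name', 'Value']
# sort_values returns a new frame, so assign the result
df = df.sort_values(by=['Company Name', 'Dates'])
```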
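A quick sketch confirming that the `stack` and `melt` routes produce the same (date, company, value) rows, only in a different order. The frame is a hypothetical two-company subset of the question's data, and `"COMPANY B"` is a placeholder name.

```python
import pandas as pd

# Hypothetical two-company frame with 'Dates' as the index, as in the question
df = pd.DataFrame(
    {
        "3M INDIA LTD": [1.738819, -1.110939],
        "COMPANY B": [-0.054496, 3.668744],  # placeholder company name
    },
    index=pd.Index(["01-01-2020", "02-01-2020"], name="Dates"),
)

# stack() moves the column labels into the innermost index level,
# giving one entry per (date, company) pair
stacked = df.stack().reset_index()
stacked.columns = ["Dates", "Company Name", "Value"]

# melt() yields the same rows, but grouped company by company
melted = pd.melt(df.reset_index(), id_vars="Dates",
                 var_name="Company Name", value_name="Value")

# Compare as sets of (date, company, value) tuples, ignoring row order
same = set(stacked.itertuples(index=False, name=None)) == set(
    melted.itertuples(index=False, name=None)
)
print(same)
```

One difference to keep in mind: the legacy `stack()` implementation drops missing values by default, whereas `melt()` keeps them, so the two results can differ when the frame contains NaNs.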
Or:
```python
# value_vars defaults to every column not listed in id_vars
df_long = pd.melt(df.reset_index(),
                  id_vars=['Dates'],
                  var_name='Company Name',
                  value_name='Value')
```
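The dates in the question appear to be day-first strings (dd-mm-yyyy); that reading is an assumption, not something the question states. Sorting such strings, as in the `sort_values` call in the stack answer, orders them by the day digits rather than chronologically once the data spans more than one month. A minimal sketch with hypothetical rows, parsing with `pd.to_datetime` and an explicit format before sorting:

```python
import pandas as pd

# Hypothetical long-format rows; dates are assumed to be dd-mm-yyyy strings
long_df = pd.DataFrame(
    {
        "Dates": ["01-02-2020", "15-01-2020", "02-01-2020"],
        "Company Name": ["3M INDIA LTD"] * 3,
        "Value": [0.5, -1.2, 1.7],
    }
)

# Sorting the raw strings compares them character by character (day first)
print(long_df.sort_values("Dates")["Dates"].tolist())
# ['01-02-2020', '02-01-2020', '15-01-2020']

# Parse with an explicit day-first format, then sort chronologically
long_df["Dates"] = pd.to_datetime(long_df["Dates"], format="%d-%m-%Y")
long_df = long_df.sort_values(["Company Name", "Dates"]).reset_index(drop=True)
print(long_df["Dates"].dt.strftime("%Y-%m-%d").tolist())
# ['2020-01-02', '2020-01-15', '2020-02-01']
```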