Reshape a pandas dataframe so that items in a column become the new column titles?
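My data files have to be laid out in columns to export to Excel because of their size, so I have to transpose them when I want to use them again in Python. I would like to rotate the dataframe. How do I make the items in the left column the column titles of a new dataframe?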
```
import numpy as np
import pandas as pd

# Example
example_raw = [["Date", "11/19/20", "12/22/20", "2/17/21", "2/19/21"],
               ["Time", "9:40:28", "9:25:13", "9:20:17", "9:19:58"],
               ["ID", 101, 102, 206, 104],
               ["timestamp", "09:40:28:590", "09:25:13:437", "09:20:17:455", "09:19:58:629"],
               ["SECOND", np.nan, np.nan, np.nan, np.nan],
               [0, 4.69, 4.1, 7.17, 8.66],
               [0.2, 4.67, 4.16, 7.17, 8.74],
               [0.4, 4.66, 4.21, 7.17, 8.75],
               [0.6, 4.66, 4.21, 7.17, 8.75],
               [0.8, 4.64, 4.28, 7.16, 8.75]]
example_table = pd.DataFrame(example_raw, columns=["Unnamed: 0", "CURRENT", "CURRENT.1", "CURRENT.2", "CURRENT.3"])

# Desired outcome (first three rows shown)
desired = [["11/19/20", "9:40:28", "101", "09:40:28:590", 4.69, 4.67, 4.66, 4.66, 4.64],
           ["12/22/20", "9:25:13", "102", "09:25:13:437", 4.1, 4.16, 4.21, 4.21, 4.28],
           ["2/17/21", "9:20:17", "206", "09:20:17:455", 7.17, 7.17, 7.17, 7.17, 7.16]]
desired_table = pd.DataFrame(desired, columns=["Date", "Time", "ID", "timestamp", "0", "0.2", "0.4", "0.6", "0.8"])
```
The answer to another question would be applicable if the seconds data were not in the way. It gets me close, but not quite there:
```
new_table_1 = (
    example_table
    .set_index([example_table['Unnamed: 0'],
                example_table.groupby('Unnamed: 0').cumcount()])
    .drop('Unnamed: 0', axis=1)  # axis must be passed by keyword in pandas 2.x
    .unstack(1)
)
```
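You can try the following. `dropna()` removes the `SECOND` row, since it is the only row containing NaN, and `.T` transposes what is left:
```
# Drop rows containing NaN (the "SECOND" separator row), then transpose
df_result = example_table.dropna().T
# The first transposed row holds the old left-column labels; use it as the header
df_result.columns = df_result.iloc[0]
# Remove that header row, clear the columns' name, and renumber the rows
df_result = df_result.iloc[1:].rename_axis(None, axis=1).reset_index(drop=True)
```
Result:
```
print(df_result)

       Date     Time   ID     timestamp     0   0.2   0.4   0.6   0.8
0  11/19/20  9:40:28  101  09:40:28:590  4.69  4.67  4.66  4.66  4.64
1  12/22/20  9:25:13  102  09:25:13:437   4.1  4.16  4.21  4.21  4.28
2   2/17/21  9:20:17  206  09:20:17:455  7.17  7.17  7.17  7.17  7.16
3   2/19/21  9:19:58  104  09:19:58:629  8.66  8.74  8.75  8.75  8.75
```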
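As a variation on the same idea, here is a minimal sketch that wraps the transpose in a helper. The function name `labels_to_headers` and its `label_col` parameter are hypothetical, introduced only for illustration. It assumes that the separator row (here `SECOND`) is the only row whose value cells are all NaN, and it calls `infer_objects()` because transposing mixed data leaves every column with `object` dtype, which may get in the way of later numeric work.

```python
import numpy as np
import pandas as pd


def labels_to_headers(table: pd.DataFrame, label_col: str = "Unnamed: 0") -> pd.DataFrame:
    """Transpose `table` so the entries of `label_col` become column headers.

    Hypothetical helper, not part of pandas.
    """
    value_cols = [c for c in table.columns if c != label_col]
    # Assumption: separator rows such as "SECOND" have NaN in every value cell.
    kept = table.dropna(subset=value_cols, how="all")
    # Use the labels as the index, then transpose so they become the columns.
    wide = kept.set_index(label_col).T
    # Clear the leftover "Unnamed: 0" columns name and renumber the rows.
    wide = wide.rename_axis(None, axis=1).reset_index(drop=True)
    # After the transpose every column is dtype object; infer_objects()
    # restores numeric dtypes for columns that hold only numbers.
    return wide.infer_objects()


# Same sample data as in the question.
example_raw = [["Date", "11/19/20", "12/22/20", "2/17/21", "2/19/21"],
               ["Time", "9:40:28", "9:25:13", "9:20:17", "9:19:58"],
               ["ID", 101, 102, 206, 104],
               ["timestamp", "09:40:28:590", "09:25:13:437", "09:20:17:455", "09:19:58:629"],
               ["SECOND", np.nan, np.nan, np.nan, np.nan],
               [0, 4.69, 4.1, 7.17, 8.66],
               [0.2, 4.67, 4.16, 7.17, 8.74],
               [0.4, 4.66, 4.21, 7.17, 8.75],
               [0.6, 4.66, 4.21, 7.17, 8.75],
               [0.8, 4.64, 4.28, 7.16, 8.75]]
example_table = pd.DataFrame(
    example_raw,
    columns=["Unnamed: 0", "CURRENT", "CURRENT.1", "CURRENT.2", "CURRENT.3"],
)

result = labels_to_headers(example_table)
print(result)
print(result.dtypes)
```

With this sketch the sample-time headers stay numeric (`0`, `0.2`, ...) rather than becoming strings, so a column is selected with `result[0.2]`, not `result["0.2"]`.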