How to handle duplicated or blank columns and add a number to the duplicated ones in a Python dataframe?
The dataframe has 200+ columns, with repeated dates and blank columns:
         weight  height  total  weight  height  total
         2019    2019    2019   2020    2020    2020
Species  jan1    jan1    ''     jan1    jan1    ''
cat      1.0     2.0     3      4.0     3.0     7
dog      3.0     4.0     9      4.0     5.0     9
I tried:
[x for x in df.columns if df.columns.count(x) >1]
#error: 'MultiIndex' object has no attribute 'count'
df.stack(dropna=False)
#error: cannot reindex from a duplicate axis
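Objective:
Add a string value to the duplicated columns (for example 'a.jan1') and rename the blank columns to a, b, ... and so on.
The output needs to be in tabular form for further processing and storage: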
class   year  Month  cat  dog
weight  2019  jan1   1    3
height  2019  jan1   2    4
weight  2020  jan1   4    4
height  2020  jan1   3    5
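On the two errors in the question: a pandas index has no `count` method, and `stack` usually raises "cannot reindex from a duplicate axis" when the column labels are not unique. A minimal sketch of one way to find the duplicated labels and make them unique is below. The frame `demo`, the helper `relabel`, and the level names `class`, `year`, `month` are hypothetical and only for illustration; the letter scheme also assumes no label repeats more than 26 times.

```python
import pandas as pd
from string import ascii_lowercase

# Hypothetical frame with exact duplicate column labels, including blank ones
cols = pd.MultiIndex.from_tuples(
    [
        ("weight", 2019, "jan1"),
        ("weight", 2019, "jan1"),
        ("total", 2019, ""),
        ("total", 2019, ""),
    ],
    names=["class", "year", "month"],
)
demo = pd.DataFrame([[1, 2, 3, 3], [3, 4, 9, 9]], index=["cat", "dog"], columns=cols)

# True for every column whose full (class, year, month) label occurs more than once
dup_mask = demo.columns.duplicated(keep=False)

# Position of each label among its duplicates: 0 for the first, 1 for the second, ...
labels = demo.columns.to_frame(index=False)
occurrence = labels.groupby(["class", "year", "month"]).cumcount()


def relabel(month, n, is_dup):
    """Blank labels become a, b, ...; duplicated labels get an 'a.', 'b.', ... prefix."""
    letter = ascii_lowercase[int(n)]
    if month == "":
        return letter
    return f"{letter}.{month}" if is_dup else month


labels["month"] = [
    relabel(m, n, d) for m, n, d in zip(labels["month"], occurrence, dup_mask)
]
demo.columns = pd.MultiIndex.from_frame(labels)

print(demo.columns.get_level_values("month").tolist())
# ['a.jan1', 'b.jan1', 'a', 'b']
```

Once the labels are unique, reshaping operations that rejected the duplicate axis have a chance of working; whether blank columns should be renamed or simply dropped depends on what they hold.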
So, given the following dataframe:
import pandas as pd

df = pd.DataFrame(
    {
        ("weight", 2019, "jan1"): {"cat": 1, "dog": 3},
        ("height", 2019, "jan1"): {"cat": 2, "dog": 4},
        ("total", 2019, ""): {"cat": 3, "dog": 9},
        ("weight", 2020, "jan1"): {"cat": 4, "dog": 4},
        ("height", 2020, "jan1"): {"cat": 3, "dog": 5},
        ("total", 2020, ""): {"cat": 7, "dog": 9},
    }
)
print(df)
# Outputs
    weight height total weight height total
      2019   2019  2019   2020   2020  2020
      jan1   jan1         jan1   jan1
cat      1      2     3      4      3     7
dog      3      4     9      4      5     9
You could try this:
# Unpivot the dataframe, leaving out the blank "total" columns
new_df = df.reset_index().melt(
    id_vars=[("index", "", "")],
    value_vars=[
        ("weight", 2019, "jan1"),
        ("height", 2019, "jan1"),
        ("weight", 2020, "jan1"),
        ("height", 2020, "jan1"),
    ],
)
new_df.columns = ["species", "class", "year", "month", "value"]

# Make separate dataframes for "cat" and "dog" and store them in a list
temp_dfs = []
for species in new_df["species"].unique():
    temp_df = new_df.loc[new_df["species"] == species, :]
    temp_df = temp_df.rename(columns={"value": species}).drop(columns="species")
    temp_dfs.append(temp_df)

# Merge "cat" and "dog" on the shared key columns
final_df = temp_dfs[0]
for temp_df in temp_dfs[1:]:
    final_df = pd.merge(final_df, temp_df, on=["class", "year", "month"], how="outer")
And so:
print(final_df)
# Output
    class  year month  cat  dog
0  weight  2019  jan1    1    3
1  height  2019  jan1    2    4
2  weight  2020  jan1    4    4
3  height  2020  jan1    3    5
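With 200+ columns, listing every tuple in `value_vars` by hand gets tedious. As an alternative sketch (not part of the approach above), the blank columns can be filtered out by their bottom level and the frame transposed, so the column levels become ordinary columns. The name `tidy` and the level names `class`, `year`, `month` are chosen here for illustration, and this assumes the remaining column labels are unique.

```python
import pandas as pd

df = pd.DataFrame(
    {
        ("weight", 2019, "jan1"): {"cat": 1, "dog": 3},
        ("height", 2019, "jan1"): {"cat": 2, "dog": 4},
        ("total", 2019, ""): {"cat": 3, "dog": 9},
        ("weight", 2020, "jan1"): {"cat": 4, "dog": 4},
        ("height", 2020, "jan1"): {"cat": 3, "dog": 5},
        ("total", 2020, ""): {"cat": 7, "dog": 9},
    }
)

# Keep only the columns whose bottom level (the month) is not blank
non_blank = df.columns.get_level_values(2) != ""

tidy = (
    df.loc[:, non_blank]
    .T  # column levels become a row MultiIndex; species become columns
    .rename_axis(["class", "year", "month"])
    .reset_index()
)
print(tidy)
```

This keeps the rows in the original column order and does not need a loop over species, at the cost of assuming every value column shares one dtype (transposing mixed dtypes would upcast them to object).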