How to handle duplicated or blank columns and number the duplicated ones in a Python dataframe?

The dataframe has more than 200 columns, including duplicated dates and blank column labels.

         weight height total  weight height total
          2019   2019   2019  2020   2020   2020
Species  jan1   jan1    ''    jan1   jan1    ''
cat      1.0    2.0     3     4.0    3.0     7
dog      3.0    4.0     9     4.0    5.0     9

I tried:

[x for x in df.columns if df.columns.count(x) >1]

#error: 'MultiIndex' object has no attribute 'count'

df.stack(dropna=False)

#error: cannot reindex from a duplicate axis

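Objective: add a string value to the duplicated columns, e.g. 'a.jan1', and rename the blank columns to a, b, ... and so on.

The output needs to be in tabular form for further processing and storage: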


 class    year    Month    cat    dog   
 weight   2019    jan1     1       3    
 height   2019    jan1     2       4
 weight   2020    jan1     4       4
 height   2020    jan1     3       5

So, given the following dataframe:

import pandas as pd

df = pd.DataFrame(
    {
        ("weight", 2019, "jan1"): {"cat": 1, "dog": 3},
        ("height", 2019, "jan1"): {"cat": 2, "dog": 4},
        ("total", 2019, ""): {"cat": 3, "dog": 9},
        ("weight", 2020, "jan1"): {"cat": 4, "dog": 4},
        ("height", 2020, "jan1"): {"cat": 3, "dog": 5},
        ("total", 2020, ""): {"cat": 7, "dog": 9},
    }
)
print(df)
# Outputs
    weight height total weight height total
      2019   2019  2019   2020   2020  2020
      jan1   jan1         jan1   jan1      
cat      1      2     3      4      3     7
dog      3      4     9      4      5     9

You could try this:

# Unpivot the dataframe, leaving out the "total" columns whose month label is blank
new_df = df.reset_index().melt(
    id_vars=[("index", "", "")],
    value_vars=[
        ("weight", 2019, "jan1"),
        ("height", 2019, "jan1"),
        ("weight", 2020, "jan1"),
        ("height", 2020, "jan1"),
    ],
)
new_df.columns = ["species", "class", "year", "month", "value"]

# Make separate dataframes for "cats" and "dogs" and store them in a list
temp_dfs = []
for species in new_df["species"].unique():
    temp_df = new_df.loc[new_df["species"] == species, :]
    temp_df = temp_df.rename(columns={"value": species}).drop(columns="species")
    temp_dfs.append(temp_df)

# Merge "cats" and "dogs"
final_df = temp_dfs[0]
for temp_df in temp_dfs[1:]:
    final_df = pd.merge(final_df, temp_df, on=["class", "year", "month"], how="outer")

And so:

print(final_df)
# Output
    class  year month  cat  dog
0  weight  2019  jan1    1    3
1  height  2019  jan1    2    4
2  weight  2020  jan1    4    4
3  height  2020  jan1    3    5
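
As a shorter alternative, here is a minimal sketch that should give the same table by transposing instead of melting and merging. It assumes the blank label always sits in the last column level (the month), as in the sample. The names `has_month`, `alt_df` and `occurrence` are only illustrative. Transposing does not reindex, so it should also cope with repeated column labels, and `cumcount` then numbers any repeats (0 for the first occurrence, 1 for the next, and so on).

```python
import pandas as pd

df = pd.DataFrame(
    {
        ("weight", 2019, "jan1"): {"cat": 1, "dog": 3},
        ("height", 2019, "jan1"): {"cat": 2, "dog": 4},
        ("total", 2019, ""): {"cat": 3, "dog": 9},
        ("weight", 2020, "jan1"): {"cat": 4, "dog": 4},
        ("height", 2020, "jan1"): {"cat": 3, "dog": 5},
        ("total", 2020, ""): {"cat": 7, "dog": 9},
    }
)

# Keep only the columns whose last level (the month) is not blank
has_month = df.columns.get_level_values(2) != ""

# Transpose: each (class, year, month) column becomes a row,
# and each species becomes a column
alt_df = df.loc[:, has_month].T
alt_df.index.names = ["class", "year", "month"]
alt_df = alt_df.reset_index()

# Number repeated (class, year, month) keys (assumed column name "occurrence")
alt_df["occurrence"] = alt_df.groupby(["class", "year", "month"]).cumcount()

print(alt_df)
```

In the sample there are no repeated keys, so `occurrence` is 0 everywhere. With real duplicates it could be turned into a prefix or suffix on the month label before storing the table.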