Wide to long with multiple header rows and only two variables

Question

I have been searching but have not found an answer. I have the following DataFrame:

                       Pais  Anio Electricidad Electricidad Electricidad Electricidad Electricidad Electricidad Electricidad Electricidad Electricidad Electricidad Electricidad
0                    NaN   NaN           No           No           No           No           No           Si           Si           Si           Si           Si        Total
1                    NaN   NaN        Rural        Rural       Urbana       Urbana     Total No        Rural        Rural       Urbana       Urbana     Total Si     Total Si
2                    NaN   NaN       Hombre        Mujer       Hombre        Mujer          NaN       Hombre        Mujer       Hombre        Mujer          NaN          NaN
3              Argentina  2015          NaN          NaN          NaN          NaN          NaN          NaN          NaN          NaN          NaN          NaN          NaN
4                Bolivia  2014       513160       462745        24457        25959      1026321      1187340      1243921      3554853      3686894      9673008     10699329
5                 Brasil  2015       287373       216447        28718        15895       548433     15898153     14545231     81355185     88432517    200231086    200779519
6                  Chile  2011        20192        16702         8752         7090        52736      1054604      1053353      6936960      7749581     16794498     16847234

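This is the desired output:

I posted the desired output as an image because it contains a lot of data. I only managed a single melt, but I also need to "melt" the Yes/No, zone, and sex rows...

This is my code: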

import pandas as pd

df1 = df.iloc[0:3]  # select the first three rows (the header rows)
df1 = df1.ffill(axis='columns')  # forward-fill the group labels
df = df.iloc[3:]
# my_dataframe = my_dataframe[my_dataframe.employee_name != 'chad']
df1 = pd.concat([df1, df])  # DataFrame.append was removed in pandas 2.0
# move the first row into the column header
df1.columns = df1.iloc[0]
df1 = df1.reindex(df1.index.drop(0)).reset_index(drop=True)
df1.columns.name = None
df1 = pd.melt(df1, id_vars=["Pais", "Anio"], var_name="Pregunta")

Any suggestions or advice are appreciated!

Answer 1

One possible solution turns the three header rows into a column MultiIndex and then reshapes:

# forward-fill the group labels across the three header rows
df.iloc[:3] = df.iloc[:3].ffill(axis='columns')

# build a 4-level column MultiIndex from the column names and the first 3 rows
df.columns = [df.columns,
              df.iloc[0],
              df.iloc[1],
              df.iloc[2]]

df = (df.iloc[3:]  # remove the 3 header rows
        .set_index(df.columns.tolist()[:2])  # first 2 columns to the index
        .rename_axis(['Pais','Anio'])  # replace the tuple level names
        .unstack([0,1])  # reshape into a long Series
        .rename_axis(['Pregunta','Respuesta','Zona','Sexo','Pais','Anio'])  # level names
        .sort_index(level=['Pais','Anio'])  # sort by country and year
        .reset_index(name='Total')  # Series to DataFrame
        .dropna(subset=['Anio'])  # drop rows with a missing Anio
        .assign(Anio = lambda x: x['Anio'].astype(int))  # convert Anio to int
        .reindex(columns=['Pais','Anio','Pregunta','Zona','Sexo','Respuesta','Total'])  # column order
        .dropna(subset=['Total'])  # drop rows with a missing Total
        .assign(Total = lambda x: x['Total'].astype(int))  # convert Total to int
    )
print(df.head(10))
       Pais  Anio      Pregunta      Zona    Sexo Respuesta    Total
44  Bolivia  2014  Electricidad     Rural  Hombre        No   513160
45  Bolivia  2014  Electricidad     Rural   Mujer        No   462745
46  Bolivia  2014  Electricidad  Total No   Mujer        No  1026321
47  Bolivia  2014  Electricidad    Urbana  Hombre        No    24457
48  Bolivia  2014  Electricidad    Urbana   Mujer        No    25959
49  Bolivia  2014  Electricidad     Rural  Hombre        Si  1187340
50  Bolivia  2014  Electricidad     Rural   Mujer        Si  1243921
51  Bolivia  2014  Electricidad  Total Si   Mujer        Si  9673008
52  Bolivia  2014  Electricidad    Urbana  Hombre        Si  3554853
53  Bolivia  2014  Electricidad    Urbana   Mujer        Si  3686894
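
For readers who want something they can run end to end, here is a minimal sketch of a different route to the same long layout: melt the value columns by position, then merge the header labels back in. It is not the method used in the answer above. The frame raw is a hypothetical, trimmed-down copy of the question's data (four value columns only), and the names raw, col_ids, labels, header, body and long are assumptions made for this illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical, trimmed-down copy of the question's frame: rows 0-2 hold the
# header labels (Respuesta, Zona, Sexo), the remaining rows hold the data.
raw = pd.DataFrame(
    [
        [np.nan, np.nan, "No", "No", "Si", "Si"],
        [np.nan, np.nan, "Rural", "Urbana", "Rural", "Urbana"],
        [np.nan, np.nan, "Hombre", "Mujer", "Hombre", "Mujer"],
        ["Argentina", 2015, np.nan, np.nan, np.nan, np.nan],
        ["Bolivia", 2014, 513160, 25959, 1187340, 3686894],
        ["Chile", 2011, 20192, 7090, 1054604, 7749581],
    ],
    columns=["Pais", "Anio"] + ["Electricidad"] * 4,
)

# Temporary unique id for every value column (the original names repeat).
col_ids = [f"col{i}" for i in range(raw.shape[1] - 2)]

# One lookup row per value column; forward-fill covers blank header cells.
labels = raw.iloc[:3, 2:].ffill(axis="columns")
header = pd.DataFrame(
    {
        "Pregunta": list(raw.columns[2:]),
        "Respuesta": labels.iloc[0].to_numpy(),
        "Zona": labels.iloc[1].to_numpy(),
        "Sexo": labels.iloc[2].to_numpy(),
    },
    index=col_ids,
)

# Data rows only, with the temporary ids as column names.
body = raw.iloc[3:].copy()
body.columns = ["Pais", "Anio"] + col_ids

long = (
    body.melt(id_vars=["Pais", "Anio"], var_name="col", value_name="Total")
    .merge(header, left_on="col", right_index=True)  # attach the header labels
    .dropna(subset=["Total"])  # drops the all-NaN Argentina rows
    .astype({"Anio": int, "Total": int})
    .sort_values(["Pais", "Anio", "Respuesta", "Zona", "Sexo"])
    .reset_index(drop=True)
    .reindex(columns=["Pais", "Anio", "Pregunta", "Zona", "Sexo", "Respuesta", "Total"])
)
print(long)
```

This variant avoids MultiIndex handling altogether, which may make it easier to debug, at the cost of an extra merge. On the full table the Total No / Total Si columns would still be forward-filled with the neighbouring Sexo label, just as in the output above, so they may need separate treatment.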

python

reshape

melt

pandas

wide-column-store