DataFrame transformation in Python: product attributes and values

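I have an Excel file that contains products like the ones shown below. Is it possible to use Python to align attributes of the same type into the same column?

I have this: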

category     name          1          2          3          4     5       6
board games  board game 1  kind       family     box color  red   weight  0.7
board games  board game 2  kind       card game  box color  blue  lenght  25
board games  board game 3  box color  green      weight     0.5   lenght  32

Expected output:

category     name         kind          box color  weight  length
board games  board game1  family games  red        0.7
board games  board game2  card games    blue               25
board games  board game3                green      0.5     32

I also have a case with duplicate values.

category     name      1         2     3      4      5    6    7        8      9        10
smartphones  samsung1  sim type  dual  color  green  ram  4GB  Storage  128GB  year     2021
smartphones  samsung2  sim type  dual  color  green  ram  4GB  storage  256GB  year     2021
smartphones  xiaomi3   sim type  dual  color  blue   ram  6GB  length   32mm   Storage  128GB

Expected output:

category     name      sim type  color  ram  Storage  length  year
smartphones  samsung1  dual      green  4GB  128GB    nan     2021
smartphones  samsung2  dual      green  4GB  256GB    nan     2021
smartphones  xiaomi3   dual      blue   6GB  128GB    32mm    nan

Error message:

ValueError: Index contains duplicate entries, cannot reshape
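
For reference, pandas raises this message when a reshape such as unstack or pivot finds two values competing for the same (row, column) cell. A minimal sketch that reproduces it, using made-up values rather than the real Excel data:

```python
import pandas as pd

# Hypothetical data: the same (name, attribute) key appears twice.
s = pd.Series(
    ["128GB", "256GB"],
    index=pd.MultiIndex.from_tuples(
        [("samsung1", "Storage"), ("samsung1", "Storage")],
        names=["name", "attribute"],
    ),
)

try:
    s.unstack("attribute")
    message = None
except ValueError as exc:
    # unstack cannot place two values into one (row, column) cell
    message = str(exc)

print(message)
```

Any repeated key in the index being unstacked triggers the same failure, including several missing (NaN) attribute names on one row.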

UPDATE_2:

Index contains duplicate entries, cannot reshape

category     name      1         2     3      4      5    6    7        8      9        10     11      12   13           14   15           16   17         18   19   20
smartphones  samsung1  sim type  dual  color  green  ram  4GB  Storage  128GB  year     2021   camera  yes  fingerprint  yes  accelometer  yes  gyroscope  yes  NFC  yes
smartphones  samsung2  sim type  dual  color  green  ram  4GB  storage  256GB  year     2021   camera  yes  fingerprint  yes
smartphones  xiaomi3   sim type  dual  color  blue   ram  6GB  length   32mm   Storage  128GB  camera  yes  fingerprint  yes

Expected output:

category     name      sim type  color  ram  Storage  length  year  camera  fingerprint  accelometer  gyroscope  NFC
smartphones  samsung1  dual      green  4GB  128GB    nan     2021  yes     yes          yes          yes        no
smartphones  samsung2  dual      green  4GB  256GB    nan     2021  yes     yes          nan          nan        nan
smartphones  xiaomi3   dual      blue   6GB  128GB    32mm    nan   yes     yes          nan          nan        nan

Update

To work around the duplicates, you can try:

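import pandas as pd

# Keep 'category' and 'name' in the index, then melt the numbered columns
out = df.set_index(['category', 'name'], append=True).melt(ignore_index=False)
# Odd-numbered columns hold attribute names, even-numbered ones hold values
m = out['variable'].astype(int) % 2 == 1
out = (pd.concat([out.loc[m, 'value'].rename('variable'),
                  out.loc[~m, 'value']], axis=1)
         .set_index('variable', append=True)['value']
         .unstack('variable').reset_index(['category', 'name'])
         .rename_axis(columns=None))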

Output:

category     name      Storage  color  length  ram  sim type  year
smartphones  samsung1  128GB    green  nan     4GB  dual      2021
smartphones  samsung2  256GB    green  nan     4GB  dual      2021
smartphones  xiaomi3   128GB    blue   32mm    6GB  dual      nan
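
When rows carry different numbers of attribute pairs, as in the UPDATE_2 data, the empty cells turn into several NaN attribute names on the same row, which is a likely source of the duplicate-entries error. The following is a hedged sketch of one way around that. It is a variant built on the same odd/even column layout, not the code above: it flattens names and values into a long table, drops the empty pairs, and then pivots. The sample rows are a shortened, made-up version of the question's data, and `pairs`, `long`, and `wide` are illustrative names only.

```python
import numpy as np
import pandas as pd

# Shortened sample modelled on the question; None marks an empty Excel cell.
df = pd.DataFrame(
    [
        ["smartphones", "samsung1", "sim type", "dual", "color", "green", "year", "2021"],
        ["smartphones", "samsung2", "sim type", "dual", "color", "green", None, None],
        ["smartphones", "xiaomi3", "sim type", "dual", "length", "32mm", None, None],
    ],
    columns=["category", "name", 1, 2, 3, 4, 5, 6],
)

pairs = df.drop(columns=["category", "name"])
n_pairs = pairs.shape[1] // 2

# Long format: one line per (row, attribute name, value) triple.
long = pd.DataFrame(
    {
        "row": np.repeat(df.index.to_numpy(), n_pairs),
        "attribute": pairs.iloc[:, 0::2].to_numpy().ravel(),
        "value": pairs.iloc[:, 1::2].to_numpy().ravel(),
    }
).dropna(subset=["attribute"])  # discard the empty trailing pairs

# Each remaining (row, attribute) key is unique, so pivot can reshape it.
wide = long.pivot(index="row", columns="attribute", values="value")
out = df[["category", "name"]].join(wide).rename_axis(columns=None)
print(out)
```

Attribute names that differ only by case, such as `Storage` and `storage`, would still end up in separate columns unless they are normalized first.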

Old answer

You can mostly rely on stack and unstack to rearrange your DataFrame:

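import pandas as pd

# Separate odd (names) and even (values) columns, pair them up per row,
# then turn the attribute names into column headers
out = (pd.concat([df.iloc[:, 2::2].stack().droplevel(1),
                  df.iloc[:, 3::2].stack().droplevel(1)], axis=1)
         .set_index(0, append=True)[1].unstack(0))

# Re-attach the 'category' and 'name' columns
out = pd.concat([df.iloc[:, :2], out], axis=1)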

Output:

category     name          box color  kind       lenght  weight
board games  board game 1  red        family     nan     0.7
board games  board game 2  blue       card game  25      nan
board games  board game 3  green      nan        32      0.5
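
To make the odd/even split concrete, here is a small self-contained sketch on the board game rows. It assumes the values arrive as strings and that the columns are laid out exactly as in the question. It also swaps stack/unstack for a plain dict per row, purely as an illustrative cross-check, so treat it as a sketch rather than the method above.

```python
import pandas as pd

# Board game rows as in the question ('lenght' is spelled as in the data).
df = pd.DataFrame(
    [
        ["board games", "board game 1", "kind", "family", "box color", "red", "weight", "0.7"],
        ["board games", "board game 2", "kind", "card game", "box color", "blue", "lenght", "25"],
        ["board games", "board game 3", "box color", "green", "weight", "0.5", "lenght", "32"],
    ],
    columns=["category", "name", 1, 2, 3, 4, 5, 6],
)

names = df.iloc[:, 2::2]   # columns 1, 3, 5: attribute names
values = df.iloc[:, 3::2]  # columns 2, 4, 6: attribute values

# One {attribute: value} dict per row; pandas fills missing keys with NaN.
records = [dict(zip(n, v)) for n, v in zip(names.to_numpy(), values.to_numpy())]
out = pd.concat([df.iloc[:, :2], pd.DataFrame(records)], axis=1)
print(out)
```

The resulting columns follow the order in which attributes first appear rather than alphabetical order, so they may be arranged differently from the output shown above.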