DataFrame transformation in Python: product attributes and values

Question

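I have an Excel file containing products like the ones shown below. Is it possible to use Python to align attributes of the same type into the same column?

I have this: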

category	name	1	2	3	4	5	6
board games	board game 1	kind	family	box color	red	weight	0.7
board games	board game 2	kind	card game	box color	blue	lenght	25
board games	board game 3	box color	green	weight	0.5	lenght	32

Expected output:

category	name	kind	box color	weight	length
board games	board game1	family games	red	0.7
board games	board game2	card games	blue		25
board games	board game3		green	0.5	32

I also have a case with duplicate values.

category	name	1	2	3	4	5	6	7	8	9	10
smartphones	samsung1	sim type	dual	color	green	ram	4GB	Storage	128GB	year	2021
smartphones	samsung2	sim type	dual	color	green	ram	4GB	storage	256GB	year	2021
smartphones	xiaomi3	sim type	dual	color	blue	ram	6GB	length	32mm	Storage	128GB

Expected output:

category	name	sim type	color	ram	Storage	length	year
smartphones	samsung1	dual	green	4GB	128GB	nan	2021
smartphones	samsung2	dual	green	4GB	256GB	nan	2021
smartphones	xiaomi3	dual	blue	6GB	128GB	32mm	nan

Error message:

ValueError: Index contains duplicate entries, cannot reshape

UPDATE_2:

Index contains duplicate entries, cannot reshape

category	name	1	2	3	4	5	6	7	8	9	10	11	12	13	14	15	16	17	18	19	20
smartphones	samsung1	sim type	dual	color	green	ram	4GB	Storage	128GB	year	2021	camera	yes	fingerprint	yes	accelometer	yes	gyroscope	yes	NFC	yes
smartphones	samsung2	sim type	dual	color	green	ram	4GB	storage	256GB	year	2021	camera	yes	fingerprint	yes
smartphones	xiaomi3	sim type	dual	color	blue	ram	6GB	length	32mm	Storage	128GB	camera	yes	fingerprint	yes

Expected output:

category	name	sim type	color	ram	Storage	length	year	camera	fingerprint	accelometer	gyroscope	NFC
smartphones	samsung1	dual	green	4GB	128GB	nan	2021	yes	yes	yes	yes	no
smartphones	samsung2	dual	green	4GB	256GB	nan	2021	yes	yes	nan	nan	nan
smartphones	xiaomi3	dual	blue	6GB	128GB	32mm	nan	yes	yes	nan	nan	nan

Answer 1

Update

To handle the duplicates, you can try:

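import pandas as pd

# Move the id columns into the index, then melt the numbered columns to long form
out = df.set_index(['category', 'name'], append=True).melt(ignore_index=False)
# Odd-numbered columns hold attribute names, even-numbered ones hold their values
m = out['variable'].astype(int) % 2 == 1
# Pair each name with its value, then pivot the names into columns
out = (pd.concat([out.loc[m, 'value'].rename('variable'),
                  out.loc[~m, 'value']], axis=1)
         .set_index('variable', append=True)['value']
         .unstack('variable').reset_index(['category', 'name'])
         .rename_axis(columns=None))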

Output:

category	name	Storage	color	length	ram	sim type	year
smartphones	samsung1	128GB	green	nan	4GB	dual	2021
smartphones	samsung2	256GB	green	nan	4GB	dual	2021
smartphones	xiaomi3	128GB	blue	32mm	6GB	dual	nan
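
When some rows carry fewer pairs than others (as in UPDATE_2), the padding cells turn into missing attribute names, and several missing names in the same row may again lead to the duplicate-entries error. One possible workaround is sketched below. It assumes the columns after `category` and `name` strictly alternate name, value; it drops pairs whose name is missing and keeps the first value when a name repeats within a row. The sample frame and the names `id_cols`, `long` and `wide` are illustrative and not part of the original post.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: ragged rows (None padding) and a repeated attribute name
df = pd.DataFrame(
    [
        ["smartphones", "samsung1", "sim type", "dual", "ram", "4GB", "NFC", "yes"],
        ["smartphones", "samsung2", "sim type", "dual", "ram", "4GB", None, None],
        ["smartphones", "xiaomi3", "ram", "6GB", "ram", "8GB", None, None],
    ],
    columns=["category", "name", 1, 2, 3, 4, 5, 6],
)

id_cols = ["category", "name"]
pairs = df.drop(columns=id_cols)
names = pairs.iloc[:, 0::2]   # 1st, 3rd, 5th ... pair columns: attribute names
values = pairs.iloc[:, 1::2]  # 2nd, 4th, 6th ... pair columns: attribute values

# Long format: one line per (row, attribute, value), flattened row by row
long = pd.DataFrame(
    {
        "row": np.repeat(df.index.to_numpy(), names.shape[1]),
        "attribute": names.to_numpy().ravel(),
        "value": values.to_numpy().ravel(),
    }
)
# Drop padding pairs, then keep the first value of any repeated attribute
long = long.dropna(subset=["attribute"])
long = long.drop_duplicates(subset=["row", "attribute"], keep="first")

# Each (row, attribute) is now unique, so the pivot can no longer fail on duplicates
wide = long.pivot(index="row", columns="attribute", values="value")
out = df[id_cols].join(wide).rename_axis(columns=None)
print(out)
```

Keeping the first value is only one choice; passing `keep="last"` to `drop_duplicates` would prefer the later pair instead.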

Old answer

You can mostly use stack and unstack to rearrange your DataFrame:

# Separate odd (names) and even (values) columns, pair them per row,
# then pivot the attribute names into columns
out = (pd.concat([df.iloc[:, 2::2].stack().droplevel(1),
                  df.iloc[:, 3::2].stack().droplevel(1)], axis=1)
         .set_index(0, append=True)[1].unstack(0))

# Reattach the category and name columns
out = pd.concat([df.iloc[:, :2], out], axis=1)

Output:

category	name	box color	kind	lenght	weight
board games	board game 1	red	family	nan	0.7
board games	board game 2	blue	card game	25	nan
board games	board game 3	green	nan	32	0.5
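
To see the name/value pairing row by row, here is a plain-Python sketch of the same idea that does not rely on reshaping. It is an alternative, not the method used above. It assumes the columns after the id columns alternate name, value; the helper `pairs_to_columns` and its `id_cols` parameter are hypothetical names introduced for illustration.

```python
import pandas as pd


def pairs_to_columns(df, id_cols=("category", "name")):
    """Build one dict per row from alternating name/value cells (illustrative helper)."""
    id_cols = list(id_cols)
    ids = df[id_cols].to_dict("records")
    cells = df.drop(columns=id_cols).to_numpy(dtype=object)
    records = []
    for id_part, row in zip(ids, cells):
        record = dict(id_part)
        # Cells at even positions are names, the following cells are their values
        for attr, value in zip(row[0::2], row[1::2]):
            if pd.isna(attr):
                continue  # skip padding pairs in shorter rows
            record.setdefault(attr, value)  # keep the first value of a repeated name
        records.append(record)
    return pd.DataFrame(records)


# Sample data copied from the question (including its "lenght" spelling)
df = pd.DataFrame(
    [
        ["board games", "board game 1", "kind", "family", "box color", "red", "weight", 0.7],
        ["board games", "board game 2", "kind", "card game", "box color", "blue", "lenght", 25],
        ["board games", "board game 3", "box color", "green", "weight", 0.5, "lenght", 32],
    ],
    columns=["category", "name", 1, 2, 3, 4, 5, 6],
)

out = pairs_to_columns(df)
print(out)
```

With this approach the attribute columns come out in the order they are first seen rather than sorted alphabetically, which is closer to the layout requested in the question.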

python

attributes

product

pandas