DataFrame transformation in Python: product attributes and values
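I have an Excel file containing products like the ones shown below. Is it possible to use Python to align attributes of the same type into the same column?
I have this: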
category | name | 1 | 2 | 3 | 4 | 5 | 6 |
---|---|---|---|---|---|---|---|
board games | board game 1 | kind | family | box color | red | weight | 0.7 |
board games | board game 2 | kind | card game | box color | blue | lenght | 25 |
board games | board game 3 | box color | green | weight | 0.5 | lenght | 32 |
Expected output:
category | name | kind | box color | weight | length |
---|---|---|---|---|---|
board games | board game1 | family games | red | 0.7 | |
board games | board game2 | card games | blue | | 25 |
board games | board game3 | | green | 0.5 | 32 |
I also have a case with duplicate values.
category | name | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
---|---|---|---|---|---|---|---|---|---|---|---|
smartphones | samsung1 | sim type | dual | color | green | ram | 4GB | Storage | 128GB | year | 2021 |
smartphones | samsung2 | sim type | dual | color | green | ram | 4GB | storage | 256GB | year | 2021 |
smartphones | xiaomi3 | sim type | dual | color | blue | ram | 6GB | length | 32mm | Storage | 128GB |
Expected output:
category | name | sim type | color | ram | Storage | length | year |
---|---|---|---|---|---|---|---|
smartphones | samsung1 | dual | green | 4GB | 128GB | nan | 2021 |
smartphones | samsung2 | dual | green | 4GB | 256GB | nan | 2021 |
smartphones | xiaomi3 | dual | blue | 6GB | 128GB | 32mm | nan |
Error message:
ValueError: Index contains duplicate entries, cannot reshape
UPDATE_2:
Index contains duplicate entries, cannot reshape
category | name | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
smartphones | samsung1 | sim type | dual | color | green | ram | 4GB | Storage | 128GB | year | 2021 | camera | yes | fingerprint | yes | accelometer | yes | gyroscope | yes | NFC | yes |
smartphones | samsung2 | sim type | dual | color | green | ram | 4GB | storage | 256GB | year | 2021 | camera | yes | fingerprint | yes | | | | | | |
smartphones | xiaomi3 | sim type | dual | color | blue | ram | 6GB | length | 32mm | Storage | 128GB | camera | yes | fingerprint | yes | | | | | | |
Expected output:
category | name | sim type | color | ram | Storage | length | year | camera | fingerprint | accelometer | gyroscope | NFC |
---|---|---|---|---|---|---|---|---|---|---|---|---|
smartphones | samsung1 | dual | green | 4GB | 128GB | nan | 2021 | yes | yes | yes | yes | no |
smartphones | samsung2 | dual | green | 4GB | 256GB | nan | 2021 | yes | yes | nan | nan | nan |
smartphones | xiaomi3 | dual | blue | 6GB | 128GB | 32mm | nan | yes | yes | nan | nan | nan |
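Update
To work around the duplicates problem, you can try:

    import pandas as pd

    # df is the question's DataFrame: 'category', 'name', then columns 1..N
    # holding alternating attribute names (odd) and values (even).
    out = df.set_index(['category', 'name'], append=True).melt(ignore_index=False)
    m = out['variable'].astype(int) % 2 == 1  # True for attribute-name cells
    out = (pd.concat([out.loc[m, 'value'].rename('variable'),
                      out.loc[~m, 'value']], axis=1)
             .set_index('variable', append=True)['value']
             .unstack('variable').reset_index(['category', 'name'])
             .rename_axis(columns=None))
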
Output:
category | name | Storage | color | length | ram | sim type | year |
---|---|---|---|---|---|---|---|
smartphones | samsung1 | 128GB | green | nan | 4GB | dual | 2021 |
smartphones | samsung2 | 256GB | green | nan | 4GB | dual | 2021 |
smartphones | xiaomi3 | 128GB | blue | 32mm | 6GB | dual | nan |
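
For context, `unstack` raises this ValueError when the same (row, attribute) pair occurs more than once in the index, because two values would then compete for one output cell. The sketch below reproduces the error on a small made-up Series and shows one possible workaround. The data, the level names `row` and `attribute`, and the choice to keep the first value are all assumptions for illustration, not part of the question's data.

```python
import pandas as pd

# Hypothetical long-format data: row 0 lists the attribute "color" twice.
long = pd.Series(
    ["green", "blue", "4GB"],
    index=pd.MultiIndex.from_tuples(
        [(0, "color"), (0, "color"), (0, "ram")],
        names=["row", "attribute"],
    ),
)

# Unstacking fails because (0, "color") maps to two different values.
try:
    long.unstack("attribute")
    error_message = None
except ValueError as exc:
    error_message = str(exc)

# One possible workaround (assumption: the first value should win):
# collapse duplicates before reshaping.
wide = long.groupby(level=["row", "attribute"]).first().unstack("attribute")
```

If a different rule is wanted for duplicates, `.first()` can be swapped for `.last()` or another aggregation.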
Old answer
You can mostly rely on `stack` and `unstack` to reshape your dataframe:

    import pandas as pd

    # Separate odd (names) and even (values) columns
    out = (pd.concat([df.iloc[:, 2::2].stack().droplevel(1),
                      df.iloc[:, 3::2].stack().droplevel(1)], axis=1)
             .set_index(0, append=True)[1].unstack(0))
    # Put 'category' and 'name' back in front of the reshaped attributes
    out = pd.concat([df.iloc[:, :2], out], axis=1)

Output:
category | name | box color | kind | lenght | weight |
---|---|---|---|---|---|
board games | board game 1 | red | family | nan | 0.7 |
board games | board game 2 | blue | card game | 25 | nan |
board games | board game 3 | green | nan | 32 | 0.5 |
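
As an alternative to reshaping with `stack`/`unstack`, the same alignment can be sketched with plain Python dictionaries, one per row. This is not the method used above, only a minimal illustration that avoids the reshape step. The helper name `align_attributes` is hypothetical, and it assumes the identifier columns come first, the remaining cells alternate name/value, and the first occurrence of a repeated attribute should win.

```python
import pandas as pd

# Sample data copied from the question; columns 1..6 hold name/value pairs.
df = pd.DataFrame(
    [
        ["board games", "board game 1", "kind", "family", "box color", "red", "weight", 0.7],
        ["board games", "board game 2", "kind", "card game", "box color", "blue", "lenght", 25],
        ["board games", "board game 3", "box color", "green", "weight", 0.5, "lenght", 32],
    ],
    columns=["category", "name", 1, 2, 3, 4, 5, 6],
)


def align_attributes(frame, id_cols=("category", "name")):
    """Hypothetical helper: turn alternating name/value cells into columns."""
    n_ids = len(id_cols)
    records = []
    for cells in frame.itertuples(index=False, name=None):
        # Start each record with the identifier columns.
        record = dict(zip(id_cols, cells[:n_ids]))
        pairs = cells[n_ids:]
        # Walk the remaining cells two at a time: (attribute name, value).
        for attr, value in zip(pairs[0::2], pairs[1::2]):
            if pd.isna(attr):
                continue  # skip empty cells at the end of shorter rows
            record.setdefault(attr, value)  # first occurrence wins
        records.append(record)
    # Attributes missing from a row become NaN in the resulting frame.
    return pd.DataFrame(records)


out = align_attributes(df)
```

Because each row is handled independently, a repeated attribute within a row cannot trigger the reshape error; the trade-off is a Python-level loop, which may be slow on very large files.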