How to create a single column from multiple columns in Pandas using .isin() and a list?
I have broken a more complex problem down into a simpler one. The real problem has a much larger list and more columns.
Starting from this df:
i | COL1      | COL2        | COL3      | COL4      | Revenue    | QTY | Products
0 | Coin      | Gold Krug   | Gold Coin | Coins     | 2333677473 | 21  | 12
1 | Gold Coin | Coins       | Gold Coin | Coins     | 2564774784 | 28  | 14
2 | Gold Coin | Coins       | Gold Krug | Coins     | 3256666647 | 35  | 16
3 | Gold Coin | Coins       | Coins     | Gold Krug | 3456788    | 42  | 18
4 | Gold Krug | Gold Coin   | Coins     | Coins     | 4588960    | 49  | 20
5 | Gold Coin | Coins       | Gold Krug | Coins     | 346869909  | 56  | 22
6 | Gold Coin | Coins       | Gold Coin | Coins     | 3777989    | 63  | 24
7 | Gold Coin | Silver Krug | Gold Coin | Coins     | 37687589   | 70  | 26
8 | Gold Coin | Coins       | Gold Coin | Coins     | 45789889   | 77  | 28
9 | Gold Coin | Gold Krug   | Gold Coin | Coins     | 468        | 84  | 30
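To make the examples reproducible, here is a minimal sketch that builds the sample frame above. It assumes the "i" column is just the default integer index rather than a real data column.

```python
import pandas as pd

# One tuple per row of the sample table (index 0..9 is created automatically)
rows = [
    ("Coin",      "Gold Krug",   "Gold Coin", "Coins",     2333677473, 21, 12),
    ("Gold Coin", "Coins",       "Gold Coin", "Coins",     2564774784, 28, 14),
    ("Gold Coin", "Coins",       "Gold Krug", "Coins",     3256666647, 35, 16),
    ("Gold Coin", "Coins",       "Coins",     "Gold Krug", 3456788,    42, 18),
    ("Gold Krug", "Gold Coin",   "Coins",     "Coins",     4588960,    49, 20),
    ("Gold Coin", "Coins",       "Gold Krug", "Coins",     346869909,  56, 22),
    ("Gold Coin", "Coins",       "Gold Coin", "Coins",     3777989,    63, 24),
    ("Gold Coin", "Silver Krug", "Gold Coin", "Coins",     37687589,   70, 26),
    ("Gold Coin", "Coins",       "Gold Coin", "Coins",     45789889,   77, 28),
    ("Gold Coin", "Gold Krug",   "Gold Coin", "Coins",     468,        84, 30),
]
columns = ["COL1", "COL2", "COL3", "COL4", "Revenue", "QTY", "Products"]
df = pd.DataFrame(rows, columns=columns)
print(df)
```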
I want the output to be a DataFrame with a new column, like this:
i | Category    | Revenue    | QTY | Products
0 | Gold Krug   | 2333677473 | 21  | 12
2 | Gold Krug   | 3256666647 | 35  | 16
3 | Gold Krug   | 3456788    | 42  | 18
4 | Gold Krug   | 4588960    | 49  | 20
5 | Gold Krug   | 346869909  | 56  | 22
7 | Silver Krug | 37687589   | 70  | 26
9 | Gold Krug   | 468        | 84  | 30
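I have used the following, which filters the rows, but I cannot figure out how to create a new column holding the list value that matched: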
KRUG = ['Gold Krug', 'Silver Krug', 'Gold Maple','Gold Eagle']
df = df[df[['COL1', 'COL2', 'COL3', 'COL4']].isin(KRUG).any(axis=1)]
print(df)
Output:
i | COL1      | COL2        | COL3      | COL4      | Revenue    | QTY | Products
0 | Coin      | Gold Krug   | Gold Coin | Coins     | 2333677473 | 21  | 12
2 | Gold Coin | Coins       | Gold Krug | Coins     | 3256666647 | 35  | 16
3 | Gold Coin | Coins       | Coins     | Gold Krug | 3456788    | 42  | 18
4 | Gold Krug | Gold Coin   | Coins     | Coins     | 4588960    | 49  | 20
5 | Gold Coin | Coins       | Gold Krug | Coins     | 346869909  | 56  | 22
7 | Gold Coin | Silver Krug | Gold Coin | Coins     | 37687589   | 70  | 26
9 | Gold Coin | Gold Krug   | Gold Coin | Coins     | 468        | 84  | 30
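A small sketch of why the filter above only selects rows: isin() returns a boolean frame with one cell per COL value, and any(axis=1) collapses it to one True/False per row, so the information about which value matched is discarded at that step. The sample frame is rebuilt here so the block runs on its own.

```python
import pandas as pd

KRUG = ["Gold Krug", "Silver Krug", "Gold Maple", "Gold Eagle"]
rows = [
    ("Coin",      "Gold Krug",   "Gold Coin", "Coins",     2333677473, 21, 12),
    ("Gold Coin", "Coins",       "Gold Coin", "Coins",     2564774784, 28, 14),
    ("Gold Coin", "Coins",       "Gold Krug", "Coins",     3256666647, 35, 16),
    ("Gold Coin", "Coins",       "Coins",     "Gold Krug", 3456788,    42, 18),
    ("Gold Krug", "Gold Coin",   "Coins",     "Coins",     4588960,    49, 20),
    ("Gold Coin", "Coins",       "Gold Krug", "Coins",     346869909,  56, 22),
    ("Gold Coin", "Coins",       "Gold Coin", "Coins",     3777989,    63, 24),
    ("Gold Coin", "Silver Krug", "Gold Coin", "Coins",     37687589,   70, 26),
    ("Gold Coin", "Coins",       "Gold Coin", "Coins",     45789889,   77, 28),
    ("Gold Coin", "Gold Krug",   "Gold Coin", "Coins",     468,        84, 30),
]
df = pd.DataFrame(rows, columns=["COL1", "COL2", "COL3", "COL4",
                                 "Revenue", "QTY", "Products"])

cols = ["COL1", "COL2", "COL3", "COL4"]
# Element-wise membership test: a 10 x 4 frame of booleans
mask = df[cols].isin(KRUG)
# One boolean per row: True if at least one COL cell is in KRUG
row_has_krug = mask.any(axis=1)
print(row_has_krug[row_has_krug].index.tolist())  # [0, 2, 3, 4, 5, 7, 9]
filtered = df[row_has_krug]
```

The cell-level mask is what the answers below need to keep (or recompute) in order to recover the matching value itself.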
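Here is an approach using apply(), although there is probably a simpler way using .str. It should be fine as long as the dataset is not too large, since apply() with axis=1 calls a Python function once per row.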
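import numpy as np
def get_coin(x):
    # Return the first KRUG value (in list order) found in this row, else NaN
    for k in KRUG:
        if k in x.tolist():
            return k
    return np.nan
df['category'] = df[['COL1', 'COL2', 'COL3', 'COL4']].apply(get_coin, axis=1)
df.drop(['COL1', 'COL2', 'COL3', 'COL4'], axis=1, inplace=True)
# Drop the rows where no KRUG value was found
df.dropna(subset=['category'], inplace=True)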
   i     Revenue  QTY  Products     category
0  0  2333677473   21        12    Gold Krug
2  2  3256666647   35        16    Gold Krug
3  3     3456788   42        18    Gold Krug
4  4     4588960   49        20    Gold Krug
5  5   346869909   56        22    Gold Krug
7  7    37687589   70        26  Silver Krug
9  9         468   84        30    Gold Krug
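As a possible vectorized alternative to apply() (a sketch, not taken from either answer): where() keeps only the cells that are in KRUG and turns the rest into NaN, and bfill(axis=1) pulls the first remaining value leftward, so the first column ends up holding the leftmost match. Note one difference from get_coin: that function returns the first match in KRUG order, while this returns the first match in column order; the two agree whenever a row contains at most one KRUG value, as in the sample.

```python
import pandas as pd

KRUG = ["Gold Krug", "Silver Krug", "Gold Maple", "Gold Eagle"]
rows = [
    ("Coin",      "Gold Krug",   "Gold Coin", "Coins",     2333677473, 21, 12),
    ("Gold Coin", "Coins",       "Gold Coin", "Coins",     2564774784, 28, 14),
    ("Gold Coin", "Coins",       "Gold Krug", "Coins",     3256666647, 35, 16),
    ("Gold Coin", "Coins",       "Coins",     "Gold Krug", 3456788,    42, 18),
    ("Gold Krug", "Gold Coin",   "Coins",     "Coins",     4588960,    49, 20),
    ("Gold Coin", "Coins",       "Gold Krug", "Coins",     346869909,  56, 22),
    ("Gold Coin", "Coins",       "Gold Coin", "Coins",     3777989,    63, 24),
    ("Gold Coin", "Silver Krug", "Gold Coin", "Coins",     37687589,   70, 26),
    ("Gold Coin", "Coins",       "Gold Coin", "Coins",     45789889,   77, 28),
    ("Gold Coin", "Gold Krug",   "Gold Coin", "Coins",     468,        84, 30),
]
df = pd.DataFrame(rows, columns=["COL1", "COL2", "COL3", "COL4",
                                 "Revenue", "QTY", "Products"])

cols = ["COL1", "COL2", "COL3", "COL4"]
# Keep KRUG values, replace everything else with NaN
matches = df[cols].where(df[cols].isin(KRUG))
# Back-fill across columns so column 0 holds the leftmost match (NaN if none)
df["category"] = matches.bfill(axis=1).iloc[:, 0]
# Drop the COL columns and the rows without any match
result = df.drop(columns=cols).dropna(subset=["category"])
print(result)
```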
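Split the search into two parts (extract the matching category, and select the other columns for the matching rows), then concatenate them: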
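import re
import pandas as pd
# Part 1: join each row's COL values into one string, then extract the first KRUG name;
# re.escape guards against KRUG entries that contain regex special characters
category = (df.filter(like='COL')
              .agg(','.join, axis=1)
              .str.extract(fr"({'|'.join(map(re.escape, KRUG))})")
              .dropna()
              .set_axis(['category'], axis='columns')
           )
# Part 2: the remaining columns, for rows where any COL value is in KRUG
others = df.loc[df.filter(like='COL').isin(KRUG).any(axis=1),
                ['Revenue', 'QTY', 'Products']]
pd.concat([category, others], axis='columns')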
category Revenue QTY Products
0 Gold Krug 2333677473 21 12
2 Gold Krug 3256666647 35 16
3 Gold Krug 3456788 42 18
4 Gold Krug 4588960 49 20
5 Gold Krug 346869909 56 22
7 Silver Krug 37687589 70 26
9 Gold Krug 468 84 30
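Another variant worth considering (a sketch, not from either answer): reshape the COL columns to long form with stack(), keep only the KRUG hits, and take the first hit per row. This avoids building a joined string and a regex, so KRUG entries never need escaping, and a single inner join replaces the separate filter-and-concat steps.

```python
import pandas as pd

KRUG = ["Gold Krug", "Silver Krug", "Gold Maple", "Gold Eagle"]
rows = [
    ("Coin",      "Gold Krug",   "Gold Coin", "Coins",     2333677473, 21, 12),
    ("Gold Coin", "Coins",       "Gold Coin", "Coins",     2564774784, 28, 14),
    ("Gold Coin", "Coins",       "Gold Krug", "Coins",     3256666647, 35, 16),
    ("Gold Coin", "Coins",       "Coins",     "Gold Krug", 3456788,    42, 18),
    ("Gold Krug", "Gold Coin",   "Coins",     "Coins",     4588960,    49, 20),
    ("Gold Coin", "Coins",       "Gold Krug", "Coins",     346869909,  56, 22),
    ("Gold Coin", "Coins",       "Gold Coin", "Coins",     3777989,    63, 24),
    ("Gold Coin", "Silver Krug", "Gold Coin", "Coins",     37687589,   70, 26),
    ("Gold Coin", "Coins",       "Gold Coin", "Coins",     45789889,   77, 28),
    ("Gold Coin", "Gold Krug",   "Gold Coin", "Coins",     468,        84, 30),
]
df = pd.DataFrame(rows, columns=["COL1", "COL2", "COL3", "COL4",
                                 "Revenue", "QTY", "Products"])

cols = ["COL1", "COL2", "COL3", "COL4"]
# Long form: a Series indexed by (row label, column name)
long = df[cols].stack()
# Keep only the cells whose value is in KRUG
hits = long[long.isin(KRUG)]
# First hit per original row (column order), named for the join below
category = hits.groupby(level=0).first().rename("category")
# Inner join keeps only rows with a hit, in the original row order
result = df[["Revenue", "QTY", "Products"]].join(category, how="inner")
print(result)
```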