Pandas: select rows that match on multiple columns
I have data like this:
import pandas as pd

category = ['Car','Car','Car','Car','Truck','Truck','Truck']
name = ['Camry','Camry','Camry','Camry','Tacoma','Tundra','Tundra']
year = ['2007','2007','2008','2009','2010','2010','2011']  # years are stored as strings
vals = [0.1,0.5,0.2,0.9,0.8,0.4,0.9]
df = pd.DataFrame({'Category': category,
                   'Name': name,
                   'Year': year,
                   'Vals': vals})
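index | Category | Name | Year | Vals |
---|---|---|---|---|
0 | Car | Camry | 2007 | 0.1 |
1 | Car | Camry | 2007 | 0.5 |
2 | Car | Camry | 2008 | 0.2 |
3 | Car | Camry | 2009 | 0.9 |
4 | Truck | Tacoma | 2010 | 0.8 |
5 | Truck | Tundra | 2010 | 0.4 |
6 | Truck | Tundra | 2011 | 0.9 |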
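Then I have a set of (Category, Name, Year) combinations that I want to filter the DataFrame by. They could be in any format, but here they are in a DataFrame.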
# The original post built this frame row by row with DataFrame.append,
# which was removed in pandas 2.0; a list of dicts gives the same frame.
combinations_i_want = pd.DataFrame([
    {'Category': 'Car', 'Name': 'Camry', 'Year': '2007'},     # (Car, Camry, 2007): 2 matches in df
    {'Category': 'Truck', 'Name': 'Tundra', 'Year': '2010'},  # (Truck, Tundra, 2010): 1 match in df
])
I want to extract the rows of df that exactly match these two combinations. Those would be rows 0, 1, and 5. The resulting table would look like this:
index | Category | Name | Year | Vals |
---|---|---|---|---|
0 | Car | Camry | 2007 | 0.1 |
1 | Car | Camry | 2007 | 0.5 |
5 | Truck | Tundra | 2010 | 0.4 |
Note: I don't need the old index values; they are only there to help visualize.
How can I do this?
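You should use .loc and .isin rather than .append.
Your statement could look like this:
df.loc[(df['Category'].isin(['Car', 'Truck'])) & (df['Name'].isin(['Camry', 'Tundra'])) & (df['Year'].isin(['2007', '2010']))]
This produces the expected result for this data. Note, however, that each column is tested independently, so the filter would also keep mixed combinations such as (Truck, Tundra, 2007) if they were present in df.
If needed, you can assign the result to a variable, for example:
matches = df.loc[(df['Category'].isin(['Car', 'Truck'])) &
                 (df['Name'].isin(['Camry', 'Tundra'])) &
                 (df['Year'].isin(['2007', '2010']))]
print(matches)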
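To make the difference between per-column and per-combination matching concrete, here is a minimal sketch. It assumes the question's data plus one extra row, (Truck, Tundra, 2007), which is hypothetical and added only to expose the difference. The names `extra`, `wanted`, `per_column`, and `exact` are illustrative, and `MultiIndex.isin` is one possible way to test whole tuples, not necessarily what the answer's author had in mind.

```python
import pandas as pd

df = pd.DataFrame({
    'Category': ['Car', 'Car', 'Car', 'Car', 'Truck', 'Truck', 'Truck'],
    'Name': ['Camry', 'Camry', 'Camry', 'Camry', 'Tacoma', 'Tundra', 'Tundra'],
    'Year': ['2007', '2007', '2008', '2009', '2010', '2010', '2011'],
    'Vals': [0.1, 0.5, 0.2, 0.9, 0.8, 0.4, 0.9],
})

# Hypothetical extra row (not in the question) that mixes values
# from the two wanted combinations.
extra_row = pd.DataFrame([{'Category': 'Truck', 'Name': 'Tundra',
                           'Year': '2007', 'Vals': 0.3}])
extra = pd.concat([df, extra_row], ignore_index=True)

keys = ['Category', 'Name', 'Year']
wanted = [('Car', 'Camry', '2007'), ('Truck', 'Tundra', '2010')]

# Per-column test: each column is checked on its own,
# so the mixed row at index 7 passes as well.
per_column = extra.loc[extra['Category'].isin(['Car', 'Truck'])
                       & extra['Name'].isin(['Camry', 'Tundra'])
                       & extra['Year'].isin(['2007', '2010'])]

# Per-combination test: treat the key columns of each row as one tuple
# and check membership of the whole tuple.
row_keys = pd.MultiIndex.from_frame(extra[keys])
exact = extra.loc[row_keys.isin(wanted)]

print(per_column.index.tolist())
print(exact.index.tolist())
```

On the question's original data both filters select the same rows; they only diverge once a mixed combination like the hypothetical one is present.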
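You can simply do a right join on the columns you want. With how='right', a combination that has no match in df still appears in the result as a row with NaN in Vals; use how='inner' to drop such rows.
result = df.merge(combinations_i_want, how='right', on=['Category', 'Name', 'Year'])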
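As a self-contained sketch of the merge approach, the snippet below rebuilds the question's two frames and joins them. It uses how='inner' rather than how='right', and drops duplicate combinations first so that a repeated combination cannot multiply rows; both choices are assumptions about what is wanted, not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    'Category': ['Car', 'Car', 'Car', 'Car', 'Truck', 'Truck', 'Truck'],
    'Name': ['Camry', 'Camry', 'Camry', 'Camry', 'Tacoma', 'Tundra', 'Tundra'],
    'Year': ['2007', '2007', '2008', '2009', '2010', '2010', '2011'],
    'Vals': [0.1, 0.5, 0.2, 0.9, 0.8, 0.4, 0.9],
})

combinations_i_want = pd.DataFrame([
    {'Category': 'Car', 'Name': 'Camry', 'Year': '2007'},
    {'Category': 'Truck', 'Name': 'Tundra', 'Year': '2010'},
])

keys = ['Category', 'Name', 'Year']

# Keep only the key columns and remove repeated combinations,
# then keep the rows of df whose key tuple appears among them.
wanted = combinations_i_want[keys].drop_duplicates()
result = df.merge(wanted, how='inner', on=keys)

print(result)
```

The merged frame gets a fresh 0..n-1 index, which fits the question's note that the old index values are not needed.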
Use DataFrame.query; it gives you exact matches based on boolean logic:
print(df.query("(Category=='Car' and Name=='Camry' and Year=='2007') or (Category=='Truck' and Name=='Tundra' and Year=='2010')"))
Output:
Category Name Year Vals
0 Car Camry 2007 0.1
1 Car Camry 2007 0.5
5 Truck Tundra 2010 0.4
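Writing the query string by hand stops scaling once there are more than a few combinations. The sketch below builds the same expression from the combinations frame. It assumes the column names are valid Python identifiers and that the values are simple strings that `repr` can quote safely; the names `clauses` and `expr` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Category': ['Car', 'Car', 'Car', 'Car', 'Truck', 'Truck', 'Truck'],
    'Name': ['Camry', 'Camry', 'Camry', 'Camry', 'Tacoma', 'Tundra', 'Tundra'],
    'Year': ['2007', '2007', '2008', '2009', '2010', '2010', '2011'],
    'Vals': [0.1, 0.5, 0.2, 0.9, 0.8, 0.4, 0.9],
})

combinations_i_want = pd.DataFrame([
    {'Category': 'Car', 'Name': 'Camry', 'Year': '2007'},
    {'Category': 'Truck', 'Name': 'Tundra', 'Year': '2010'},
])

# One "(col == 'value' and ...)" clause per wanted combination.
clauses = []
for row in combinations_i_want.to_dict('records'):
    parts = [f"{col} == {val!r}" for col, val in row.items()]
    clauses.append("(" + " and ".join(parts) + ")")

# A row qualifies if it satisfies any one of the clauses.
expr = " or ".join(clauses)
result = df.query(expr)

print(expr)
print(result)
```

For a large number of combinations, a join or a tuple-membership test is likely to be a better fit than a very long query string.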