Pandas: compare a column against a list object together with another column containing int
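I have the pandas DataFrame below, and I want to filter it by comparing one column against a list object (a list of names) and another column against an integer value.
DataFrame structure: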
+------------+-----------------------+----------------------+-----------------+--------------------+------------+---------+
| Number | Caller | Assignment group | Assigned to | Status(state) | Location | Aging |
|------------+-----------------------+----------------------+-----------------+--------------------+------------+---------|
| INC0722882 | Shivam Verma | RD-DI-Infra-Linux | Karn Kumar | Active | IN-NDA02 | 2 |
| INC0786494 | Kanhaiya Kumar Mishra | RD-Hotspot-Team-APAC | Karn Kumar | Active | IN-NDA02 | 5 |
| INC0790029 | Akhil Garg | RD-DI-Infra-Storage | Amit Raj | Awaiting User Info | IN-NDA02 | 3 |
| INC0743690 | Japesh Kumar | RD-DI-Infra-Linux | Shakir Chaudhry | Awaiting User Info | IN-NDA02 | 5 |
+------------+-----------------------+----------------------+-----------------+--------------------+------------+---------+
Pandas code:
from __future__ import print_function
from signal import signal, SIGPIPE, SIG_DFL
signal(SIGPIPE,SIG_DFL)
from tabulate import tabulate
import pandas as pd
##### Python pandas, widen output display to see more columns. ####
pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)
pd.set_option('display.width', None)
pd.set_option('expand_frame_repr', True)
##########################################################################################
def pprint_df(dframe):
    print(tabulate(dframe, headers='keys', tablefmt='psql', showindex=False))
names = ['Amit Raj','Andre Geurts','Andrzej Kamionek','Ankur Wason','Ashish Kumar','Carl Thijssen','Chris Masson','Daniel Chorazy','Devarishi Kumar','Elizabeth Tamayo','Eric Oomen','Gopinath Perumal','Jakub Kubera','Jeffrey Thompson','Jeroen Kwanten','Karn Kumar','Kenny Henderson','Manish Kumar','Mihai Pârlea','Mihai Reus','Naveen Kumar','Rafiq Khan','Rob Goossens','Robert in','Roger Smith','Santhoshkumar Krishnamoorthy','Shakir Chaudhry','Sonu Kumar','Suraj Budha','Szymon Kolodziejski','Szymon Kubera','Tony Olsson','Vetrivelan Rajagopalan','Yogesh Miglani','Abrar Ahmad']
col_name = ['Number','Caller','Assignment group','Assigned to','Status(state)','Location','Aging']
df = pd.read_excel('Backlog-April_24.xlsx', usecols=col_name, encoding='utf-8', index=False)
# df = df[df['Assigned to'].isin(names)] <-- This works perfectly with above dataframe
df = df[df['Assigned to'].isin(names) & df['Aging'] >= 5]
print(df.dtypes)
pprint_df(df)
When I run the code above, I get no results, even if I convert the int to str.
$ ./pd_code.py
Number object
Caller object
Assignment group object
Assigned to object
Status(state) object
Location object
Aging object
dtype: object
+----------+----------+--------------------+---------------+-----------------+------------+---------+
| Number | Caller | Assignment group | Assigned to | Status(state) | Location | Aging |
|----------+----------+--------------------+---------------+-----------------+------------+---------|
+----------+----------+--------------------+---------------+-----------------+------------+---------+
Desired output:
Example:
+------------+-----------------------+----------------------+-----------------+--------------------+------------+---------+
| Number | Caller | Assignment group | Assigned to | Status(state) | Location | Aging |
|------------+-----------------------+----------------------+-----------------+--------------------+------------+---------|
| INC0786494 | Kanhaiya Kumar Mishra | RD-Hotspot-Team-APAC | Karn Kumar | Active | IN-NDA02 | 5 |
| INC0743690 | Japesh Kumar | RD-DI-Infra-Linux | Shakir Chaudhry | Awaiting User Info | IN-NDA02 | 5 |
+------------+-----------------------+----------------------+-----------------+--------------------+------------+---------+
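For posterity: we need to use boolean indexing, with each condition wrapped in parentheses.
Boolean indexing:
Another common operation is the use of boolean vectors to filter the data. The operators are: | for or, & for and, and ~ for not. These must be grouped by using parentheses.
In Python, & binds more tightly than comparison operators such as >=, so the original expression is evaluated as (df['Assigned to'].isin(names) & df['Aging']) >= 5, which is not the intended filter. Parenthesizing the comparison fixes it: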
df = df[df['Assigned to'].isin(names) & (df['Aging'] >= 5)]
or
df = df[(df['Assigned to'].isin(names)) & (df['Aging'] >= 5)]
There is also a very detailed article on operator precedence that is well worth reading.
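To make the precedence issue concrete, here is a minimal sketch that does not need the Excel file. It assumes a small hand-built DataFrame with the four sample rows from the question, an integer Aging column, and a shortened names list; none of these are the asker's real data.

```python
import pandas as pd

# Hypothetical stand-in for the spreadsheet: the four sample rows from the question.
df = pd.DataFrame({
    "Number": ["INC0722882", "INC0786494", "INC0790029", "INC0743690"],
    "Assigned to": ["Karn Kumar", "Karn Kumar", "Amit Raj", "Shakir Chaudhry"],
    "Aging": [2, 5, 3, 5],
})
# Shortened version of the names list, for illustration only.
names = ["Amit Raj", "Karn Kumar", "Shakir Chaudhry"]

in_names = df["Assigned to"].isin(names)

# Without parentheses, & is applied before >=, so this is
# (in_names & df["Aging"]) >= 5, and the mask ends up all False here.
unparenthesized = in_names & df["Aging"] >= 5

# With parentheses, each condition is evaluated first and then combined.
mask = in_names & (df["Aging"] >= 5)
result = df[mask]

print(result["Number"].tolist())  # ['INC0786494', 'INC0743690']
```

If the real Aging column is read in as object dtype, as the dtypes output in the question suggests, it may also be worth converting it with pd.to_numeric before comparing; that depends on what the spreadsheet actually contains.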