Filter df using numpy/pandas with multiple conditions, three dates and one object
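My name is Victor and I have Asperger's syndrome.
I cannot clearly understand functions, synthesize them in my head and communicate them to the computer, but I can visualize and express myself in visual situations.
I have a dataframe filled with membership registrations, cancellations and disaffiliations over many years.
I need to know who was affiliated on a given date.
A user can stop being a member for 2 reasons, disaffiliation or cancellation, and sometimes both.
I will show you 2 different examples of how I need the computer to process the dataframe.
Example data: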
import pandas as pd

df = pd.DataFrame({'political party': ['MDB', 'MDB', 'PODE', 'PDT', 'PSL', 'PV', 'PSL', 'PT', 'PL'],
                   'affiliated': ['Bob', 'John', 'Olivia', 'James', 'Victor', 'Victor', 'Emma', 'Rose', 'Mark'],
                   'date_affiliation': ['2006-01-31', '2011-04-11', '2007-09-04', '2009-10-13', '2017-12-30', '2020-09-02', '1992-02-23', '2010-10-19', '1985-06-22'],
                   'situation': ['unaffiliated', 'affiliated', 'canceled', 'canceled', 'canceled', 'affiliated', 'affiliated', 'unaffiliated', 'canceled'],
                   'date_disaffiliation': ['2020-02-18', '', '', '2011-11-23', '', '', '', '2010-10-30', '2010-04-08'],
                   'date_cancellation': ['', '', '2019-10-15', '2011-11-10', '2020-07-02', '', '', '', '2010-04-08']})

# Convert the date columns; empty strings become NaT
cols_date = ['date_affiliation', 'date_disaffiliation', 'date_cancellation']
for col in cols_date:
    df[col] = pd.to_datetime(df[col], errors='coerce')

print(df)
| | political party | affiliated | date_affiliation | situation | date_disaffiliation | date_cancellation |
|---|---|---|---|---|---|---|
| 0 | MDB | Bob | 2006-01-31 | unaffiliated | 2020-02-18 | NaT |
| 1 | MDB | John | 2011-04-11 | affiliated | NaT | NaT |
| 2 | PODE | Olivia | 2007-09-04 | canceled | NaT | 2019-10-15 |
| 3 | PDT | James | 2009-10-13 | canceled | 2011-11-23 | 2011-11-10 |
| 4 | PSL | Victor | 2017-12-30 | canceled | NaT | 2020-07-02 |
| 5 | PV | Victor | 2020-09-02 | affiliated | NaT | NaT |
| 6 | PSL | Emma | 1992-02-23 | affiliated | NaT | NaT |
| 7 | PT | Rose | 2010-10-19 | unaffiliated | 2010-10-30 | NaT |
| 8 | PL | Mark | 1985-06-22 | canceled | 2010-04-08 | 2010-04-08 |
Output sample one
| | political party | affiliated | date_affiliation | situation | date_disaffiliation | date_cancellation | affiliat_2005_08_15 | affiliat_2010_08_07 | affiliat_2020_01_05 | affiliat_2020_11_15 |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | MDB | Bob | 2006-01-31 | unaffiliated | 2020-02-18 | NaT | False | True | True | False |
| 1 | MDB | John | 2011-04-11 | affiliated | NaT | NaT | False | False | True | True |
| 2 | PODE | Olivia | 2007-09-04 | canceled | NaT | 2019-10-15 | False | True | False | False |
| 3 | PDT | James | 2009-10-13 | canceled | 2011-11-23 | 2011-11-10 | False | True | False | False |
| 4 | PSL | Victor | 2017-12-30 | canceled | NaT | 2020-07-02 | False | False | True | False |
| 5 | PV | Victor | 2020-09-02 | affiliated | NaT | NaT | False | False | False | True |
| 6 | PSL | Emma | 1992-02-23 | affiliated | NaT | NaT | True | True | True | True |
| 7 | PT | Rose | 2010-10-19 | unaffiliated | 2010-10-30 | NaT | False | False | False | False |
| 8 | PL | Mark | 1985-06-22 | canceled | 2010-04-08 | 2010-04-08 | True | False | False | False |
Output sample two
Affiliated on 2005_08_15

| | political party | affiliated | date_affiliation | situation | date_disaffiliation | date_cancellation |
|---|---|---|---|---|---|---|
| 0 | PSL | Emma | 1992-02-23 | affiliated | NaT | NaT |
| 1 | PL | Mark | 1985-06-22 | canceled | 2010-04-08 | 2010-04-08 |
Affiliated on 2010_08_07

| | political party | affiliated | date_affiliation | situation | date_disaffiliation | date_cancellation |
|---|---|---|---|---|---|---|
| 0 | MDB | Bob | 2006-01-31 | unaffiliated | 2020-02-18 | NaT |
| 1 | PODE | Olivia | 2007-09-04 | canceled | NaT | 2019-10-15 |
| 2 | PDT | James | 2009-10-13 | canceled | 2011-11-23 | 2011-11-10 |
| 3 | PSL | Emma | 1992-02-23 | affiliated | NaT | NaT |
Affiliated on 2020_01_05

| | political party | affiliated | date_affiliation | situation | date_disaffiliation | date_cancellation |
|---|---|---|---|---|---|---|
| 0 | MDB | Bob | 2006-01-31 | unaffiliated | 2020-02-18 | NaT |
| 1 | MDB | John | 2011-04-11 | affiliated | NaT | NaT |
| 2 | PSL | Victor | 2017-12-30 | canceled | NaT | 2020-07-02 |
| 3 | PSL | Emma | 1992-02-23 | affiliated | NaT | NaT |
Affiliated on 2020_11_15 (df_2020_11_15)

| | political party | affiliated | date_affiliation | situation | date_disaffiliation | date_cancellation |
|---|---|---|---|---|---|---|
| 0 | MDB | John | 2011-04-11 | affiliated | NaT | NaT |
| 1 | PV | Victor | 2020-09-02 | affiliated | NaT | NaT |
| 2 | PSL | Emma | 1992-02-23 | affiliated | NaT | NaT |
I think you have to use the .where() method, like this:
df.where(df['date_affiliation'] <= '2005-08-15')
You can use any of [ '<' , '>' , '==' , '>=' ] in place of '<=' to find the data you want.
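One thing worth knowing about the suggestion above: .where() keeps every row and blanks out the ones that fail the condition, whereas boolean indexing drops them. The sketch below contrasts the two on a three-row stand-in frame; the frame itself is an assumption built from names and dates in the question's sample data, not the full dataset.

```python
import pandas as pd

# Small stand-in frame (assumption: three rows borrowed from the question's sample).
df = pd.DataFrame({
    "affiliated": ["Bob", "Emma", "Mark"],
    "date_affiliation": pd.to_datetime(["2006-01-31", "1992-02-23", "1985-06-22"]),
})

mask = df["date_affiliation"] <= "2005-08-15"

# .where() keeps the original shape and fills non-matching rows with NaN/NaT.
masked = df.where(mask)

# Boolean indexing keeps only the rows where the mask is True.
filtered = df[mask]

print(masked)
print(filtered)
```

If the goal is a shorter dataframe containing only the matching members, boolean indexing is probably the more direct tool.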
Does this help point you in the right direction?

# Date to test:
date = '2010-08-07'

# Calculate some helper columns (comparisons against NaT evaluate to False):
affiliated_before_date = df.date_affiliation <= date
disaffiliation_before_date = df.date_disaffiliation <= date
cancellation_before_date = df.date_cancellation <= date

# Final logic: must be affiliated, and NOT yet disaffiliated or cancelled.
people_to_include = affiliated_before_date & ~(disaffiliation_before_date | cancellation_before_date)
df[people_to_include]
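The same rule can also be phrased as "joined on or before the date, and the earliest exit date is either missing or later than the date". The following is a minimal sketch of that alternative, not a replacement for the logic above; the helper name `affiliated_on` is hypothetical and the frame is a five-row subset of the question's sample data.

```python
import pandas as pd

# Subset of the question's sample data (assumption: five rows are enough to illustrate).
df = pd.DataFrame({
    "affiliated": ["Bob", "John", "James", "Rose", "Mark"],
    "date_affiliation": ["2006-01-31", "2011-04-11", "2009-10-13", "2010-10-19", "1985-06-22"],
    "date_disaffiliation": ["2020-02-18", "", "2011-11-23", "2010-10-30", "2010-04-08"],
    "date_cancellation": ["", "", "2011-11-10", "", "2010-04-08"],
})
for col in ["date_affiliation", "date_disaffiliation", "date_cancellation"]:
    df[col] = pd.to_datetime(df[col], errors="coerce")


def affiliated_on(df, date):
    """Hypothetical helper: True where the member was affiliated on `date`."""
    date = pd.Timestamp(date)
    # Earliest exit per row; min() skips NaT, so the result is NaT only when both are missing.
    end = df[["date_disaffiliation", "date_cancellation"]].min(axis=1)
    joined = df["date_affiliation"] <= date
    still_member = end.isna() | (end > date)
    return joined & still_member


result = df.loc[affiliated_on(df, "2010-08-07"), "affiliated"].tolist()
print(result)  # ['Bob', 'James']
```

Both formulations treat a member as no longer affiliated on the exit day itself, so they should give the same answer on this data.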
For the other requested output (one True/False column per date), I would do something like:

dates_to_add = ['2005-08-15', '2010-08-07', '2020-01-05', '2020-11-15']

def calculate_new_data_column(df, date):
    affiliated_before_date = df.date_affiliation <= date
    disaffiliation_before_date = df.date_disaffiliation <= date
    cancellation_before_date = df.date_cancellation <= date
    return affiliated_before_date & ~(disaffiliation_before_date | cancellation_before_date)

for date in dates_to_add:
    # Use underscores so the names match the requested columns, e.g. affiliat_2005_08_15
    df[f"affiliat_{date.replace('-', '_')}"] = calculate_new_data_column(df, date)
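For the output that asks for one separate dataframe per date, the same mask can be used to filter rather than to add a column. This is a rough sketch under a few assumptions: the helper name `affiliated_mask` and the dictionary `frames` are hypothetical, and the keys follow the `df_2020_11_15` naming that appears in the question.

```python
import pandas as pd

df = pd.DataFrame({
    "political party": ["MDB", "MDB", "PODE", "PDT", "PSL", "PV", "PSL", "PT", "PL"],
    "affiliated": ["Bob", "John", "Olivia", "James", "Victor", "Victor", "Emma", "Rose", "Mark"],
    "date_affiliation": ["2006-01-31", "2011-04-11", "2007-09-04", "2009-10-13", "2017-12-30",
                         "2020-09-02", "1992-02-23", "2010-10-19", "1985-06-22"],
    "situation": ["unaffiliated", "affiliated", "canceled", "canceled", "canceled",
                  "affiliated", "affiliated", "unaffiliated", "canceled"],
    "date_disaffiliation": ["2020-02-18", "", "", "2011-11-23", "", "", "", "2010-10-30", "2010-04-08"],
    "date_cancellation": ["", "", "2019-10-15", "2011-11-10", "2020-07-02", "", "", "", "2010-04-08"],
})
for col in ["date_affiliation", "date_disaffiliation", "date_cancellation"]:
    df[col] = pd.to_datetime(df[col], errors="coerce")


def affiliated_mask(df, date):
    """Hypothetical helper: same logic as above, returned as a boolean Series."""
    joined = df["date_affiliation"] <= date
    left = (df["date_disaffiliation"] <= date) | (df["date_cancellation"] <= date)
    return joined & ~left


dates = ["2005-08-15", "2010-08-07", "2020-01-05", "2020-11-15"]

# One filtered dataframe per date, re-indexed from 0 like the requested tables.
frames = {
    f"df_{d.replace('-', '_')}": df[affiliated_mask(df, d)].reset_index(drop=True)
    for d in dates
}

for name, frame in frames.items():
    print(name)
    print(frame)
```

Keeping the results in a dictionary avoids creating a separate variable per date, which tends to be easier to loop over when the list of dates grows.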