Filter a date range, excluding weekends
I have the following dataset and code. The code works, but the output includes weekends. I want to exclude them.
record_id,date,site,sick,funny,happy
CDEC1947-6,9/2/2018,2,1,1,1
IJKC1953-4,9/29/2018,2,1,1,1
FGHC1724-9,10/25/2018,2,3,1,1
FGHC2929-1,10/31/2018,4,1,1,1
CDEC1912-0,11/1/2018,1,1,1,1
IJKC1726-4,11/2/2018,1,3,1,1
IJKC1728-0,10/26/2018,2,3,1,1
ABCC1730-6,11/2/2018,2,3,1,1
ABCC1731-4,11/2/2018,2,3,1,1
CDEC1733-0,10/22/2018,1,3,1,1
CDEC1735-5,11/2/2018,2,3,1,1
IJKC1914-6,10/27/2018,2,6,1,1
ABCC1916-1,10/23/2018,2,6,1,1
IJKC1918-7,11/2/2018,2,1,1,1
CDEC1920-3,10/24/2018,1,6,1,1
IJKC1943-5,11/2/2018,2,4,1,1
ABCC1945-0,11/2/2018,1,4,1,1
ABCC1949-2,10/25/2018,2,4,1,1
CDEC1951-8,11/2/2018,2,5,1,1
CDEC2924-2,11/3/2018,4,1,1,1
CDEC2927-5,11/3/2018,1,1,1,1
ABCC2925-9,11/4/2018,4,1,1,1
IJKC1941-9,11/4/2018,2,4,1,1
ABCC2922-6,11/5/2018,1,1,1,1
Code:
import pandas as pd
import numpy as np
from plotly.offline import init_notebook_mode, iplot
from plotly.graph_objs import *
import plotly.graph_objs as go
import datetime as dt
#import datetime
#from datetime import date
#from datetime import timedelta
today = dt.date.today()  # `date` is not imported directly, so go through the dt alias
from IPython.core.interactiveshell import InteractiveShell
%matplotlib inline
df=pd.read_csv("dataset.csv", encoding="utf-8",low_memory=False)
df["date"]=pd.to_datetime(df["date"])
df["site"]=df["site"].astype("category") # Convert to category
df['sick']=df['sick'].astype('category')
df["funny"]=df["funny"].astype("category")
df["happy"]=df["happy"].astype("category")
df = df.sort_values(by='date', ascending=True)
df.head()
df
record_id date site sick funny happy
0 CDEC1947-6 2018-09-02 2 1 1 1
1 IJKC1953-4 2018-09-29 2 1 1 1
9 CDEC1733-0 2018-10-22 1 3 1 1
12 ABCC1916-1 2018-10-23 2 6 1 1
14 CDEC1920-3 2018-10-24 1 6 1 1
2 FGHC1724-9 2018-10-25 2 3 1 1
17 ABCC1949-2 2018-10-25 2 4 1 1
6 IJKC1728-0 2018-10-26 2 3 1 1
11 IJKC1914-6 2018-10-27 2 6 1 1
3 FGHC2929-1 2018-10-31 4 1 1 1
4 CDEC1912-0 2018-11-01 1 1 1 1
7 ABCC1730-6 2018-11-02 2 3 1 1
10 CDEC1735-5 2018-11-02 2 3 1 1
5 IJKC1726-4 2018-11-02 1 3 1 1
13 IJKC1918-7 2018-11-02 2 1 1 1
15 IJKC1943-5 2018-11-02 2 4 1 1
16 ABCC1945-0 2018-11-02 1 4 1 1
18 CDEC1951-8 2018-11-02 2 5 1 1
8 ABCC1731-4 2018-11-02 2 3 1 1
19 CDEC2924-2 2018-11-03 4 1 1 1
20 CDEC2927-5 2018-11-03 1 1 1 1
22 IJKC1941-9 2018-11-04 2 4 1 1
21 ABCC2925-9 2018-11-04 4 1 1 1
23 ABCC2922-6 2018-11-05 1 1 1 1
# get first and last datetime for final week of data
range_max = df['date'].max()
range_min = range_max - dt.timedelta(days=7)
# take slice with final week of data
sliced_df = df[(df['date'] >= range_min) &
(df['date'] <= range_max)]
sliced_df
record_id date site sick funny happy
3 FGHC2929-1 2018-10-31 4 1 1 1
4 CDEC1912-0 2018-11-01 1 1 1 1
7 ABCC1730-6 2018-11-02 2 3 1 1
10 CDEC1735-5 2018-11-02 2 3 1 1
5 IJKC1726-4 2018-11-02 1 3 1 1
13 IJKC1918-7 2018-11-02 2 1 1 1
15 IJKC1943-5 2018-11-02 2 4 1 1
16 ABCC1945-0 2018-11-02 1 4 1 1
18 CDEC1951-8 2018-11-02 2 5 1 1
8 ABCC1731-4 2018-11-02 2 3 1 1
19 CDEC2924-2 2018-11-03 4 1 1 1
20 CDEC2927-5 2018-11-03 1 1 1 1
22 IJKC1941-9 2018-11-04 2 4 1 1
21 ABCC2925-9 2018-11-04 4 1 1 1
23 ABCC2922-6 2018-11-05 1 1 1 1
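How can I remove the weekends (2018-11-03 and 2018-11-04) from the output?
I was thinking of using timedelta with weekdays <= 4, but I don't know how to apply it here. Any help is very welcome.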
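You're already on the right track. You can use the .dt accessor to get the weekday of a datetime column in your DataFrame (Monday is 0, Sunday is 6). You can then use that to filter the rows of your DataFrame: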
filtered_df = sliced_df[sliced_df['date'].dt.weekday < 5]
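To see the filter in action without the CSV file, here is a minimal, self-contained sketch. It rebuilds a few rows of the final-week slice by hand (one record per date, only two columns), so the small DataFrame is an assumption for illustration and not the full dataset from the question.

```python
import pandas as pd

# Hand-built subset of the question's final-week slice (illustrative only).
sliced_df = pd.DataFrame({
    "record_id": ["FGHC2929-1", "CDEC1912-0", "ABCC1730-6",
                  "CDEC2924-2", "ABCC2925-9", "ABCC2922-6"],
    "date": pd.to_datetime(
        ["10/31/2018", "11/1/2018", "11/2/2018",
         "11/3/2018", "11/4/2018", "11/5/2018"],
        format="%m/%d/%Y",
    ),
})

# Series.dt.weekday numbers Monday as 0 and Sunday as 6,
# so values below 5 are Monday through Friday.
is_weekday = sliced_df["date"].dt.weekday < 5
filtered_df = sliced_df[is_weekday]

print(filtered_df)
```

The rows dated 2018-11-03 (Saturday) and 2018-11-04 (Sunday) are dropped, and the other four remain. The condition `weekday <= 4` from the question is equivalent to `weekday < 5`.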