Check which dates from column 'long_date' are also in array holy_date
I get an empty df, although I know there should be some rows in it.
Any idea how to fix this?
After line 7 runs, the df looks like this:
long_date | country |
---|---|
2020-11-07 | Portugal |
2020-01-01 | Portugal |
holy_date looks like this: ['2020-01-01','2020-01-06']
from numpy.ma.extras import isin
import holidays
df = df[(df['country'] == 'Portugal')]
min_year = (pd.DatetimeIndex(df.long_date).year.min())
max_year = (pd.DatetimeIndex(df.long_date).year.max())+1
holy_date = [i.strftime('%Y-%m-%d') for i in [*holidays.CountryHoliday('Portugal',years = np.arange(min_year,max_year,1)).keys()]]
df.long_date= pd.to_datetime(df.long_date).dt.date
df = pd.concat([df,df.long_date.isin(holy_date).rename('bh')],axis =1)
df[df['bh']==True]
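The problem is that you are trying to match strings against a column of date objects: after .dt.date the column holds datetime.date values, and those never compare equal to the strings in holy_date, so isin returns False for every row. What you should do is remove the line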
df.long_date= pd.to_datetime(df.long_date).dt.date
and use the code below instead. I added some dates to your data:
long_date country
0 2020-11-07 Portugal
1 2020-11-01 Portugal
2 2020-10-01 Portugal
3 2020-06-11 Portugal
and
import holidays
import numpy as np
import pandas as pd

# Semicolon-separated file with the columns long_date and country
df = pd.read_csv('holyday.csv', sep=";")
print(df)
df = df[df['country'] == 'Portugal']
# Years covered by the data (the upper bound of np.arange is exclusive)
min_year = pd.DatetimeIndex(df.long_date).year.min()
max_year = pd.DatetimeIndex(df.long_date).year.max() + 1
# Holiday dates as 'YYYY-MM-DD' strings, the same format as the long_date column
holy_date = [i.strftime('%Y-%m-%d') for i in holidays.CountryHoliday('Portugal', years=np.arange(min_year, max_year, 1)).keys()]
# Keep long_date as strings so that isin compares strings with strings
#df.long_date= pd.to_datetime(df.long_date).dt.date
df = pd.concat([df, df['long_date'].isin(holy_date).rename('bh')], axis=1)
print(df)
df[df['bh']==True]
This produces:
long_date country
0 2020-11-07 Portugal
1 2020-11-01 Portugal
2 2020-10-01 Portugal
3 2020-06-11 Portugal
long_date country bh
0 2020-11-07 Portugal False
1 2020-11-01 Portugal True
2 2020-10-01 Portugal False
3 2020-06-11 Portugal True
long_date country bh
1 2020-11-01 Portugal True
3 2020-06-11 Portugal True
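If you would rather keep long_date as date objects, the other way around should work too: convert the holiday strings to datetime.date so both sides have the same type. Below is a minimal sketch of that idea, not the code from the question. It assumes the two sample rows and the holy_date list shown in the question, hardcoded here so it runs without the holidays package; the names as_dates, mismatch, holy_as_dates and holidays_only are only illustrative.

```python
import datetime as dt

import pandas as pd

# Sample rows and holiday list copied from the question (hardcoded assumption,
# so that this sketch does not need the holidays package).
df = pd.DataFrame({
    "long_date": ["2020-11-07", "2020-01-01"],
    "country": ["Portugal", "Portugal"],
})
holy_date = ["2020-01-01", "2020-01-06"]

# Reproduce the problem: the column holds datetime.date objects,
# while holy_date holds strings, so nothing matches.
as_dates = pd.to_datetime(df["long_date"]).dt.date
mismatch = as_dates.isin(holy_date)
print(mismatch.tolist())  # [False, False]

# Alternative fix: turn the holiday strings into date objects as well,
# so isin compares dates with dates.
holy_as_dates = [dt.date.fromisoformat(s) for s in holy_date]
df["bh"] = as_dates.isin(holy_as_dates)
print(df["bh"].tolist())  # [False, True]

# Boolean indexing keeps only the rows that fall on a holiday.
holidays_only = df[df["bh"]]
print(holidays_only)
```

Either direction is fine as long as the column and the list hold the same type. Comparing strings with strings, as in the code above, is the smaller change; comparing dates with dates is handier if you need date arithmetic later.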