如何使用 python 中的正则表达式检测数据框中的日期时间字段
how to detect datetime field in dataframe using regular expression in python
我正在尝试 return 日期时间类型的数据框中的字段,而不是用“日期”替换字段名称,以便将日期时间拆分为年和月。
当我 运行 代码崩溃并显示以下错误时:
df = df.rename(columns={converteddate[0]: 'date'})
UnboundLocalError: local variable 'converteddate' referenced before assignment
代码:
import pandas as pd
df = pd.DataFrame({'event_type': ['watch movie ', 'stay at home', 'swimming','camping','meeting'],
'date': ['8/11/2020', '2/13/2020', '7/04/2020','1/22/2020','7/28/2020'],
'event_mohafaza':['loc1','loc3','loc2','loc5','loc4'],
' number_person ':[24,39,20,10,33],})
non_numeric_cols = [col for col, col_type in df.dtypes.iteritems() if col_type == 'object']
if len(non_numeric_cols) > 0:
mask = df.astype(str).apply(lambda x : x.str.match('[0-3]?[0-9]-[0-3]?[0-9]-(?:[0-9]{2})?[0-9]{2}$').any())
if mask.any() == True:
df.loc[:,mask] = df.loc[:,mask].apply(pd.to_datetime,dayfirst=False)
converteddate = [col for col in df.columns if df[col].dtype == 'datetime64[ns]']
df = df.rename(columns={converteddate[0]: 'date'})
if "date" in df.columns:
df['year_month'] = df['date'].map(lambda x: x.strftime('%Y/%m'))
如果您修改正则表达式,则匹配有效:
import pandas as pd
df = pd.DataFrame({'event_type': ['watch movie ', 'stay at home', 'swimming','camping','meeting'],
'date': ['8/11/2020', '2/13/2020', '7/04/2020','1/22/2020','7/28/2020'],
'event_mohafaza':['loc1','loc3','loc2','loc5','loc4'],
' number_person ':[24,39,20,10,33],})
print(df)
non_numeric_cols = [col for col, col_type in df.dtypes.iteritems() if col_type == 'object']
if len(non_numeric_cols) > 0:
#mask = df.astype(str).apply(lambda x : x.str.match('[0-3]?[0-9]-[0-3]?[0-9]-(?:[0-9]{2})?[0-9]{2}$').any())
mask = df.astype(str).apply(lambda x : x.str.match('^([1-9]|1[0-9]|2[0-9]|3[0-1])(.|-|/)([1-9]|1[0-2])(.|-|/)20[0-9][0-9]$').any())
if mask.any() == True:
df.loc[:,mask] = df.loc[:,mask].apply(pd.to_datetime,dayfirst=False)
converteddate = [col for col in df.columns if df[col].dtype == 'datetime64[ns]']
df = df.rename(columns={converteddate[0]: 'date'})
if "date" in df.columns:
df['year_month'] = df['date'].map(lambda x: x.strftime('%Y/%m'))
print(df)
产生
event_type date event_mohafaza number_person
0 watch movie 8/11/2020 loc1 24
1 stay at home 2/13/2020 loc3 39
2 swimming 7/04/2020 loc2 20
3 camping 1/22/2020 loc5 10
4 meeting 7/28/2020 loc4 33
event_type date event_mohafaza number_person year_month
0 watch movie 2020-08-11 loc1 24 2020/08
1 stay at home 2020-02-13 loc3 39 2020/02
2 swimming 2020-07-04 loc2 20 2020/07
3 camping 2020-01-22 loc5 10 2020/01
4 meeting 2020-07-28 loc4 33 2020/07
您不是要替换 'date' 列,而是要添加一个 'year_month' 列(我保留原样)。
我正在尝试 return 日期时间类型的数据框中的字段,而不是用“日期”替换字段名称,以便将日期时间拆分为年和月。
当我 运行 代码崩溃并显示以下错误时:
df = df.rename(columns={converteddate[0]: 'date'})
UnboundLocalError: local variable 'converteddate' referenced before assignment
代码:
import pandas as pd
df = pd.DataFrame({'event_type': ['watch movie ', 'stay at home', 'swimming','camping','meeting'],
'date': ['8/11/2020', '2/13/2020', '7/04/2020','1/22/2020','7/28/2020'],
'event_mohafaza':['loc1','loc3','loc2','loc5','loc4'],
' number_person ':[24,39,20,10,33],})
non_numeric_cols = [col for col, col_type in df.dtypes.iteritems() if col_type == 'object']
if len(non_numeric_cols) > 0:
mask = df.astype(str).apply(lambda x : x.str.match('[0-3]?[0-9]-[0-3]?[0-9]-(?:[0-9]{2})?[0-9]{2}$').any())
if mask.any() == True:
df.loc[:,mask] = df.loc[:,mask].apply(pd.to_datetime,dayfirst=False)
converteddate = [col for col in df.columns if df[col].dtype == 'datetime64[ns]']
df = df.rename(columns={converteddate[0]: 'date'})
if "date" in df.columns:
df['year_month'] = df['date'].map(lambda x: x.strftime('%Y/%m'))
如果您修改正则表达式,则匹配有效:
import pandas as pd
df = pd.DataFrame({'event_type': ['watch movie ', 'stay at home', 'swimming','camping','meeting'],
'date': ['8/11/2020', '2/13/2020', '7/04/2020','1/22/2020','7/28/2020'],
'event_mohafaza':['loc1','loc3','loc2','loc5','loc4'],
' number_person ':[24,39,20,10,33],})
print(df)
non_numeric_cols = [col for col, col_type in df.dtypes.iteritems() if col_type == 'object']
if len(non_numeric_cols) > 0:
#mask = df.astype(str).apply(lambda x : x.str.match('[0-3]?[0-9]-[0-3]?[0-9]-(?:[0-9]{2})?[0-9]{2}$').any())
mask = df.astype(str).apply(lambda x : x.str.match('^([1-9]|1[0-9]|2[0-9]|3[0-1])(.|-|/)([1-9]|1[0-2])(.|-|/)20[0-9][0-9]$').any())
if mask.any() == True:
df.loc[:,mask] = df.loc[:,mask].apply(pd.to_datetime,dayfirst=False)
converteddate = [col for col in df.columns if df[col].dtype == 'datetime64[ns]']
df = df.rename(columns={converteddate[0]: 'date'})
if "date" in df.columns:
df['year_month'] = df['date'].map(lambda x: x.strftime('%Y/%m'))
print(df)
产生
event_type date event_mohafaza number_person
0 watch movie 8/11/2020 loc1 24
1 stay at home 2/13/2020 loc3 39
2 swimming 7/04/2020 loc2 20
3 camping 1/22/2020 loc5 10
4 meeting 7/28/2020 loc4 33
event_type date event_mohafaza number_person year_month
0 watch movie 2020-08-11 loc1 24 2020/08
1 stay at home 2020-02-13 loc3 39 2020/02
2 swimming 2020-07-04 loc2 20 2020/07
3 camping 2020-01-22 loc5 10 2020/01
4 meeting 2020-07-28 loc4 33 2020/07
您不是要替换 'date' 列,而是要添加一个 'year_month' 列(我保留原样)。