Python-Pandas-Dataframe-datetime conversion excluding null value cells
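Thank you for taking the time to look at my question.
I am trying to convert two date columns in a pandas DataFrame with the code below. I wrote it this way because "Closed Date" has only 4221 non-null rows out of 4272, so the conversion must not crash on the empty cells.
In the end, the change should produce a DataFrame with the original number of rows, so I do not want to lose the rows that have a null Closed Date.
DataFrame overview: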
<class 'pandas.core.frame.DataFrame'>
Int64Index: 4272 entries, 0 to 4271
Data columns (total 4 columns):
Created Date    4272 non-null object
Closed Date     4221 non-null object
Agency          4272 non-null object
Borough         4272 non-null object
dtypes: object(4)
The code I wrote:
col='Closed Date'
df[(df[col].notnull())] = df[(df[col].notnull())].apply(lambda x:datetime.datetime.strptime(x,'%m/%d/%Y %I:%M:%S %p'))
The error it produces:
TypeError Traceback (most recent call last)
<ipython-input-155-49014bb3ecb3> in <module>()
9
10 col='Closed Date'
---> 11 df[(df[col].notnull())] = df[(df[col].notnull())].apply(lambda x:datetime.datetime.strptime(x,'%m/%d/%Y %I:%M:%S %p'))
12 print(type(df[(df[col].notnull())]))
/anaconda/lib/python3.6/site-packages/pandas/core/frame.py in apply(self, func, axis, broadcast, raw, reduce, args, **kwds)
4358 f, axis,
4359 reduce=reduce,
-> 4360 ignore_failures=ignore_failures)
4361 else:
4362 return self._apply_broadcast(f, axis)
/anaconda/lib/python3.6/site-packages/pandas/core/frame.py in _apply_standard(self, func, axis, ignore_failures, reduce)
4454 try:
4455 for i, v in enumerate(series_gen):
-> 4456 results[i] = func(v)
4457 keys.append(v.name)
4458 except Exception as e:
<ipython-input-155-49014bb3ecb3> in <lambda>(x)
9
10 col='Closed Date'
---> 11 df[(df[col].notnull())] = df[(df[col].notnull())].apply(lambda x:datetime.datetime.strptime(x,'%m/%d/%Y %I:%M:%S %p'))
12 print(type(df[(df[col].notnull())]))
TypeError: ('strptime() argument 1 must be str, not Series', 'occurred at index Created Date')
I think you only need to_datetime. It converts NaN to NaT, so every value in the column ends up as a datetime:
col='Closed Date'
df[col] = pd.to_datetime(df[col], format='%m/%d/%Y %I:%M:%S %p')
Sample:
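import numpy as np
import pandas as pd

# Two valid timestamps plus one missing value
df = pd.DataFrame({'Closed Date': ['05/01/2016 05:10:10 AM',
                                   '05/01/2016 05:10:10 AM',
                                   np.nan]})
col = 'Closed Date'
# NaN becomes NaT; no rows are dropped
df[col] = pd.to_datetime(df[col], format='%m/%d/%Y %I:%M:%S %p')
print(df)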
          Closed Date
0 2016-05-01 05:10:10
1 2016-05-01 05:10:10
2                 NaT
print(df.dtypes)
Closed Date datetime64[ns]
dtype: object
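Since the question mentions two date columns and wants to keep every row, here is a minimal sketch of the same idea applied to both "Created Date" and "Closed Date". The column names come from the DataFrame overview in the question; the cell values below are made up for illustration, and I am assuming both columns share the same timestamp format. For context, the original attempt failed because DataFrame.apply passes each whole column to the lambda as a Series, while strptime expects a single string.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the frame from the question.
# Column names match the question; the values are invented for this sketch.
df = pd.DataFrame({
    "Created Date": ["05/01/2016 05:10:10 AM",
                     "05/02/2016 11:30:00 PM",
                     "05/03/2016 12:00:00 PM"],
    "Closed Date": ["05/01/2016 06:15:00 AM",
                    np.nan,
                    "05/04/2016 01:45:30 PM"],
    "Agency": ["A", "B", "C"],
    "Borough": ["X", "Y", "Z"],
})

# Assumption: both date columns use this format.
fmt = "%m/%d/%Y %I:%M:%S %p"
rows_before = len(df)

# Convert each date column in place; missing values become NaT.
for col in ["Created Date", "Closed Date"]:
    df[col] = pd.to_datetime(df[col], format=fmt)

# The row count is unchanged, and the null Closed Date is still there as NaT.
print(len(df) == rows_before)
print(df["Closed Date"].isna().sum())
print(df.dtypes)
```

If some cells might not match the format, to_datetime also accepts errors='coerce', which turns unparseable values into NaT instead of raising. Whether that is appropriate depends on whether silently losing a malformed date is acceptable for your data.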