Cleaning date and time records in python pandas

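In the data below, the date and time are stored in separate columns, and I combine them to get a full date-time, so the resulting column has dtype 'datetime64[ns]'. However, some records have an empty date and time, and in that case the resulting column has dtype 'object', which is essentially a column of strings. How can I handle both cases, when every record is present and when some are missing?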

SAMPLE DATA

CARD,IN Date,IN Time,OUT Date,OUT Time
100001,30-04-2015,14:19:18,01-05-2015,00:10:56
100002,30-04-2015,11:27:52,,
100003,30-04-2015,17:59:47,01-05-2015,04:51:52
100004,30-04-2015,16:15:25,,
100005,30-04-2015,10:25:13,01-05-2015,01:25:13
100006,30-04-2015,16:59:10,,
100007,30-04-2015,13:22:06,,
100008,30-04-2015,09:15:29,,
100009,30-04-2015,17:01:10,01-05-2015,01:51:01
100010,30-04-2015,13:13:30,01-05-2015,01:37:28
100011,30-04-2015,09:37:28,01-05-2015,00:37:28
100012,30-04-2015,18:55:44,01-05-2015,03:22:22
100013,30-04-2015,14:28:16,01-05-2015,01:27:18
100014,30-04-2015,09:02:13,01-05-2015,00:02:13
100015,30-04-2015,09:04:10,01-05-2015,00:04:10
100016,30-04-2015,18:51:56,01-05-2015,09:51:56
100017,30-04-2015,09:12:51,01-05-2015,00:12:51
100018,30-04-2015,10:40:31,01-05-2015,01:40:31
100019,30-04-2015,10:35:56,01-05-2015,01:35:56
100020,30-04-2015,17:50:03,01-05-2015,03:54:54
100021,30-04-2015,17:00:16,01-05-2015,02:45:35
100022,30-04-2015,11:18:41,01-05-2015,01:15:52

Here is the code I have so far:

import numpy as np
import pandas as pd

# Columns: CARD,IN Date,IN Time,OUT Date,OUT Time
# Combine each date column with its time column, and keep the original date columns too
data = pd.read_csv('DATA.csv', parse_dates=[['IN Date','IN Time'],['OUT Date','OUT Time'],'IN Date','OUT Date'], keep_date_col=True)
data.rename(columns={'IN Date_IN Time':'IN','OUT Date_OUT Time':'OUT'}, inplace=True)
data = data[['CARD','IN Date', 'IN', 'OUT Date', 'OUT']]
# This line will fail when all the records are present
data.loc[(data.OUT == 'nan nan'), 'OUT'] = np.nan

I think you could try str.contains:

data.loc[(data.OUT.str.contains('nan')), 'OUT'] = np.nan
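The .str accessor only works on string (object) columns, so the str.contains line still raises when every record is present and OUT has already been parsed to datetime64. A minimal sketch of a guard that covers both cases; blank_out_missing is a hypothetical helper name, and for a string column the result is still not datetime64, so it would still need to_datetime afterwards.

```python
import pandas as pd


def blank_out_missing(col: pd.Series) -> pd.Series:
    """Hypothetical helper: turn 'nan nan' placeholders into real missing values.

    If the column is already datetime64 (every row had a value), it is
    returned unchanged, because the .str accessor is not available then.
    """
    if pd.api.types.is_datetime64_any_dtype(col):
        return col
    # Compare as text; na=False treats genuinely missing entries as "no match".
    is_placeholder = col.astype(str).str.contains("nan", na=False)
    # Entries where the mask is True are replaced with a missing value.
    return col.mask(is_placeholder)
```

Calling it as data['OUT'] = blank_out_missing(data['OUT']) behaves the same whether or not any OUT values were missing.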

But it is better to use to_datetime with the parameter errors='coerce':

data['OUT'] = pd.to_datetime(data['OUT'], errors='coerce')
print(data)
      CARD    IN Date                  IN   OUT Date                 OUT
0   100001 2015-04-30 2015-04-30 14:19:18 2015-01-05 2015-01-05 00:10:56
1   100002 2015-04-30 2015-04-30 11:27:52        NaT                 NaT
2   100003 2015-04-30 2015-04-30 17:59:47 2015-01-05 2015-01-05 04:51:52
3   100004 2015-04-30 2015-04-30 16:15:25        NaT                 NaT
4   100005 2015-04-30 2015-04-30 10:25:13 2015-01-05 2015-01-05 01:25:13
5   100006 2015-04-30 2015-04-30 16:59:10        NaT                 NaT
6   100007 2015-04-30 2015-04-30 13:22:06        NaT                 NaT
7   100008 2015-04-30 2015-04-30 09:15:29        NaT                 NaT
8   100009 2015-04-30 2015-04-30 17:01:10 2015-01-05 2015-01-05 01:51:01
9   100010 2015-04-30 2015-04-30 13:13:30 2015-01-05 2015-01-05 01:37:28
10  100011 2015-04-30 2015-04-30 09:37:28 2015-01-05 2015-01-05 00:37:28
11  100012 2015-04-30 2015-04-30 18:55:44 2015-01-05 2015-01-05 03:22:22
12  100013 2015-04-30 2015-04-30 14:28:16 2015-01-05 2015-01-05 01:27:18
13  100014 2015-04-30 2015-04-30 09:02:13 2015-01-05 2015-01-05 00:02:13
14  100015 2015-04-30 2015-04-30 09:04:10 2015-01-05 2015-01-05 00:04:10
15  100016 2015-04-30 2015-04-30 18:51:56 2015-01-05 2015-01-05 09:51:56
16  100017 2015-04-30 2015-04-30 09:12:51 2015-01-05 2015-01-05 00:12:51
17  100018 2015-04-30 2015-04-30 10:40:31 2015-01-05 2015-01-05 01:40:31
18  100019 2015-04-30 2015-04-30 10:35:56 2015-01-05 2015-01-05 01:35:56
19  100020 2015-04-30 2015-04-30 17:50:03 2015-01-05 2015-01-05 03:54:54
20  100021 2015-04-30 2015-04-30 17:00:16 2015-01-05 2015-01-05 02:45:35
21  100022 2015-04-30 2015-04-30 11:18:41 2015-01-05 2015-01-05 01:15:52
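In the output above, an OUT Date of 01-05-2015 comes out as 2015-01-05 (5 January) instead of 1 May, because the ambiguous date was read month-first. A minimal sketch that avoids both that and the mixed-dtype problem, assuming the dates are always in DD-MM-YYYY form: read the columns as text, concatenate date and time, and parse with an explicit format plus errors='coerce', so missing rows become NaT and the column is datetime64 either way. The helper name combine is hypothetical; passing dayfirst=True to read_csv or to_datetime is an alternative to the explicit format.

```python
import io

import pandas as pd

# Two rows from the sample data: one complete, one with no OUT values.
SAMPLE = """CARD,IN Date,IN Time,OUT Date,OUT Time
100001,30-04-2015,14:19:18,01-05-2015,00:10:56
100002,30-04-2015,11:27:52,,
"""

df = pd.read_csv(io.StringIO(SAMPLE), dtype=str)


def combine(date_col: pd.Series, time_col: pd.Series) -> pd.Series:
    """Hypothetical helper: join date and time text and parse it day-first."""
    # String concatenation with a missing value stays missing.
    text = date_col + " " + time_col
    # Explicit DD-MM-YYYY format; missing or non-matching entries become NaT.
    return pd.to_datetime(text, format="%d-%m-%Y %H:%M:%S", errors="coerce")


df["IN"] = combine(df["IN Date"], df["IN Time"])
df["OUT"] = combine(df["OUT Date"], df["OUT Time"])
print(df[["CARD", "IN", "OUT"]])
```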