Convert lists of strings in a pandas column into lists of dates and filter the values accordingly
I currently have a DataFrame that looks like this:
business_id date
0 --0r8K_AQ4FZfLsX3ZYRDA [2017-09-03 17:13:59]
1 --0zrn43LEaB4jUWTQH_Bg [2010-10-08 22:21:20, 2010-11-01 21:29:14, 2...
2 --164t1nclzzmca7eDiJMw [2010-02-26 02:06:53, 2010-02-27 08:00:09, 2...
3 --2aF9NhXnNVpDV0KS3xBQ [2014-11-03 16:35:35, 2015-01-30 18:16:03, 2...
4 --2mEJ63SC_8_08_jGgVIg [2010-12-15 17:10:46, 2013-12-28 00:27:54, 2...
... ... ...
997 -SjRCXID7eXewqloY3V86w [2015-12-13 02:48:00, 2016-01-21 22:31:31, 2...
998 -Sjrz1Mt9RY4r6ibxzGs0Q [2016-08-08 19:23:27, 2016-08-15 16:03:29, 2...
999 -Sk9ZND7V2x8RuauMH0FRw [2010-09-05 02:04:25, 2010-10-15 22:48:00, 2...
1000 -SkNedh2bJHPOcKfoFlTvg [2013-09-01 02:54:45, 2013-10-22 16:59:13, 2...
1001 -SkwKPbo5oK1-NtKkupNvw [2010-09-11 20:23:45, 2011-05-26 16:24:35, 2...
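What I want to do is:
- Convert all values in each list to dates
- Keep only the values later than 2018-01-01
As a first step, I tried using apply with a function so that I could cover every element in each list: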
def convert_to_date(d):
    pd.to_datetime(d, format='%Y-%m-%d %H:%M:%S')

checkin_data['date'].apply(convert_to_date)
However, the result looks like this:
0 None
1 None
2 None
3 None
4 None
...
997 None
998 None
999 None
1000 None
1001 None
Name: date, Length: 1002, dtype: object
How can I fix this?
Thanks for your help!
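The all-None output is what Python gives for any function that has no return statement: the converted value is computed and then discarded. A minimal sketch of the difference follows, assuming each cell holds a list of timestamp strings; the names without_return, with_return, and sample are hypothetical and used only for illustration.

```python
import pandas as pd

def without_return(d):
    # The conversion runs, but its result is thrown away,
    # so the function implicitly returns None.
    pd.to_datetime(d, format="%Y-%m-%d %H:%M:%S")

def with_return(d):
    # Returning the result hands the DatetimeIndex back to the caller.
    return pd.to_datetime(d, format="%Y-%m-%d %H:%M:%S")

# Hypothetical sample cell, shaped like one entry of the date column.
sample = ["2010-10-08 22:21:20", "2010-11-01 21:29:14"]

missing = without_return(sample)
converted = with_return(sample)
```

Here missing is None, while converted is a DatetimeIndex with one entry per input string.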
Add return to the function to avoid the missing values, then use boolean indexing to keep only the later values:
print (checkin_data)
business_id date
0 --0r8K_AQ4FZfLsX3ZYRDA [2022-09-03 17:13:59]
1 --0zrn43LEaB4jUWTQH_Bg [2018-10-08 22:21:20, 2010-11-01 21:29:14]
2 --164t1nclzzmca7eDiJMw [2019-02-26 02:06:53, 2030-02-27 08:00:09]
3 --2aF9NhXnNVpDV0KS3xBQ [2014-11-03 16:35:35, 2015-01-30 18:16:03]
4 --2mEJ63SC_8_08_jGgVIg [2010-12-15 17:10:46, 2013-12-28 00:27:54]
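def convert_to_date(d):
    # Convert the values of one cell to datetimes
    x = pd.to_datetime(d)
    # Boolean indexing keeps only values after the cutoff
    return x[x > '2018-01-01'].tolist()

checkin_data['date'] = checkin_data['date'].apply(convert_to_date)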
print (checkin_data)
business_id date
0 --0r8K_AQ4FZfLsX3ZYRDA [2022-09-03 17:13:59]
1 --0zrn43LEaB4jUWTQH_Bg [2018-10-08 22:21:20]
2 --164t1nclzzmca7eDiJMw [2019-02-26 02:06:53, 2030-02-27 08:00:09]
3 --2aF9NhXnNVpDV0KS3xBQ []
4 --2mEJ63SC_8_08_jGgVIg []
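For anyone who wants to run this end to end, here is a self-contained sketch of the same approach. It assumes each cell of the date column is a Python list of timestamp strings; the sample frame, the shortened business_id values, and the CUTOFF name are made up for illustration and are not from the original data.

```python
import pandas as pd

# Hypothetical sample data, modelled on the frame printed above.
checkin_data = pd.DataFrame({
    "business_id": ["a", "b", "c"],
    "date": [
        ["2022-09-03 17:13:59"],
        ["2018-10-08 22:21:20", "2010-11-01 21:29:14"],
        ["2014-11-03 16:35:35", "2015-01-30 18:16:03"],
    ],
})

# Assumed cutoff; comparing against a Timestamp makes the intent explicit.
CUTOFF = pd.Timestamp("2018-01-01")

def convert_to_date(d):
    # Passing a list to pd.to_datetime gives a DatetimeIndex.
    x = pd.to_datetime(d, format="%Y-%m-%d %H:%M:%S")
    # The boolean mask keeps only entries strictly after the cutoff.
    return x[x > CUTOFF].tolist()

filtered = checkin_data["date"].apply(convert_to_date)
print(filtered)
```

Rows with no dates after the cutoff end up as empty lists rather than being dropped, which matches the output shown above.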