How to separate the day, month and year from the following data frame? It holds data for 8.4 million users.
I tried using the DatetimeIndex method.
The column with the values looks like this:
reg_date
2013-06-10T00:00:00.000Z
2014-09-30T00:00:00.000Z
2014-09-30T00:00:00.000Z
2014-09-30T00:00:00.000Z
2014-10-01T00:00:00.000Z
type(df.reg_date) yields
pandas.core.series.Series
and I used the following:
df['reg_month'] = pd.DatetimeIndex(df['reg_date']).month
This worked for me on earlier data, but DatetimeIndex does not work here and raises the following error:
TypeError Traceback (most recent call last)
C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\tools\datetimes.py in _convert_listlike(arg, box, format, name, tz)
302 try:
--> 303 values, tz = tslib.datetime_to_datetime64(arg)
304 return DatetimeIndex._simple_new(values, name=name, tz=tz)
pandas/_libs/tslib.pyx in pandas._libs.tslib.datetime_to_datetime64()
TypeError: Unrecognized value type: <class 'str'>
During handling of the above exception, another exception occurred:
ValueError Traceback (most recent call last)
<ipython-input-22-4e7ef5ca2997> in <module>()
----> 1 df['reg_month'] = pd.DatetimeIndex(df['reg_date']).month
C:\ProgramData\Anaconda3\lib\site-packages\pandas\util\_decorators.py in wrapper(*args, **kwargs)
116 else:
117 kwargs[new_arg_name] = new_arg_value
--> 118 return func(*args, **kwargs)
119 return wrapper
120 return _deprecate_kwarg
C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\indexes\datetimes.py in __new__(cls, data, freq, start, end, periods, copy, name, tz, verify_integrity, normalize, closed, ambiguous, dtype, **kwargs)
340 is_integer_dtype(data)):
341 data = tools.to_datetime(data, dayfirst=dayfirst,
--> 342 yearfirst=yearfirst)
343
344 if issubclass(data.dtype.type, np.datetime64) or is_datetimetz(data):
C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\tools\datetimes.py in to_datetime(arg, errors, dayfirst, yearfirst, utc, box, format, exact, unit, infer_datetime_format, origin)
378 result = _convert_listlike(arg, box, format, name=arg.name)
379 elif is_list_like(arg):
--> 380 result = _convert_listlike(arg, box, format)
381 else:
382 result = _convert_listlike(np.array([arg]), box, format)[0]
C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\tools\datetimes.py in _convert_listlike(arg, box, format, name, tz)
304 return DatetimeIndex._simple_new(values, name=name, tz=tz)
305 except (ValueError, TypeError):
--> 306 raise e
307
308 if arg is None:
C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\tools\datetimes.py in _convert_listlike(arg, box, format, name, tz)
292 dayfirst=dayfirst,
293 yearfirst=yearfirst,
--> 294 require_iso8601=require_iso8601
295 )
296
pandas/_libs/tslib.pyx in pandas._libs.tslib.array_to_datetime()
pandas/_libs/tslib.pyx in pandas._libs.tslib.array_to_datetime()
pandas/_libs/tslib.pyx in pandas._libs.tslib.array_to_datetime()
pandas/_libs/tslibs/parsing.pyx in pandas._libs.tslibs.parsing.parse_datetime_string()
C:\ProgramData\Anaconda3\lib\site-packages\dateutil\parser.py in parse(timestr, parserinfo, **kwargs)
1180 return parser(parserinfo).parse(timestr, **kwargs)
1181 else:
-> 1182 return DEFAULTPARSER.parse(timestr, **kwargs)
1183
1184
C:\ProgramData\Anaconda3\lib\site-packages\dateutil\parser.py in parse(self, timestr, default, ignoretz, tzinfos, **kwargs)
557
558 if res is None:
--> 559 raise ValueError("Unknown string format")
560
561 if len(res) == 0:
ValueError: Unknown string format
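You can convert the data to datetime objects:
import pandas as pd
# errors='coerce' turns values that cannot be parsed into NaT instead of raising
df['reg_date'] = pd.to_datetime(df['reg_date'], errors='coerce')
Then you can extract the month as follows:
df['month'] = df['reg_date'].dt.month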
Output:
time month
0 2013-06-10 6
1 2014-09-30 9
2 2014-09-30 9
3 2014-09-30 9
4 2014-10-01 10
Here is the documentation.
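Since the question asks for day, month and year, and the traceback suggests that at least one of the 8.4 million values is not a parseable date, here is a minimal sketch that extracts all three parts and also lists the rows that failed to parse. The sample frame, the "not a date" value and the names raw, parsed, bad_rows, reg_year and reg_day are made up for illustration; utc=True is my assumption based on the trailing Z in the data.

```python
import pandas as pd

# Hypothetical sample: rows from the question plus one malformed value
df = pd.DataFrame({
    "reg_date": [
        "2013-06-10T00:00:00.000Z",
        "2014-09-30T00:00:00.000Z",
        "2014-10-01T00:00:00.000Z",
        "not a date",
    ]
})

raw = df["reg_date"]

# Unparseable strings become NaT instead of raising ValueError
parsed = pd.to_datetime(raw, errors="coerce", utc=True)

# Rows that had a value but could not be parsed
bad_rows = raw[parsed.isna() & raw.notna()]
print(bad_rows)

# Year, month and day through the .dt accessor
# (rows that failed to parse end up as NaN in these columns)
df["reg_year"] = parsed.dt.year
df["reg_month"] = parsed.dt.month
df["reg_day"] = parsed.dt.day
print(df)
```

Looking at bad_rows first is probably worthwhile, because it shows what is actually in the values that made DatetimeIndex fail.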
import pandas as pd
# Collect the year, month and day strings from each reg_date value
n = {"year": [], "month": [], "day": []}
for i in df['reg_date']:
    year, month, day = i.split("T")[0].split("-")
    n["year"].append(year)
    n["month"].append(month)
    n["day"].append(day)
# Now 'n' is a dictionary containing the separated year, month and day from df["reg_date"]
Another approach:
df["reg_date"] = df["reg_date"].apply(lambda x: x.split("T")[0])
# df["reg_date"] now contains only the date part (YYYY-MM-DD) of each record
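With 8.4 million rows, a Python-level loop or apply may be slow. The same string splitting can be sketched with the vectorized .str accessor instead. This assumes every value starts with a YYYY-MM-DD date; a malformed row would make the final astype(int) raise. The sample frame and the names date_part and parts are illustrative only.

```python
import pandas as pd

# Hypothetical sample taken from the question
df = pd.DataFrame({
    "reg_date": [
        "2013-06-10T00:00:00.000Z",
        "2014-09-30T00:00:00.000Z",
        "2014-10-01T00:00:00.000Z",
    ]
})

# The first 10 characters are the YYYY-MM-DD part
date_part = df["reg_date"].str.slice(0, 10)

# Split on "-" into three columns and convert the strings to integers
parts = date_part.str.split("-", expand=True)
parts.columns = ["year", "month", "day"]
parts = parts.astype(int)

print(parts)
```

The resulting columns can be attached to the original frame with df.join(parts) if needed.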