Pandas Python - convert HH:MM:SS into seconds in aggregate (csv file)
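I am trying to convert the values in the 'Avg. Session Duration' column (HH:MM:SS) to integers (in seconds) when reading the file with the Pandas read_csv function.
For example, '0:03:26' would become 206 seconds after conversion.
Sample input: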
Source   Month   Sessions  Bounce Rate  Avg. Session Duration
ABC.com  201501  408       26.47%       0:03:26
EFG.com  201412  398       31.45%       0:04:03
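I wrote a function:
def time_convert(x):
    times = x.split(':')
    # hours * 3600 + minutes * 60 + seconds
    return 3600*int(times[0]) + 60*int(times[1]) + int(times[2])
The function works fine whenever I pass it '0:03:26' directly. But when I tried to create a new column 'Duration' by applying the function to another column in Pandas,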
df = pd.read_csv('myfile.csv')
df['Duration'] = df['Avg. Session Duration'].apply(time_convert)
it returned an error message:
> --------------------------------------------------------------------------- AttributeError Traceback (most recent call
> last) <ipython-input-53-01e79de1cb39> in <module>()
> ----> 1 df['Avg. Session Duration'] = df['Avg. Session Duration'].apply(lambda x: x.split(':'))
>
> /Users/yumiyang/anaconda/lib/python2.7/site-packages/pandas/core/series.pyc
> in apply(self, func, convert_dtype, args, **kwds) 1991
> values = lib.map_infer(values, lib.Timestamp) 1992
> -> 1993 mapped = lib.map_infer(values, f, convert=convert_dtype) 1994 if len(mapped) and
> isinstance(mapped[0], Series): 1995 from
> pandas.core.frame import DataFrame
>
> /Users/yumiyang/anaconda/lib/python2.7/site-packages/pandas/lib.so in
> pandas.lib.map_infer (pandas/lib.c:52281)()
>
> <ipython-input-53-01e79de1cb39> in <lambda>(x)
> ----> 1 df['Avg. Session Duration'] = df['Avg. Session Duration'].apply(lambda x: x.split(':'))
>
> AttributeError: 'float' object has no attribute 'split'
I don't know why it says the values of 'Avg. Session Duration' are floats.
Data columns (total 7 columns):
Source 250 non-null object
Time 251 non-null object
Sessions 188 non-null object
Users 188 non-null object
Bounce Rate 188 non-null object
Avg. Session Duration 188 non-null object
% New Sessions 188 non-null object
dtypes: object(7)
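Can anyone help me figure out what the problem is?
The values in df['Avg. Session Duration'] need to be strings for your function to work.
df = pd.DataFrame({'time': ['0:03:26']})
def time_convert(x):
    h, m, s = map(int, x.split(':'))
    return (h*60 + m)*60 + s
df.time.apply(time_convert)
This works fine for me.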
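One plausible cause of the 'float' error, judging from the df.info() listing in the question (188 non-null values in 'Avg. Session Duration' against 250 or more in other columns), is that the column contains missing values, which pandas represents as NaN, a float. The following is a minimal sketch under that assumption. The helper name `time_convert_safe` and the three-row sample frame are made up for illustration and are not part of the original question.

```python
import numpy as np
import pandas as pd

def time_convert_safe(x):
    """Convert 'H:MM:SS' to seconds; pass missing values through as NaN."""
    if pd.isna(x):
        return np.nan
    h, m, s = map(int, x.split(':'))
    return (h * 60 + m) * 60 + s

# Hypothetical sample with one missing duration
df = pd.DataFrame({'Avg. Session Duration': ['0:03:26', np.nan, '0:04:03']})

# Show which Python types actually occur in the column
print(df['Avg. Session Duration'].map(type).value_counts())

df['Duration'] = df['Avg. Session Duration'].apply(time_convert_safe)
print(df)
```

Because NaN is present, the resulting 'Duration' column is stored as floats. Dropping the missing rows first with dropna() would be one way to keep it as integers.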
You can use time and datetime from the standard Python library to convert the time to seconds:
import time, datetime
def convertTime(t):
    x = time.strptime(t, '%H:%M:%S')
    return int(datetime.timedelta(hours=x.tm_hour, minutes=x.tm_min, seconds=x.tm_sec).total_seconds())
convertTime('0:03:26')  # Output 206
convertTime('0:04:03')  # Output 243
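One caveat with parsing a duration through time.strptime: the %H directive only accepts hours from 0 to 23, so a duration of 24 hours or more raises ValueError. A possible variant, sketched here with a hypothetical helper name `duration_to_seconds`, splits the string itself and lets datetime.timedelta do the arithmetic:

```python
from datetime import timedelta

def duration_to_seconds(t):
    """Convert 'H:MM:SS' to whole seconds; hours may exceed 23."""
    h, m, s = (int(part) for part in t.split(':'))
    return int(timedelta(hours=h, minutes=m, seconds=s).total_seconds())

print(duration_to_seconds('0:03:26'))   # 206
print(duration_to_seconds('25:00:00'))  # 90000
```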
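The error means that a value in the column was read as a float rather than a string. Fix the way you read the data, for example:
#!/usr/bin/env python
from functools import reduce  # reduce is not a builtin in Python 3
import pandas

def hh_mm_ss2seconds(hh_mm_ss):
    # Fold the fields left to right: (h*60 + m)*60 + s
    return reduce(lambda acc, x: acc*60 + x, map(int, hh_mm_ss.split(':')))

# A regular-expression separator requires the python parsing engine
df = pandas.read_csv('input.csv', sep=r'\s{2,}', engine='python',
                     converters={'Avg. Session Duration': hh_mm_ss2seconds})
print(df)
Output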
    Source   Month  Sessions Bounce Rate  Avg. Session Duration
0  ABC.com  201501       408      26.47%                    206
1  EFG.com  201412       398      31.45%                    243

[2 rows x 5 columns]
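As an alternative to a hand-written converter, pandas can parse 'H:MM:SS' strings itself with pd.to_timedelta, which also turns missing values into NaT instead of raising. This is only a sketch: the sample frame below is invented for illustration, and real data may need extra cleaning before it parses.

```python
import pandas as pd

# Hypothetical sample with one missing duration
df = pd.DataFrame({'Avg. Session Duration': ['0:03:26', None, '0:04:03']})

# Parse the strings as timedeltas; missing entries become NaT
durations = pd.to_timedelta(df['Avg. Session Duration'])

# total_seconds() returns floats; NaT becomes NaN
df['Duration'] = durations.dt.total_seconds()
print(df)
```

This keeps the conversion vectorized and avoids calling a Python function once per row.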