Calculating the median (or mean) of a timedelta list
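I am trying to find the median of a list of Timedelta objects generated from a pandas DataFrame. I tried using the statistics library like this:
newList = list(DF.sort_values(['TimeDelta'])['TimeDelta'])
TDmedian = st.median(newList)
where st is the statistics library as I imported it.
But I get the error:
`TypeError: unsupported operand type(s) for /: 'str' and 'int'`
I also tried writing a function to calculate it: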
`
def date_median(date_list):
    length = len(date_list)
    print(length)
    # Check whether the length is odd, because the median of an odd-length list is the middle value
    if length % 2 != 0:
        return date_list[length//2]
    else:
        # If it's even, take the two middle values and return their mean
        print((length//2 - 1), (length//2))
        lower = date_list[length//2 - 1]
        upper = date_list[length//2]
        return (lower + upper)/2`
I used it like this:
`TAmedian = date_median(newList)`
and I get this error:
`TypeError: unsupported operand type(s) for /: 'str' and 'int'`
Is there a simpler way to do this, and if not, what am I doing wrong?
Sample data:
DF['TimeDelta'] = [0 days 00:00:36.35700000,0 days 00:47:11.213000000]
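OK. It *should* work. Famous last words, right?
I suspect that some elements in that column of your data frame are not numeric (the error message mentions 'str', so at least some of them are strings). It should work like this: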
In [16]: from datetime import timedelta
In [17]: import pandas as pd
In [18]: tds = [timedelta(t) for t in range(5)]
In [19]: x = list(range(5))
In [20]: df = pd.DataFrame({'x': x, 'time delta': tds})
In [21]: df
Out[21]:
   x  time delta
0  0      0 days
1  1      1 days
2  2      2 days
3  3      3 days
4  4      4 days
In [22]: import numpy as np
In [23]: np.median(df['time delta'])
Out[23]: numpy.timedelta64(172800000000000,'ns')
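The raw `numpy.timedelta64` value in nanoseconds is hard to read. As a small sketch (the variable names `raw` and `readable` are mine, not from the session above), it can be wrapped in a `pd.Timedelta` to get the familiar days/hours form:

```python
import numpy as np
import pandas as pd

# The value returned by np.median in the session above.
raw = np.timedelta64(172800000000000, 'ns')

# pd.Timedelta accepts a numpy.timedelta64 and prints it in a readable form.
readable = pd.Timedelta(raw)
print(readable)  # 2 days 00:00:00
```

This matches the 50% row that `describe()` reports for the same column.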
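So, have you checked your data frame to see whether the column contains any non-numeric values? The easiest way is the info() command. The output should look similar to the one here. If the Dtype shows "object" instead of timedelta64[ns], you need to find out why.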
In [24]: df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 2 columns):
 #   Column      Non-Null Count  Dtype
---  ------      --------------  -----
 0   x           5 non-null      int64
 1   time delta  5 non-null      timedelta64[ns]
dtypes: int64(1), timedelta64[ns](1)
memory usage: 208.0 bytes
In [25]: df.describe()
Out[25]:
              x              time delta
count  5.000000                       5
mean   2.000000         2 days 00:00:00
std    1.581139  1 days 13:56:50.394919
min    0.000000         0 days 00:00:00
25%    1.000000         1 days 00:00:00
50%    2.000000         2 days 00:00:00
75%    3.000000         3 days 00:00:00
max    4.000000         4 days 00:00:00
Here is a good post about finding non-numeric values:
Finding non-numeric rows in dataframe in pandas?
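One possible way to track down the offending rows, sketched here with made-up data (the `'n/a'` entry and the names `converted` and `bad_rows` are assumptions for illustration, not taken from the question), is to let `pd.to_timedelta` coerce unparseable values to `NaT` and then look at which rows failed:

```python
import pandas as pd

# Hypothetical column that mixes valid duration strings with one bad entry.
df = pd.DataFrame({'TimeDelta': ['0 days 00:00:36.357',
                                 'n/a',
                                 '0 days 00:47:11.213']})

# errors='coerce' turns anything that cannot be parsed into NaT instead of raising.
converted = pd.to_timedelta(df['TimeDelta'], errors='coerce')

# Rows that had a value originally but could not be converted.
bad_rows = df[converted.isna() & df['TimeDelta'].notna()]
print(bad_rows)

# The converted column has a timedelta dtype, so median() works and skips NaT.
print(converted.median())
```

Whether dropping, fixing, or keeping the bad rows as `NaT` is appropriate depends on where the data came from.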
Why convert to a list? pandas.DataFrame has everything you need:
import pandas as pd
DF = pd.DataFrame({'TimeDelta': pd.to_timedelta(['0 days 00:00:36.35700000',
'0 days 00:47:11.213000000'])})
DF['TimeDelta'].mean()
# Timedelta('0 days 00:23:53.785000')
DF['TimeDelta'].median()
# Timedelta('0 days 00:23:53.785000')
Of course, if you don't have a df to begin with, you can do without one, e.g.
pd.to_timedelta(['0 days 00:00:36.35700000', '0 days 00:47:11.213000000']).median()
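To tie this back to the error in the question: the message mentions 'str', which suggests the column holds strings rather than timedeltas. The following is a minimal sketch under that assumption (the list `raw` and the other variable names are hypothetical), showing that `statistics.median` fails on strings and works once the values are converted:

```python
import statistics as st
import pandas as pd

# Assumed situation: the values are strings that merely look like timedeltas.
raw = ['0 days 00:00:36.357', '0 days 00:47:11.213']

try:
    # With an even number of items, median averages the two middle values,
    # which amounts to (str + str) / 2 here.
    st.median(raw)
except TypeError as exc:
    error_message = str(exc)
    print(error_message)

# Convert first; after that both pandas and the statistics module can cope.
tds = pd.to_timedelta(raw)
pandas_median = tds.median()
stdlib_median = st.median(list(tds))
print(pandas_median, stdlib_median)
```

If this reproduces what you see, converting the column once with `pd.to_timedelta` is probably simpler than writing a custom median function.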