How to process a date difference using same-id elements in a list
I have the following data structure:
[ (19L, datetime.datetime(2015, 2, 11, 12, 3, 43)),
(19L, datetime.datetime(2015, 2, 12, 16, 28, 48)),
(19L, datetime.datetime(2014, 9, 17, 11, 58, 19)),
(80L, datetime.datetime(2014, 9, 15, 12, 54, 36)),
(80L, datetime.datetime(2014, 9, 15, 14, 16, 39)),
(80L, datetime.datetime(2014, 2, 6, 8, 58, 39)),
(80L, datetime.datetime(2014, 9, 8, 14, 21, 48)),
(90L, datetime.datetime(2016, 8, 2, 18, 14, 31)),
(90L, datetime.datetime(2016, 8, 2, 21, 14, 23)),
(90L, datetime.datetime(2014, 1, 5, 16, 35, 34)) ]
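I need to calculate the average number of days between entries that share the same user ID. The first element of each tuple is the user ID and the second is the datetime.
I'm having trouble working out how to iterate over the list, group the entries per user, and compute those differences...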
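You can use itertools.groupby() to group by user ID (this assumes the list is sorted by the grouping key, which appears to be the case). Then you can iterate "pairwise" over each group's dates and compute an average day difference: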
In [1]: import datetime
In [2]: from operator import itemgetter
In [3]: from itertools import groupby
In [4]: l = [
...: (19L, datetime.datetime(2015, 2, 11, 12, 3, 43)),
...: (19L, datetime.datetime(2015, 2, 12, 16, 28, 48)),
...: (19L, datetime.datetime(2014, 9, 17, 11, 58, 19)),
...: (80L, datetime.datetime(2014, 9, 15, 12, 54, 36)),
...: (80L, datetime.datetime(2014, 9, 15, 14, 16, 39)),
...: (80L, datetime.datetime(2014, 2, 6, 8, 58, 39)),
...: (80L, datetime.datetime(2014, 9, 8, 14, 21, 48)),
...: (90L, datetime.datetime(2016, 8, 2, 18, 14, 31)),
...: (90L, datetime.datetime(2016, 8, 2, 21, 14, 23)),
...: (90L, datetime.datetime(2014, 1, 5, 16, 35, 34)) ]
In [5]: for user_id, dates in groupby(l, itemgetter(0)):
...:     dates = [date[1] for date in dates]
...:     differences = [abs((d1 - d2).days) for d1, d2 in zip(dates[0::2], dates[1::2])]
...:     print(user_id, sum(differences) / len(differences))
...:
(19L, 2)
(80L, 108)
(90L, 1)
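One caveat about the session above: zip(dates[0::2], dates[1::2]) pairs the 1st date with the 2nd, the 3rd with the 4th, and so on, and a trailing unpaired date is dropped. Also, .days on a negative timedelta rounds down, so a gap of a few hours in reverse order counts as a whole day. If what you want is the gap between each visit and the next one in chronological order, something like the following might be closer. This is only a sketch, assuming Python 3 (plain ints instead of 19L); average_gap_days is a name made up for this example.

```python
import datetime
from itertools import groupby
from operator import itemgetter


def average_gap_days(records):
    """Map each user id to the mean gap, in fractional days, between consecutive visits."""
    result = {}
    # Sorting by (id, datetime) makes each id one contiguous group
    # and puts its dates in chronological order.
    for user_id, group in groupby(sorted(records), key=itemgetter(0)):
        dates = [when for _, when in group]
        # Overlapping pairs: (d0, d1), (d1, d2), ...
        gaps = [(later - earlier).total_seconds() / 86400
                for earlier, later in zip(dates, dates[1:])]
        if gaps:  # a user with a single visit has no gap to average
            result[user_id] = sum(gaps) / len(gaps)
    return result


# The three rows for user 19 from the question.
records = [
    (19, datetime.datetime(2015, 2, 11, 12, 3, 43)),
    (19, datetime.datetime(2015, 2, 12, 16, 28, 48)),
    (19, datetime.datetime(2014, 9, 17, 11, 58, 19)),
]
for user_id, days in average_gap_days(records).items():
    print(user_id, round(days, 2))
```

Using total_seconds() instead of .days keeps the fractional part of each gap, so round at the end if you only want whole days.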
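I would sort the timestamps into a dictionary, where each key is a user's ID and the value is a list of visit times. Then, after sorting each list of timestamps, find the differences between consecutive visit times and take their average. datetime.timedelta objects can be used to simplify the timestamp math.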
from collections import defaultdict
from datetime import timedelta

# l = [(id, datetime), (...), ...]
d = defaultdict(list)
for ID, time in l:
    d[ID].append(time)  # build list of times from timestamps

for ID, times in d.items():
    times.sort()  # sort once per user, after all times are collected
    timeDeltas = [b - a for a, b in zip(times, times[1:])]  # timedeltas between consecutive visits
    if not timeDeltas:
        continue  # a user with a single visit has no gap to average
    averageVisitFrequency = sum(timeDeltas, timedelta(0)) // len(timeDeltas)  # calculate average timedelta
    print('user {} makes a purchase every {} days on average'.format(ID, averageVisitFrequency.days))  # example output usage
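As a side note, the differences between consecutive sorted timestamps telescope: their sum is just the latest time minus the earliest, so the average is (latest - earliest) / (n - 1). If only the average is needed, you could skip sorting and track the earliest time, latest time, and count per user. A rough sketch, assuming Python 3 (where a timedelta can be divided by an int); average_interval and the sample visits are invented for illustration.

```python
from datetime import datetime, timedelta


def average_interval(records):
    """Return {user_id: average timedelta between consecutive visits}."""
    stats = {}  # user_id -> [earliest, latest, count]
    for user_id, when in records:
        if user_id not in stats:
            stats[user_id] = [when, when, 1]
        else:
            entry = stats[user_id]
            entry[0] = min(entry[0], when)
            entry[1] = max(entry[1], when)
            entry[2] += 1
    # Consecutive gaps sum to (latest - earliest); there are count - 1 of them.
    return {
        user_id: (latest - earliest) / (count - 1)
        for user_id, (earliest, latest, count) in stats.items()
        if count > 1  # a single visit has no interval
    }


# Hypothetical sample data, not taken from the question.
visits = [
    (7, datetime(2020, 1, 1)),
    (7, datetime(2020, 1, 7)),
    (7, datetime(2020, 1, 3)),
    (8, datetime(2020, 5, 5)),
]
# User 7 averages 3 days between visits; user 8 is skipped.
print(average_interval(visits))
```

This gives the same average as summing the individual deltas, but you lose the list of gaps, so stick with the list version if you also want the median or the longest gap.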