Extract specific objects from a list of sequences in Python
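I implemented an FPM (frequent pattern mining) algorithm to find rules in activity data, and my output has the format shown below.
for itemset in find_frequent_itemsets(dataset, 0.1, include_support=True):
    print(itemset)
The code above produces the following output: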
([u'Global Connect Village'], 28)
([u'Terminal 2', u'Global Connect Village'], 1)
([u'VivoCity', u'Global Connect Village'], 1)
([u'Universal Studios Singapore', u'VivoCity', u'Global Connect Village'], 1)
([u'Universal Studios Singapore', u'Global Connect Village'], 2)
([u'Orchard Gateway', u'Global Connect Village'], 2)
([u'Chinatown', u'Global Connect Village'], 2)
([u'Singapore Changi Airport (SIN)', u'Chinatown', u'Global Connect Village'], 2)
([u'Fragrance Hotel', u'Global Connect Village'], 2)
([u'Singapore Changi Airport (SIN)', u'Fragrance Hotel', u'Global Connect Village'], 1)
([u'Singapore', u'Global Connect Village'], 3)
([u'Singapore Changi Airport (SIN)', u'Singapore', u'Global Connect Village'], 1)
([u"McDonald's", u'Global Connect Village'], 4)
([u'Singapore Changi Airport (SIN)', u"McDonald's", u'Global Connect Village'], 1)
I want to extract only the entries that have a high support value and contain three or more objects.
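MIN_LOCS = 3  # minimum number of locations an itemset must contain
itemset = find_frequent_itemsets(dataset, 0.1, include_support=True)
# Keep itemsets with at least MIN_LOCS locations, sorted by support (highest first)
itemset = sorted(filter(lambda it: len(it[0]) >= MIN_LOCS, itemset),
                 key=lambda it: it[1], reverse=True)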
You can then take as many of the top elements as you want:
itemset_top_5 = itemset[:5]
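If you also want to enforce a minimum support value, adjust the filter accordingly:
MIN_SUPPORT = 2  # example threshold; choose a value that suits your data
itemset = sorted(filter(lambda it: len(it[0]) >= MIN_LOCS and it[1] >= MIN_SUPPORT, itemset),
                 key=lambda it: it[1], reverse=True)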
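To try the filtering without the FPM code at hand, here is a minimal, self-contained sketch in Python 3. It assumes the results are plain (items, support) tuples like the ones printed in the question; the `results` list is a handful of rows copied from that output, and `select_itemsets`, its parameters, and the `MIN_SUPPORT` value are hypothetical names and numbers chosen for illustration, not part of the original code.

```python
# Sample (items, support) pairs copied from the output in the question;
# in real use these would come from find_frequent_itemsets(...).
results = [
    (['Global Connect Village'], 28),
    (['Universal Studios Singapore', 'VivoCity', 'Global Connect Village'], 1),
    (['Singapore Changi Airport (SIN)', 'Chinatown', 'Global Connect Village'], 2),
    (['Singapore', 'Global Connect Village'], 3),
    (['Singapore Changi Airport (SIN)', "McDonald's", 'Global Connect Village'], 1),
]

MIN_LOCS = 3     # minimum number of locations in an itemset
MIN_SUPPORT = 2  # hypothetical threshold, chosen for illustration


def select_itemsets(pairs, min_locs, min_support=0, top_n=None):
    """Keep itemsets with enough items and support, highest support first."""
    kept = [(items, support) for items, support in pairs
            if len(items) >= min_locs and support >= min_support]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept if top_n is None else kept[:top_n]


# Itemsets with three or more locations, regardless of support
long_itemsets = select_itemsets(results, MIN_LOCS)
# The same, but also requiring a minimum support
strong_itemsets = select_itemsets(results, MIN_LOCS, MIN_SUPPORT)

for items, support in long_itemsets:
    print(support, items)
```

Sorting in descending order matters here: with an ascending sort, slicing the first few elements would return the itemsets with the lowest support rather than the highest.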