NumPy: find the most common item per value from another column
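I am using kMeans clustering to determine the season of any given day.
For this I have an array with 4 columns of data.
This is what it looks like (the real array is much longer):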
['0.2742330338168506' '0' '1.3694492732480696' 'winter']
['0.28529288153011745' '0' '1.3805091209613365' 'lente']
['0.28595917620794253' '1' '1.3811754156391616' 'winter']
['0.2874392369724381' '2' '1.3826554764036572' 'lente']
['0.316557712713994' '2' '1.411773952145213' 'herfst']
['0.32113534393276466' '3' '1.4163515833639837' 'lente']
['0.3231108855082745' '3' '1.4220488660040091' 'lente']
['0.3163219663513872' '3' '1.4288377851608964' 'winter']
['0.31201423701381703' '4' '1.4331455144984666' 'lente']
['0.3081781460867783' '4' '1.4369816054255053' 'lente']
['0.29534720251567403' '4' '1.4498125489966096' 'winter']
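I already know how to find the most common item in the whole array, like this:
Counter(array.flat).most_common()
Here, however, I need the most common item in the 4th column for each cluster, that is, for each value in the second column. Is there an easier way to do this than writing a long for loop and counting them?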
For some reason, the solution suggested in the comments throws a ValueError. So here is an alternative solution using pandas:
import pandas as pd

data = []  # nested list holding the rows shown in your question
df = pd.DataFrame(data, columns=['val1', 'cluster', 'val2', 'season'])  # read your data into a DataFrame

def print_mode(cluster, seasons):
    # mode() returns every value tied for most frequent, in sorted order
    print("{} - {}".format(cluster, seasons.mode().values))

# Visit the 'season' values of each cluster in turn
for cluster, seasons in df.groupby('cluster')['season']:
    print_mode(cluster, seasons)
Sample output for the sample data:
0 - ['lente' 'winter']
1 - ['winter']
2 - ['herfst' 'lente']
3 - ['lente']
4 - ['lente']
Instead of printing the result, you can use it however your use case requires.
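If you want to keep the result rather than print it, something along these lines might work. This is only a sketch: the names `modes_by_cluster`, `counters` and `top_by_cluster` are made up for illustration, and the rows are the sample from the question, held as strings just like the original array. The first part collects every tied season per cluster with pandas; the second part is a pandas-free variant that reuses the Counter idea from the question and keeps a single winner per cluster.

```python
from collections import Counter, defaultdict

import numpy as np
import pandas as pd

# Sample rows from the question; every field is a string, as in the original array.
data = np.array([
    ['0.2742330338168506', '0', '1.3694492732480696', 'winter'],
    ['0.28529288153011745', '0', '1.3805091209613365', 'lente'],
    ['0.28595917620794253', '1', '1.3811754156391616', 'winter'],
    ['0.2874392369724381', '2', '1.3826554764036572', 'lente'],
    ['0.316557712713994', '2', '1.411773952145213', 'herfst'],
    ['0.32113534393276466', '3', '1.4163515833639837', 'lente'],
    ['0.3231108855082745', '3', '1.4220488660040091', 'lente'],
    ['0.3163219663513872', '3', '1.4288377851608964', 'winter'],
    ['0.31201423701381703', '4', '1.4331455144984666', 'lente'],
    ['0.3081781460867783', '4', '1.4369816054255053', 'lente'],
    ['0.29534720251567403', '4', '1.4498125489966096', 'winter'],
])

df = pd.DataFrame(data, columns=['val1', 'cluster', 'val2', 'season'])

# pandas variant: map each cluster to the list of its most frequent seasons.
# Series.mode() keeps every value tied for first place, in sorted order.
modes_by_cluster = {
    cluster: seasons.mode().tolist()
    for cluster, seasons in df.groupby('cluster')['season']
}

# pandas-free variant: one Counter per cluster, built in a single pass.
counters = defaultdict(Counter)
for cluster, season in zip(data[:, 1], data[:, 3]):
    counters[str(cluster)][str(season)] += 1

# most_common(1) keeps a single winner; ties go to the season seen first.
top_by_cluster = {
    cluster: counts.most_common(1)[0][0]
    for cluster, counts in counters.items()
}
```

The pandas variant is probably the better fit when ties matter, since it reports all of them; the Counter variant avoids the extra dependency but silently picks one season when two are equally common.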