K-Means classification by group
I am trying to run a K-Means analysis on a dataframe like this:
   URBAN AREA    PROVINCE  DENSITY
0           1    TRUJILLO     0.30
1           2    TRUJILLO     0.03
2           3    TRUJILLO     0.80
3           1        LIMA     1.20
4           2        LIMA     0.04
5           1  LAMBAYEQUE     0.90
6           2  LAMBAYEQUE     0.10
7           3  LAMBAYEQUE     0.08
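(It can be downloaded from here.)
As you can see, df describes the different urban areas within each province, each with its own urban density value. I want to run a K-Means classification on the single column DENSITY. To do that, I run the following code: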
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
df = pd.read_csv('C:/Path/to/example.csv')
clustering = KMeans(n_clusters=2, max_iter=300)
clustering.fit(df[['DENSITY']])
df['KMeans_Clusters'] = clustering.labels_
df
I get this result, which is fine for the first part of the example:
   URBAN AREA    PROVINCE  DENSITY  KMeans_Clusters
0           1    TRUJILLO     0.30                0
1           2    TRUJILLO     0.03                0
2           3    TRUJILLO     0.80                1
3           1        LIMA     1.20                1
4           2        LIMA     0.04                0
5           1  LAMBAYEQUE     0.90                1
6           2  LAMBAYEQUE     0.10                0
7           3  LAMBAYEQUE     0.08                0
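But now I want to run the K-Means classification of urban areas separately for each province, that is, repeat the same process within every province. So I tried this code: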
df=pd.read_csv('C:/Users/rojas/Desktop/example.csv')
clustering=KMeans(n_clusters=2, max_iter=300)
clustering.fit(df[['DENSITY']]).groupby('PROVINCE')
df['KMeans_Clusters']=clustering.labels_
df
But I get this error message:
AttributeError Traceback (most recent call last)
<ipython-input-4-87e7696ff61a> in <module>
3 clustering=KMeans(n_clusters=2, max_iter=300)
4
----> 5 clustering.fit(df[['DENSITY']]).groupby('PROVINCE')
6
7 df['KMeans_Clusters']=clustering.labels_
AttributeError: 'KMeans' object has no attribute 'groupby'
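Is there a way to do this?
fit returns the fitted KMeans estimator, not a DataFrame, so it has no groupby method. Group the dataframe first and fit one model per province instead. Try this: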
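def k_means(group):
    # group holds all rows of one province; fit a separate model on its DENSITY values
    clustering = KMeans(n_clusters=2, max_iter=300)
    model = clustering.fit(group[['DENSITY']])
    group = group.copy()  # avoid modifying the slice handed in by groupby
    group['KMeans_Clusters'] = model.labels_
    return group

# group_keys=False keeps the original row index on recent pandas versions
df = df.groupby('PROVINCE', group_keys=False).apply(k_means)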
Result:
      URBAN AREA    PROVINCE  DENSITY  KMeans_Clusters
0  0           1    TRUJILLO     0.30                0
1  1           2    TRUJILLO     0.03                0
2  2           3    TRUJILLO     0.80                1
3  3           1        LIMA     1.20                1
4  4           2        LIMA     0.04                0
5  5           1  LAMBAYEQUE     0.90                0
6  6           2  LAMBAYEQUE     0.10                1
7  7           3  LAMBAYEQUE     0.08                1
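Note that in the result above the label numbers are arbitrary within each province: the densest area in TRUJILLO gets 1, while the densest area in LAMBAYEQUE gets 0. If comparable labels are wanted, one option is to renumber the clusters by their center. The following is only a sketch of that idea, not part of the original answer: the helper name cluster_density is hypothetical, the data is typed inline in place of example.csv, and n_init and random_state are my own additions to make the run repeatable. It also assumes every province has at least n_clusters rows, since KMeans raises an error otherwise.

```python
import pandas as pd
from sklearn.cluster import KMeans

# Inline copy of the example data from the question (stands in for example.csv).
df = pd.DataFrame({
    "URBAN AREA": [1, 2, 3, 1, 2, 1, 2, 3],
    "PROVINCE": ["TRUJILLO"] * 3 + ["LIMA"] * 2 + ["LAMBAYEQUE"] * 3,
    "DENSITY": [0.30, 0.03, 0.80, 1.20, 0.04, 0.90, 0.10, 0.08],
})


def cluster_density(density, n_clusters=2):
    """Cluster one province's DENSITY values; label 0 is the lowest-density cluster."""
    # Assumption: the group has at least n_clusters rows.
    model = KMeans(n_clusters=n_clusters, max_iter=300, n_init=10, random_state=0)
    raw = model.fit_predict(density.to_frame())
    # order[new_label] = old_label, sorted by cluster center (ascending density).
    order = model.cluster_centers_.ravel().argsort()
    # Inverting the permutation gives rank[old_label] = new_label.
    rank = order.argsort()
    return pd.Series(rank[raw], index=density.index)


# transform calls the helper once per province and aligns the labels
# back to the original row index.
df["KMeans_Clusters"] = (
    df.groupby("PROVINCE")["DENSITY"].transform(cluster_density).astype(int)
)
print(df)
```

With the eight example rows this should label the rows 0, 0, 1, 1, 0, 1, 0, 0, so that 1 always marks the higher-density cluster of its province. Using transform on the single DENSITY column also leaves the row order and index untouched, which avoids the index differences that apply can produce across pandas versions.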