LightFM: how to make predictions for new users (cold start) - user id 8 not in user id mappings
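I am building a recommender system that recommends trainings to employees based on user features and item features. According to the documentation, LightFM is a great algorithm for this.
My users dataframe: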
   User-Id      name  age  los            ou  gender  skills
0        1      Luis   21  IFS  architecture       M  python
1        2     Peter   22  ADV           pmo       M      pm
2        3    Jurgen   23  IFS  architecture       M     sql
3        4      Bart   24  IFS  architecture       M  python
4        5  Cristina   25  ADV           pmo       F      pm
5        6   Lambert   33  IFS   development       M     sql
6        7     Rahul   44  IFS   development       M  python
My trainingsdf dataframe:
   Training-Id    training name  main skill
0            1     basic python      python
1            2  advanced python      python
2            3      basic scrum          pm
3            4   advanced scrum          pm
4            5        basic sql         sql
5            6     advanced sql         sql
My trainings-taken dataframe (10 means the user has taken that training), so my weights are all 10:
   User-Id  Training-Id  TrainingTaken
0        1            1             10
1        1            2             10
2        2            3             10
3        2            4             10
4        3            5             10
5        3            6             10
6        4            1             10
7        4            2             10
I found this nice helper for building the matrices:
https://github.com/Med-ELOMARI/LightFM-Dataset-Helper
So:
items_column = "Training-Id"
user_column = "User-Id"
ratings_column = "TrainingTaken"
items_feature_columns = [
    "training name",
    "main skill",
]
user_features_columns = ["name", "age", "los", "ou", "gender", "skills"]
dataset_helper_instance = DatasetHelper(
    users_dataframe=usersdf,
    items_dataframe=trainingsdf,
    interactions_dataframe=trainingstakendf,
    item_id_column=items_column,
    items_feature_columns=items_feature_columns,
    user_id_column=user_column,
    user_features_columns=user_features_columns,
    interaction_column=ratings_column,
    clean_unknown_interactions=True,
)
# Run the routine.
# You can also run the steps separately, one by one; routine() simplifies the flow.
dataset_helper_instance.routine()
The helper above returns the interactions matrix, the weights matrix, and so on:
dataset_helper_instance.weights.todense()
matrix([[10., 10., 0., 0., 0., 0.],
[ 0., 0., 10., 10., 0., 0.],
[ 0., 0., 0., 0., 10., 10.],
[10., 10., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0., 0.]], dtype=float32)
dataset_helper_instance.interactions.todense()
matrix([[1., 1., 0., 0., 0., 0.],
[0., 0., 1., 1., 0., 0.],
[0., 0., 0., 0., 1., 1.],
[1., 1., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0.]], dtype=float32)
Then I do a train/test split and fit the model:
from lightfm import LightFM
from lightfm.cross_validation import random_train_test_split
(train, test) = random_train_test_split(interactions=dataset_helper_instance.interactions, test_percentage=0.2)
model = LightFM(loss='warp')
# Fit on the full interactions matrix (not only `train`), using the weights as sample weights.
model.fit(
    interactions=dataset_helper_instance.interactions,
    sample_weight=dataset_helper_instance.weights,
    item_features=dataset_helper_instance.item_features_list,
    user_features=dataset_helper_instance.user_features_list,
    verbose=True,
    epochs=50,
    num_threads=20,
)
Then I check AUC and precision:
from lightfm.evaluation import precision_at_k
from lightfm.evaluation import auc_score
train_precision = precision_at_k(model, train,item_features=dataset_helper_instance.item_features_list, user_features=dataset_helper_instance.user_features_list , k=10).mean()
test_precision = precision_at_k(model, test, item_features=dataset_helper_instance.item_features_list, user_features=dataset_helper_instance.user_features_list,k=10).mean()
train_auc = auc_score(model, train,item_features=dataset_helper_instance.item_features_list, user_features=dataset_helper_instance.user_features_list).mean()
test_auc = auc_score(model, test,item_features=dataset_helper_instance.item_features_list, user_features=dataset_helper_instance.user_features_list).mean()
print('Precision: train %.2f, test %.2f. '% (train_precision, test_precision))
print('AUC: train %.2f, test %.2f.' % (train_auc, test_auc))
Precision: train 0.15, test 0.10.
AUC: train 0.90, test 1.00.
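For reference when reading these numbers: precision at k counts how many of the top k ranked items are relevant and divides by k. With only six trainings in the catalogue and k=10, a user with two relevant trainings cannot score above 0.2. A minimal pure-Python sketch of the metric (not LightFM's implementation; the function below and the sample scores are made up for illustration):

```python
def precision_at_k(scores, relevant, k):
    """Fraction of the top-k ranked items that are relevant (hits / k)."""
    # Rank item indices by descending score and keep the first k.
    top_k = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    hits = sum(1 for i in top_k if i in relevant)
    return hits / k


# Made-up scores for six trainings; items 0 and 1 are the ones the user took.
scores = [0.9, 0.8, 0.1, 0.0, -0.2, -0.3]
relevant = {0, 1}

print(precision_at_k(scores, relevant, k=2))   # 1.0
print(precision_at_k(scores, relevant, k=10))  # 0.2
```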
Then I make predictions for an existing user:
scores = model.predict(user_ids=6, item_ids=[1,2,3,5,6])
print(scores)
[ 0.01860116 -0.20987387 0.06134995 0.08332028 0.13678455]
Great, I get training predictions for user id 6.
Now I want to predict for a new user (cold start).
I tried the following:
from lightfm.data import Dataset
dataset = Dataset()
new_user_feature = [8,{'name:John', 'Age:33', 'los:IFS','ou:development', 'skills:sql'} ]
new_user_feature = [8,new_user_feature]
new_user_feature = dataset.build_user_features([new_user_feature])
#predict new users User-Id name age los ou gender skills
model.predict(0, item_ids=[1,2,3,5,6], user_features=new_user_feature)
But I get this error:
ValueError: user id 8 not in user id mappings.
What am I missing here?
I can't test this, but I think the problem is in these lines:
new_user_feature = [8,{'name:John', 'Age:33', 'los:IFS','ou:development', 'skills:sql'} ]
new_user_feature = [8,new_user_feature]
According to the documentation, dataset.build_user_features(..) expects an iterable of the form (user id, [list of feature names]) or (user id, {feature name: feature weight}).
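To make the first of those two shapes concrete, here is a minimal sketch that turns a dict of attributes into a (user id, [list of feature names]) tuple. The helper name to_feature_tuple and the "column:value" naming scheme are assumptions taken from the attempt in the question; the feature names have to match whatever names were used when the model was trained, which I have not checked for the DatasetHelper.

```python
def to_feature_tuple(user_id, attributes):
    """Build (user id, [feature names]) from a dict of column -> value."""
    # Encode each attribute as a "column:value" string, e.g. "skills:sql".
    # This naming scheme is an assumption, not something LightFM prescribes.
    features = [f"{column}:{value}" for column, value in attributes.items()]
    return (user_id, features)


new_user = to_feature_tuple(
    8,
    {"name": "John", "age": 33, "los": "IFS", "ou": "development", "skills": "sql"},
)
print(new_user)
# (8, ['name:John', 'age:33', 'los:IFS', 'ou:development', 'skills:sql'])
```

A list of such tuples is the kind of iterable the documentation describes. Note that the error message in the question also suggests the Dataset must already know the user id and the feature names before build_user_features is called.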
In your case, I think you should replace those two lines from your question with:
new_user_feature = [8,{'name':'John', 'Age':33, 'los':'IFS','ou':'development', 'skills':'sql'} ]
# Is the gender missing?
If that doesn't work, the input format may be like this:
new_user_feature = [8,['John', 33, 'IFS', 'development', 'sql'] ]
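Whichever input format ends up working, the new user's feature row probably has to use the same feature columns the model was trained on. Below is a rough pure-Python sketch of that alignment step. The function encode_user_row and the feature_to_column mapping are hypothetical and only for illustration; with a fitted lightfm Dataset, the real mapping should be available from dataset.mapping(), but I have not verified how the DatasetHelper exposes it.

```python
def encode_user_row(features, feature_to_column):
    """Return (columns, weights) for one user row aligned to the training columns."""
    # Keep only features seen during training; unknown ones have no column.
    columns = sorted(feature_to_column[f] for f in features if f in feature_to_column)
    # Spread the weight evenly so the row sums to 1 (simple row normalisation).
    weights = [1.0 / len(columns)] * len(columns) if columns else []
    return columns, weights


# Hypothetical mapping from feature name to column index, for illustration only.
feature_to_column = {"los:IFS": 0, "los:ADV": 1, "ou:development": 2, "skills:sql": 3}

columns, weights = encode_user_row(
    ["name:John", "los:IFS", "ou:development", "skills:sql"], feature_to_column
)
print(columns)  # [0, 2, 3]
print(weights)
```

The columns and weights could then be packed into a one-row scipy.sparse.csr_matrix whose width equals the number of user features used in training, and passed as user_features to model.predict with user id 0, as in the predict call from the question. "name:John" is dropped here because a brand-new name has no trained embedding to contribute.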
Let me know if that solves the problem.