Hands-on with Surprise, a Python recommendation library

xsmile · posted 3 months ago · Category: Machine Learning

Source: Python推荐系统库--Surprise实战 - 墨麟非攻 - 博客园 (cnblogs.com)

一、Using the MovieLens dataset

from surprise import SVD
from surprise import Dataset
from surprise import evaluate, print_perf  # old Surprise (<1.1) evaluation API

# Load the public MovieLens recommendation dataset
data = Dataset.load_builtin('ml-100k')
# k-fold cross-validation
data.split(n_folds=3)
# Use SVD (matrix factorization) as the algorithm
algo = SVD()
# Evaluate on the dataset: root mean squared error and mean absolute error
perf = evaluate(algo, data, measures=['RMSE', 'MAE'])
# Print the results
print_perf(perf)
Evaluating RMSE, MAE of algorithm SVD.

------------
Fold 1
RMSE: 0.9506
MAE:  0.7511
------------
Fold 2
RMSE: 0.9452
MAE:  0.7456
------------
Fold 3
RMSE: 0.9442
MAE:  0.7444
------------
------------
Mean RMSE: 0.9467
Mean MAE : 0.7470
------------
------------
        Fold 1  Fold 2  Fold 3  Mean    
RMSE    0.9506  0.9452  0.9442  0.9467  
MAE     0.7511  0.7456  0.7444  0.7470
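If you are running a newer Surprise release (1.1 or later), evaluate, print_perf, and data.split have been removed; cross_validate from surprise.model_selection is their replacement. A minimal sketch of the same run with the current API:

from surprise import SVD, Dataset
from surprise.model_selection import cross_validate

# Same experiment with the current API: 3-fold cross-validation, reporting RMSE and MAE
data = Dataset.load_builtin('ml-100k')
algo = SVD()
cross_validate(algo, data, measures=['RMSE', 'MAE'], cv=3, verbose=True)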

二、Hyperparameter tuning

We use grid search with cross-validation (the same idea as sklearn's GridSearchCV) to pick the best parameters.

# Hyperparameter tuning
from surprise import GridSearch  # old Surprise (<1.1) grid-search API
# Number of epochs, learning rate and regularization:
# three parameters with two candidate values each, 2^3 = 8 combinations
param_grid = {'n_epochs': [5, 10], 'lr_all': [0.002, 0.005],
              'reg_all': [0.4, 0.6]}

# Tune the SVD algorithm over these three parameters,
# scored by RMSE and FCP (fraction of concordant pairs)
grid_search = GridSearch(SVD, param_grid, measures=['RMSE', 'FCP'])
data = Dataset.load_builtin('ml-100k')
data.split(n_folds=3)

grid_search.evaluate(data)
Running grid search for the following parameter combinations:
{'n_epochs': 5, 'lr_all': 0.002, 'reg_all': 0.4}
{'n_epochs': 5, 'lr_all': 0.002, 'reg_all': 0.6}
{'n_epochs': 5, 'lr_all': 0.005, 'reg_all': 0.4}
{'n_epochs': 5, 'lr_all': 0.005, 'reg_all': 0.6}
{'n_epochs': 10, 'lr_all': 0.002, 'reg_all': 0.4}
{'n_epochs': 10, 'lr_all': 0.002, 'reg_all': 0.6}
{'n_epochs': 10, 'lr_all': 0.005, 'reg_all': 0.4}
{'n_epochs': 10, 'lr_all': 0.005, 'reg_all': 0.6}

Results:
{'n_epochs': 5, 'lr_all': 0.002, 'reg_all': 0.4}
{'RMSE': 0.9973640543212537, 'FCP': 0.6834505918617332}
----------
{'n_epochs': 5, 'lr_all': 0.002, 'reg_all': 0.6}
{'RMSE': 1.0033367804212159, 'FCP': 0.6863671726311678}
----------
{'n_epochs': 5, 'lr_all': 0.005, 'reg_all': 0.4}
{'RMSE': 0.9740022047005671, 'FCP': 0.693822773157699}
----------
{'n_epochs': 5, 'lr_all': 0.005, 'reg_all': 0.6}
{'RMSE': 0.9828360526820644, 'FCP': 0.6939377853330241}
----------
{'n_epochs': 10, 'lr_all': 0.002, 'reg_all': 0.4}
{'RMSE': 0.9783154591562983, 'FCP': 0.6919014896389958}
----------
{'n_epochs': 10, 'lr_all': 0.002, 'reg_all': 0.6}
{'RMSE': 0.9863470326305794, 'FCP': 0.6925580320424597}
----------
{'n_epochs': 10, 'lr_all': 0.005, 'reg_all': 0.4}
{'RMSE': 0.9641597864074152, 'FCP': 0.6973875277009212}
----------
{'n_epochs': 10, 'lr_all': 0.005, 'reg_all': 0.6}
{'RMSE': 0.9740231673256359, 'FCP': 0.6976928768968366}
# Inspect the best parameter combinations
# Best RMSE score
print(grid_search.best_score['RMSE'])

# Parameters that gave the best RMSE
print(grid_search.best_params['RMSE'])
0.9641597864074152
{'n_epochs': 10, 'lr_all': 0.005, 'reg_all': 0.4}
# Best FCP score
print(grid_search.best_score['FCP'])

# Parameters that gave the best FCP
print(grid_search.best_params['FCP'])
0.6983253171588012
{'n_epochs': 10, 'lr_all': 0.005, 'reg_all': 0.6}
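On Surprise 1.1 and later, GridSearch was replaced by GridSearchCV in surprise.model_selection. Assuming such a release, the equivalent tuning run looks roughly like this:

from surprise import SVD, Dataset
from surprise.model_selection import GridSearchCV

param_grid = {'n_epochs': [5, 10], 'lr_all': [0.002, 0.005],
              'reg_all': [0.4, 0.6]}

# 3-fold grid search over the same 8 combinations, scored by RMSE and FCP
gs = GridSearchCV(SVD, param_grid, measures=['rmse', 'fcp'], cv=3)
data = Dataset.load_builtin('ml-100k')
gs.fit(data)

# Best RMSE and the parameters that achieved it
print(gs.best_score['rmse'])
print(gs.best_params['rmse'])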

Training a model on your own dataset

How do we do it?

1. Load your own dataset

import os
from surprise import Reader, Dataset

# Path to the ratings file
file_path = os.path.expanduser('./popular_music_suprise_format.txt')
# Describe the file format: one "user item rating timestamp" record per line, comma-separated
reader = Reader(line_format='user item rating timestamp', sep=',')
# Load the data from the file
music_data = Dataset.load_from_file(file_path, reader=reader)
# Split into 5 folds (old Surprise API)
music_data.split(n_folds=5)
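If your ratings are already in a pandas DataFrame rather than a text file, Surprise can also build the dataset with Dataset.load_from_df. A minimal sketch; the column names here are hypothetical:

import pandas as pd
from surprise import Reader, Dataset

# Hypothetical ratings table: one (user, item, rating) triple per row
ratings_df = pd.DataFrame({
    'user_id': ['u1', 'u1', 'u2'],
    'song_id': ['s1', 's2', 's1'],
    'rating':  [5.0, 3.0, 4.0],
})

# rating_scale must match the range of your ratings
reader = Reader(rating_scale=(1, 5))
# Columns must be passed in the order: user, item, rating
music_data = Dataset.load_from_df(ratings_df[['user_id', 'song_id', 'rating']], reader)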

2. Build and compare models with different recommendation algorithms (a more compact loop version is sketched after this block)

### NormalPredictor (random predictions drawn from the rating distribution)
from surprise import NormalPredictor, evaluate
algo = NormalPredictor()
perf = evaluate(algo, music_data, measures=['RMSE', 'MAE'])

### BaselineOnly (user/item baseline estimates)
from surprise import BaselineOnly, evaluate
algo = BaselineOnly()
perf = evaluate(algo, music_data, measures=['RMSE', 'MAE'])

### Basic collaborative filtering
from surprise import KNNBasic, evaluate
algo = KNNBasic()
perf = evaluate(algo, music_data, measures=['RMSE', 'MAE'])

### Mean-centered collaborative filtering
from surprise import KNNWithMeans, evaluate
algo = KNNWithMeans()
perf = evaluate(algo, music_data, measures=['RMSE', 'MAE'])

### Collaborative filtering with baselines
from surprise import KNNBaseline, evaluate
algo = KNNBaseline()
perf = evaluate(algo, music_data, measures=['RMSE', 'MAE'])

### SVD
from surprise import SVD, evaluate
algo = SVD()
perf = evaluate(algo, music_data, measures=['RMSE', 'MAE'])

### SVD++
from surprise import SVDpp, evaluate
algo = SVDpp()
perf = evaluate(algo, music_data, measures=['RMSE', 'MAE'])

### NMF (non-negative matrix factorization)
from surprise import NMF, evaluate, print_perf
algo = NMF()
perf = evaluate(algo, music_data, measures=['RMSE', 'MAE'])
print_perf(perf)
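The repetition above can be collapsed into a loop. A minimal sketch with the current cross_validate API (assuming Surprise 1.1+, where music_data is loaded the same way but without the .split call):

from surprise import (NormalPredictor, BaselineOnly, KNNBasic, KNNWithMeans,
                      KNNBaseline, SVD, SVDpp, NMF)
from surprise.model_selection import cross_validate

algorithms = [NormalPredictor(), BaselineOnly(), KNNBasic(), KNNWithMeans(),
              KNNBaseline(), SVD(), SVDpp(), NMF()]

for algo in algorithms:
    # 5-fold cross-validation for each algorithm on the same data
    results = cross_validate(algo, music_data, measures=['RMSE', 'MAE'], cv=5)
    print(type(algo).__name__,
          'RMSE: %.4f' % results['test_rmse'].mean(),
          'MAE: %.4f' % results['test_mae'].mean())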

Recommender systems: similarity between different movies

一、Load the data and compute pairwise item similarities with the algorithm

# After fitting a collaborative filtering model, retrieve the items most similar
# to a given item via algo.get_neighbors()

from __future__ import (absolute_import, division, print_function, unicode_literals)
import os
import io

from surprise import KNNBaseline
from surprise import Dataset

# Build the movie-name-to-id and movie-id-to-name mappings from the MovieLens item file
def read_item_names():
    file_name = (os.path.expanduser('~') + '/.surprise_data/ml-100k/ml-100k/u.item')
    rid_to_name = {}
    name_to_rid = {}
    with io.open(file_name, 'r', encoding='ISO-8859-1') as f:
        for line in f:
            line = line.split('|')
            rid_to_name[line[0]] = line[1]
            name_to_rid[line[1]] = line[0]
    return rid_to_name, name_to_rid

# Fit the algorithm and compute the item-item similarity matrix
data = Dataset.load_builtin('ml-100k')
trainset = data.build_full_trainset()
sim_options = {'name': 'pearson_baseline', 'user_based': False}
algo = KNNBaseline(sim_options=sim_options)
algo.train(trainset)  # old Surprise API; newer versions use algo.fit(trainset)
Estimating biases using als...
Computing the pearson_baseline similarity matrix...
Done computing similarity matrix.
# Get the name-to-id and id-to-name mappings
rid_to_name, name_to_rid = read_item_names()

# Raw (MovieLens) id of Toy Story
toy_story_raw_id = name_to_rid['Toy Story (1995)']
toy_story_raw_id
'1'
# Convert the raw id to Surprise's internal id
toy_story_inner_id = algo.trainset.to_inner_iid(toy_story_raw_id)
toy_story_inner_id
24
# The 10 nearest neighbors (internal ids) of Toy Story
toy_story_neighbors = algo.get_neighbors(toy_story_inner_id, k=10)
toy_story_neighbors
[433, 101, 302, 309, 971, 95, 26, 561, 816, 347]

二、Get the 10 most similar movies

# Convert the neighbors' internal ids back to raw ids, then to movie names.
toy_story_neighbors = (algo.trainset.to_raw_iid(inner_id) for inner_id in toy_story_neighbors)

toy_story_neighbors = (rid_to_name[rid] for rid in toy_story_neighbors)

print()
print('The 10 nearest neighbors of Toy Story are:')
for movie in toy_story_neighbors:
    print(movie)
The 10 nearest neighbors of Toy Story are:
Beauty and the Beast (1991)
Raiders of the Lost Ark (1981)
That Thing You Do! (1996)
Lion King, The (1994)
Craft, The (1996)
Liar Liar (1997)
Aladdin (1992)
Cool Hand Luke (1967)
Winnie the Pooh and the Blustery Day (1968)
Indiana Jones and the Last Crusade (1989)
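Beyond item-item neighbors, the same fitted model can score a single (user, item) pair with algo.predict, which takes raw ids. A small sketch; user '196' is just an arbitrary user from ml-100k:

# Predict how user '196' would rate Toy Story (raw ids are strings in ml-100k)
pred = algo.predict('196', toy_story_raw_id)
print(pred.est)  # estimated rating on the 1-5 scale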

Reference: https://blog.csdn.net/mycafe_/article/details/79146764

