
Matrix Factorization Recommendation Based on Stochastic Gradient Descent (Python)

Posted in Python on August 31, 2018

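SVD is a common method of matrix factorization. Its principle is that a matrix M can be written as the product of three matrices A, B, and C. Because B can be merged into either A or C, M can also be obtained as the product of just two matrices, M1 and M2.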
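As a minimal sketch of this merging step, the snippet below uses NumPy's `np.linalg.svd` on a small matrix whose values are made up for illustration. The names `M`, `A`, `B`, `C`, `M1`, and `M2` follow the text above; nothing here comes from the article's dataset.

```python
import numpy as np

# A small example matrix (values chosen arbitrarily for illustration).
M = np.array([[5.0, 3.0, 0.0],
              [4.0, 0.0, 1.0],
              [1.0, 1.0, 5.0],
              [0.0, 1.0, 4.0]])

# Thin SVD: M = A @ diag(s) @ C
A, s, C = np.linalg.svd(M, full_matrices=False)
B = np.diag(s)

# Fold B into A, which leaves only two factors.
M1 = A @ B
M2 = C

three_factor_ok = np.allclose(A @ B @ C, M)
two_factor_ok = np.allclose(M1 @ M2, M)
print(three_factor_ok, two_factor_ok)  # prints: True True
```

Folding B into C instead (`M1 = A`, `M2 = B @ C`) reproduces M equally well; the choice only changes how the scale is shared between the two factors.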

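Matrix factorization recommendation builds on this idea. The latent features of every user and every item form two matrices, M1 and M2, whose product gives the rating matrix M. Using the existing data (the ratings users have given to items), stochastic gradient descent can estimate the M1 and M2 that best explain those ratings, which amounts to learning the latent attributes of each user and each item. The score of an item that a user has not rated is then predicted as the inner product of the corresponding user and item feature vectors.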
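The idea can be sketched on a toy problem before turning to real data. The snippet below is an illustrative assumption rather than the article's implementation: the function names `sgd_factorize` and `rmse`, the parameters `lr` and `reg`, and the six toy ratings are all hypothetical. Each observed rating moves the user vector and the item vector a small step along the prediction error, with a regularization term that keeps the features small.

```python
import numpy as np

def sgd_factorize(ratings, n_users, n_items, n_features=2,
                  lr=0.05, reg=0.02, epochs=500, seed=0):
    """Fit user and item feature matrices to (user, item, rating) triples."""
    rng = np.random.default_rng(seed)
    P = rng.random((n_users, n_features))  # user features (plays the role of M1)
    Q = rng.random((n_items, n_features))  # item features (M2, stored row-wise)
    for _ in range(epochs):
        # Visit the observed ratings in random order, one at a time.
        for idx in rng.permutation(len(ratings)):
            u, i, r = ratings[idx]
            p_u = P[u].copy()              # keep the pre-update user vector
            err = r - p_u @ Q[i]           # prediction error for this rating
            P[u] += lr * (err * Q[i] - reg * p_u)
            Q[i] += lr * (err * p_u - reg * Q[i])
    return P, Q

def rmse(ratings, P, Q):
    errors = [r - P[u] @ Q[i] for u, i, r in ratings]
    return float(np.sqrt(np.mean(np.square(errors))))

# Hypothetical toy data: (user index, item index, rating).
toy_ratings = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0),
               (1, 2, 1.0), (2, 1, 1.0), (2, 2, 5.0)]

P0, Q0 = sgd_factorize(toy_ratings, 3, 3, epochs=0)  # random start, no training
P, Q = sgd_factorize(toy_ratings, 3, 3)
rmse_before = rmse(toy_ratings, P0, Q0)
rmse_after = rmse(toy_ratings, P, Q)

# Score for a pair that has no rating in the toy data (user 0, item 2).
unseen_prediction = float(P[0] @ Q[2])
print(rmse_before, rmse_after, unseen_prediction)
```

The training error should drop well below its starting value, and `unseen_prediction` shows how a missing entry of M is filled in from the learned features.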

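The data used in this article comes from MovieLens and was split into train and test sets by hand; because the dataset is fairly large, only part of it is used.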
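One possible way to make such a split with pandas is sketched below. The miniature `ratings` table and the sampling fractions are hypothetical; only the column names follow the MovieLens ratings file. The test rows are drawn from what remains after the training rows are removed, so the two sets cannot overlap.

```python
import pandas as pd

# Hypothetical miniature ratings table with the MovieLens column names.
ratings = pd.DataFrame({
    "userId":  [1, 1, 1, 2, 2, 3, 3, 3, 4, 4],
    "movieId": [10, 20, 30, 10, 40, 20, 30, 50, 10, 50],
    "rating":  [4.0, 3.5, 5.0, 2.0, 4.5, 3.0, 4.0, 1.5, 5.0, 2.5],
})

# Draw the training rows, then draw the test rows from what is left.
train = ratings.sample(frac=0.5, random_state=0)
remaining = ratings.drop(train.index)
test = remaining.sample(frac=0.4, random_state=0)

overlap = train.index.intersection(test.index)
print(len(train), len(test), len(overlap))  # prints: 5 2 0
```

Dropping by index works here because every row has a unique index label; a key-based anti-join is needed when the two frames do not share an index.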

The code is as follows:

# -*- coding: utf-8 -*-
"""
Created on Mon Oct 9 19:33:00 2017
@author: wjw
"""
import pandas as pd
import numpy as np
import os
 
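def difference(left,right,on): # rows of left whose key (the `on` columns) does not appear in right
 df = pd.merge(left,right,how='left',on=on) # `on` names the columns to join on
 left_columns = left.columns
 col_y = df.columns[-1] # the last column, which comes from right
 df = df[df[col_y].isnull()] # boolean mask: keep the rows that found no match in right
 df = df.iloc[:,0:left_columns.size] # drop the extra same-named columns contributed by right
 df.columns = left_columns # restore the original column names
 return df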
 
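def readfile(filepath): # read the file and return the training and test sets
 
 pwd = os.getcwd() # remember the current working directory
 os.chdir(os.path.dirname(filepath))
 # os.path.dirname() gives the directory that contains filepath; chdir() switches into it
 initialData = pd.read_csv(os.path.basename(filepath))
 # os.path.basename() gives the file name, i.e. the last component of the path
 os.chdir(pwd) # switch back to the previous working directory
 predData = initialData.iloc[:,0:3] # keep the first three columns, dropping the last one
 newIndexData = predData.drop_duplicates()
 trainData = newIndexData.sample(axis=0,frac = 0.1) # sample 10% of the rows as the training set
 testData = difference(newIndexData,trainData,['userId','movieId']).sample(axis=0,frac=0.1) # sample 10% of the remaining rows as the test set
 return trainData,testData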
 
def getmodel(train):
 slowRate = 0.99 # decay factor applied to the learning rate after each pass
 preRmse = 10000000.0
 max_iter = 100
 features = 3
 lamda = 0.2 # regularization weight
 gama = 0.01 # learning rate for SGD; kept small to avoid overshooting
 user = pd.DataFrame(train.userId.drop_duplicates(),columns=['userId']).reset_index(drop=True) # drop=True discards the old index and renumbers from 0
 
 movie = pd.DataFrame(train.movieId.drop_duplicates(),columns=['movieId']).reset_index(drop=True)
 userNum = user.count().loc['userId'] # 671
 movieNum = movie.count().loc['movieId'] 
 userFeatures = np.random.rand(userNum,features) # random initial feature vectors for users and movies
 movieFeatures = np.random.rand(movieNum,features)
 # assume every user and every movie has 3 features
 userFeaturesFrame =user.join(pd.DataFrame(userFeatures,columns = ['f1','f2','f3']))
 movieFeaturesFrame =movie.join(pd.DataFrame(movieFeatures,columns= ['f1','f2','f3']))
 userFeaturesFrame = userFeaturesFrame.set_index('userId')
 movieFeaturesFrame = movieFeaturesFrame.set_index('movieId') # index the features by id
 
 for i in range(max_iter): 
  rmse = 0
  n = 0
  for _,user_row in user.iterrows():
   uId = user_row.userId
 
   u_m = train[train['userId'] == uId] # the rows of train rated by this user
   for _,rating_row in u_m.iterrows(): 
    u_mId = int(rating_row.movieId)
    realRating = rating_row.rating
    userFeature = userFeaturesFrame.loc[uId].copy() # re-read on every rating so earlier updates are used
    movieFeature = movieFeaturesFrame.loc[u_mId].copy()
 
    eui = realRating-np.dot(userFeature,movieFeature) # prediction error
    rmse += pow(eui,2)
    n += 1
    userFeaturesFrame.loc[uId] += gama * (eui*movieFeature-lamda*userFeature) 
    movieFeaturesFrame.loc[u_mId] += gama*(eui*userFeature-lamda*movieFeature)
  nowRmse = np.sqrt(rmse*1.0/n)
  print('step:%d,rmse:%f'%((i+1),nowRmse))
  if nowRmse<preRmse: # still improving: remember the best RMSE so far
   preRmse = nowRmse
  elif nowRmse<0.5: # not improving, but already accurate enough
   break
  elif nowRmse-preRmse<=0.001: # worse by at most 0.001: treat as converged
   break
  gama*=slowRate # decay the learning rate
 return userFeaturesFrame,movieFeaturesFrame
 
def evaluate(userFeaturesFrame,movieFeaturesFrame,test):
 test['predictRating']=np.nan # new column; stays NaN where no prediction can be made
 
 for index,row in test.iterrows(): 
  
  print(index)
  userId = int(row.userId) # iterrows() yields floats once the frame has a float column
  movieId = int(row.movieId)
  if userId not in userFeaturesFrame.index or movieId not in movieFeaturesFrame.index:
   continue # user or movie unseen in training: leave the prediction empty
  userFeature = userFeaturesFrame.loc[userId]
  movieFeature = movieFeaturesFrame.loc[movieId]
  test.loc[index,'predictRating'] = np.dot(userFeature,movieFeature) # assign through .loc so the frame itself is updated
  
 return test 
 
if __name__ == "__main__":
 filepath = r"E:\学习\研究生\推荐系统\ml-latest-small\ratings.csv"
 train,test = readfile(filepath)
 userFeaturesFrame,movieFeaturesFrame = getmodel(train)
 result = evaluate(userFeaturesFrame,movieFeaturesFrame,test)

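The result obtained on test is:

[Figure: the test DataFrame with the added predictRating column]

Rows shown as NAN (no prediction) are those whose user or movie does not appear in the training set.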
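To turn such a result into a single accuracy figure, one option is to compute the RMSE over the rows that did receive a prediction. The sketch below assumes a frame with `rating` and `predictRating` columns, as returned by `evaluate()`; the helper name `rmse_on_predictions` and the three example rows are hypothetical. `pd.to_numeric(..., errors="coerce")` turns any non-numeric placeholder into NaN so that those rows can be skipped.

```python
import numpy as np
import pandas as pd

def rmse_on_predictions(result):
    """RMSE over the rows of `result` that received a prediction."""
    predicted = pd.to_numeric(result["predictRating"], errors="coerce")
    scored = predicted.notna()
    if not scored.any():
        return float("nan")  # nothing to score
    errors = result.loc[scored, "rating"] - predicted[scored]
    return float(np.sqrt((errors ** 2).mean()))

# Hypothetical rows in the shape returned by evaluate().
example = pd.DataFrame({
    "userId": [1, 2, 3],
    "movieId": [10, 20, 30],
    "rating": [4.0, 3.0, 5.0],
    "predictRating": [3.0, "NAN", 4.0],
})
example_rmse = rmse_on_predictions(example)
print(example_rmse)  # prints: 1.0
```

Reporting the share of rows without a prediction alongside the RMSE is useful as well, because a small training sample leaves many users and movies unseen.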


- Author -

ge_nius
