
A Python Implementation of the kNN Algorithm: Example

Posted in Python on June 14, 2018

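This article uses a worked example to show how the kNN (k-nearest neighbors) algorithm can be implemented in Python. It is shared here for reference; the details follow.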

The code is based on the book Machine Learning in Action (机器学习实战); interested readers may want to look it up.

The code:

# -*- coding: utf-8 -*-
'''
@author: zhoumeixu
created: 2015-08-27
'''
# NumPy notes:
#   np.zeros((4, 2)) is equivalent to np.zeros(8).reshape(4, 2)
#   x = np.array([[1.0, 1.1], [1.0, 1.0], [0, 0], [0, 0.1]]); np.zeros_like(x)
# Extremes and sorting: np.max() and np.min() both accept axis and out (output)
# parameters, and np.argmax() / np.argmin() return the index of the maximum or
# minimum value. np.sort() sorts, while np.argsort() returns the original
# indices of the elements in sorted order.
# A simple implementation of the basic kNN idea
import operator  # operator.itemgetter is used to sort the vote counts
import os

import matplotlib.pyplot as plt
import numpy as np

# Folder holding the Ch02 data files from the book's code download;
# change it to wherever those files live on your machine.
path = u'D:\\Users\\zhoumeixu204\\Desktop\\python语言机器学习\\机器学习实战代码 python\\机器学习实战代码\\machinelearninginaction\\Ch02\\'


def createDataSet():
    group = np.array([[1.0, 1.1], [1.0, 1.0], [0, 0], [0, 0.1]])
    labels = ['A', 'A', 'B', 'B']
    return group, labels


def classify0(inX, dataSet, labels, k):
    dataSetSize = dataSet.shape[0]
    # Euclidean distance from inX to every training row.
    # np.tile copies inX once per row; NumPy broadcasting would also do this.
    diffMat = np.tile(inX, (dataSetSize, 1)) - dataSet
    sqDiffMat = diffMat ** 2
    sqDistances = sqDiffMat.sum(axis=1)
    distances = sqDistances ** 0.5
    # argsort gives the indices of the training rows, nearest first
    sortedDistIndicies = distances.argsort()
    classCount = {}
    for i in range(k):
        # label of the i-th nearest row; count one vote for it
        voteIlabel = labels[sortedDistIndicies[i]]
        classCount[voteIlabel] = classCount.get(voteIlabel, 0) + 1
    # sort labels by vote count (most votes first) and return the winner
    sortedClassCount = sorted(classCount.items(),
                              key=operator.itemgetter(1), reverse=True)
    return sortedClassCount[0][0]


# Using kNN to improve the matches on a dating site
def file2matrix(filename):
    # Each line: three tab-separated features followed by an integer class label
    with open(filename) as fr:
        arrayOLines = fr.readlines()
    numberOfLines = len(arrayOLines)
    returnMat = np.zeros((numberOfLines, 3))
    classLabelVector = []
    for index, line in enumerate(arrayOLines):
        listFromLine = line.strip().split('\t')
        returnMat[index, :] = [float(x) for x in listFromLine[0:3]]
        classLabelVector.append(int(listFromLine[-1]))
    # training feature matrix and list of target labels
    return returnMat, classLabelVector


def autoNorm(dataSet):
    # Min-max scaling per column: (x - min) / (max - min), giving values in [0, 1]
    minVals = dataSet.min(0)
    maxVals = dataSet.max(0)
    ranges = maxVals - minVals
    m = dataSet.shape[0]
    normDataSet = dataSet - np.tile(minVals, (m, 1))
    normDataSet = normDataSet / np.tile(ranges, (m, 1))
    return normDataSet, ranges, minVals


def datingClassTest():
    hoRatio = 0.1  # hold out the first 10% of rows as test cases
    datingDataMat, datingLabels = file2matrix(os.path.join(path, 'datingTestSet2.txt'))
    normMat, ranges, minVals = autoNorm(datingDataMat)
    m = normMat.shape[0]
    numTestVecs = int(m * hoRatio)
    errorCount = 0.0
    for i in range(numTestVecs):
        classifierResult = classify0(normMat[i, :], normMat[numTestVecs:m, :],
                                     datingLabels[numTestVecs:m], 3)
        print("the classifier came back with: %d, the real answer is: %d"
              % (classifierResult, datingLabels[i]))
        if classifierResult != datingLabels[i]:
            errorCount += 1.0
    # error rate of kNN on the held-out rows
    print("the total error rate is: %f" % (errorCount / float(numTestVecs)))


# Predict with the trained model (interactive)
def classifyPerson():
    resultList = ['not at all', 'in small doses', 'in large doses']
    percentTats = float(input("percentage of time spent playing video games? "))
    ffMiles = float(input("frequent flier miles earned per year? "))
    iceCream = float(input("liters of ice cream consumed per year? "))
    datingDataMat, datingLabels = file2matrix(os.path.join(path, 'datingTestSet2.txt'))
    normMat, ranges, minVals = autoNorm(datingDataMat)
    # feature order must match the data file: miles, game time, ice cream
    inArr = np.array([ffMiles, percentTats, iceCream])
    classifierResult = classify0((inArr - minVals) / ranges, normMat, datingLabels, 3)
    print("you will probably like this person:", resultList[classifierResult - 1])


# Handwritten digit recognition with kNN
def img2vector(filename):
    # Flatten a 32x32 text image of '0'/'1' characters into a 1x1024 vector
    returnVect = np.zeros((1, 1024))
    with open(filename) as fr:
        for i in range(32):
            lineStr = fr.readline()
            for j in range(32):
                returnVect[0, 32 * i + j] = int(lineStr[j])
    return returnVect


def handwritingClassTest():
    hwLabels = []
    trainingDir = os.path.join(path, 'trainingDigits')
    trainingFileList = os.listdir(trainingDir)
    m = len(trainingFileList)
    trainingMat = np.zeros((m, 1024))
    for i in range(m):
        fileNameStr = trainingFileList[i]
        fileStr = fileNameStr.split('.')[0]       # '0_13.txt' -> '0_13'
        classNumStr = int(fileStr.split('_')[0])  # '0_13' -> digit 0
        hwLabels.append(classNumStr)
        trainingMat[i, :] = img2vector(os.path.join(trainingDir, fileNameStr))
    testDir = os.path.join(path, 'testDigits')
    testFileList = os.listdir(testDir)
    errorCount = 0.0
    mTest = len(testFileList)
    for j in range(mTest):
        fileNameStr = testFileList[j]
        fileStr = fileNameStr.split('.')[0]
        classNumStr = int(fileStr.split('_')[0])
        vectorUnderTest = img2vector(os.path.join(testDir, fileNameStr))
        classifierResult = classify0(vectorUnderTest, trainingMat, hwLabels, 3)
        print("the classifier came back with: %d, the real answer is: %d"
              % (classifierResult, classNumStr))
        if classifierResult != classNumStr:
            errorCount += 1.0
    print("\nthe total number of errors is: %d" % errorCount)
    print("\nthe total error rate is: %f" % (errorCount / float(mTest)))


if __name__ == '__main__':
    # Toy data: [0, 0] is closest to the two 'B' points.
    # k=3 avoids the 2-2 tie that k=4 produces on four points.
    group, labels = createDataSet()
    print(classify0([0, 0], group, labels, 3))

    # Dating data: % time playing video games vs. liters of ice cream
    datingDataMat, datingLabels = file2matrix(os.path.join(path, 'datingTestSet2.txt'))
    fig = plt.figure()
    ax = fig.add_subplot(111)
    ax.scatter(datingDataMat[:, 1], datingDataMat[:, 2])
    plt.show()
    # Second plot on a fresh figure: marker size and color follow the class label
    fig = plt.figure()
    ax = fig.add_subplot(111)
    ax.scatter(datingDataMat[:, 1], datingDataMat[:, 2],
               s=15.0 * np.array(datingLabels), c=15.0 * np.array(datingLabels))
    plt.show()

    datingClassTest()
    # classifyPerson() asks for keyboard input; uncomment to try it:
    # classifyPerson()

    testVector = img2vector(os.path.join(path, 'testDigits', '0_13.txt'))
    print(testVector[0, 0:32])  # first row of the 32x32 image
    handwritingClassTest()

Run results:

[Figure: screenshot of the script's output]

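Note: if Python reports that the statsmodels module is missing, download the statsmodels wheel, change to the directory that contains it, and install it with:

pip install statsmodels-0.9.0-cp27-none-win32.whl

Likewise, if you get ImportError: No module named pandas, download the pandas wheel and install it with:

pip install pandas-0.23.1-cp27-none-win32.whl

Both wheel files are builds for CPython 2.7 on 32-bit Windows (the cp27 and win32 tags).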
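The NumPy notes at the top of the listing mention np.max/np.min with an axis argument, np.argmax, np.sort and np.argsort. A small sketch of those calls on made-up arrays shows what axis does and why argsort is the call classify0 relies on to find the nearest rows:

```python
import numpy as np

a = np.array([[3, 7],
              [5, 1]])
print(np.max(a))          # 7: maximum over all elements
print(np.max(a, axis=0))  # [5 7]: maximum of each column
print(np.max(a, axis=1))  # [7 5]: maximum of each row
print(np.argmax(a))       # 1: position of 7 in the flattened array [3, 7, 5, 1]

# Made-up distances from a query point to three training rows
distances = np.array([1.5, 0.2, 0.9])
print(np.sort(distances))     # sorted values
print(np.argsort(distances))  # [1 2 0]: original indices, nearest first
```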
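classify0 builds its difference matrix with np.tile, but NumPy broadcasting can subtract the query point from every row directly. A minimal sketch of the same majority-vote kNN using broadcasting and collections.Counter; the name knn_classify is hypothetical and this is not the book's code:

```python
from collections import Counter

import numpy as np


def knn_classify(x, data, labels, k):
    """Return the majority label among the k rows of `data` nearest to `x`."""
    x = np.asarray(x, dtype=float)
    # Broadcasting subtracts x from every row; no np.tile needed
    dists = np.sqrt(((data - x) ** 2).sum(axis=1))
    nearest = np.argsort(dists)[:k]  # indices of the k closest rows
    votes = Counter(labels[i] for i in nearest)
    # most_common lists equal counts in first-encountered order
    return votes.most_common(1)[0][0]


group = np.array([[1.0, 1.1], [1.0, 1.0], [0, 0], [0, 0.1]])
labels = ['A', 'A', 'B', 'B']
print(knn_classify([0, 0], group, labels, 3))  # B
print(knn_classify([1, 1], group, labels, 3))  # A
```

With two classes, an odd k such as 3 rules out ties; with k=4 on these four points the vote is 2 to 2 and the answer depends on the order in which neighbours are met.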
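file2matrix parses each line by hand. Assuming every line holds three tab-separated numeric features followed by an integer label (the layout file2matrix expects), np.loadtxt can do the same parsing. The sketch reads two made-up rows from memory; load_tab_separated is a hypothetical name:

```python
import io

import numpy as np


def load_tab_separated(f):
    """Return (features, labels) from tab-separated rows.

    Assumes the file2matrix layout: three numeric feature columns followed
    by an integer class label in the last column.
    """
    table = np.loadtxt(f, delimiter='\t', ndmin=2)
    return table[:, :3], table[:, -1].astype(int)


# Two made-up rows, read from memory instead of datingTestSet2.txt
sample = io.StringIO("1000\t2.5\t0.5\t1\n3000\t7.0\t1.2\t3\n")
features, labels = load_tab_separated(sample)
print(features.shape, labels.tolist())  # (2, 3) [1, 3]
```

Unlike file2matrix, this returns the labels as a NumPy array rather than a list; classify0 only indexes labels by position, so either form works there.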
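autoNorm rescales each feature with (x - min) / (max - min), so that the large frequent-flier-miles column does not dominate the Euclidean distance. A broadcasting sketch of the same min-max scaling on made-up numbers; it also shows that a new sample must be scaled with the training minimum and range, as classifyPerson does. The function name min_max_normalize is hypothetical:

```python
import numpy as np


def min_max_normalize(data):
    """Scale every column of `data` to [0, 1] with (x - min) / (max - min).

    Broadcasting applies the per-column minimum and range to every row.
    Assumes no column is constant, since a zero range would divide by zero.
    """
    data = np.asarray(data, dtype=float)
    min_vals = data.min(axis=0)
    ranges = data.max(axis=0) - min_vals
    return (data - min_vals) / ranges, ranges, min_vals


# Made-up rows: miles per year, % time gaming, liters of ice cream
data = np.array([[40000.0, 8.0, 0.5],
                 [10000.0, 2.0, 1.5],
                 [70000.0, 14.0, 1.0]])
norm, ranges, min_vals = min_max_normalize(data)
print(norm[0])  # values 0.5, 0.5, 0.0

# A new sample is scaled with the training min and range, not its own
new_sample = np.array([25000.0, 5.0, 1.0])
print((new_sample - min_vals) / ranges)  # values 0.25, 0.25, 0.5
```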
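datingClassTest holds out the first 10% of rows (hoRatio = 0.1) as test cases and classifies them against the remaining rows. A sketch of that split as a reusable function that accepts any classifier with classify0's argument order; holdout_error_rate and the stand-in knn function are hypothetical names, and the two-cluster data is made up:

```python
from collections import Counter

import numpy as np


def knn(x, train, train_labels, k):
    """Hypothetical stand-in for classify0: majority vote of the k nearest rows."""
    dists = ((train - x) ** 2).sum(axis=1)
    nearest = np.argsort(dists)[:k]
    return Counter(train_labels[i] for i in nearest).most_common(1)[0][0]


def holdout_error_rate(classify, data, labels, ho_ratio=0.1, k=3):
    """Fraction of the first ho_ratio of rows that `classify` gets wrong.

    Same split as datingClassTest: rows [0, n_test) are test cases,
    rows [n_test, m) form the training set.
    """
    m = data.shape[0]
    n_test = int(m * ho_ratio)
    errors = 0
    for i in range(n_test):
        predicted = classify(data[i], data[n_test:], labels[n_test:], k)
        if predicted != labels[i]:
            errors += 1
    return errors / float(n_test)


# Made-up data: label 1 near (0, 0), label 2 near (10, 10)
data = np.array([[0.1, 0.0], [9.9, 10.0], [0.0, 0.0], [0.2, 0.1], [0.1, 0.3],
                 [10.0, 10.0], [9.8, 10.1], [10.2, 9.9], [0.3, 0.2], [9.7, 9.8]])
labels = [1, 2, 1, 1, 1, 2, 2, 2, 1, 2]
print(holdout_error_rate(knn, data, labels, ho_ratio=0.2))  # 0.0
```

Taking the first rows as the test set only gives a fair estimate if the file is not ordered by class; shuffling the rows first is a common safeguard.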
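img2vector turns a 32x32 grid of '0' and '1' characters into a single 1x1024 row, placing pixel (i, j) at column 32*i + j. A sketch of the same flattening that accepts any iterable of lines, so it can be tried on a small in-memory image; the function name text_image_to_vector and its size parameter are assumptions for illustration:

```python
import io

import numpy as np


def text_image_to_vector(lines, size=32):
    """Flatten a size x size grid of '0'/'1' characters into a 1 x size*size row.

    Same layout as img2vector: pixel (i, j) goes to column size*i + j.
    `lines` can be an open file, an io.StringIO, or a list of strings.
    """
    vec = np.zeros((1, size * size))
    for i, line in enumerate(lines):
        if i == size:  # ignore anything after the last image row
            break
        for j in range(size):
            vec[0, size * i + j] = int(line[j])
    return vec


# A made-up 4x4 "digit" instead of a 32x32 file
image = io.StringIO("0110\n1001\n1001\n0110\n")
vec = text_image_to_vector(image, size=4)
print(vec.shape, int(vec.sum()))  # (1, 16) 8
```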
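handwritingClassTest takes each digit's label from its file name (0_13.txt is a sample of the digit 0) and stacks every image into one training matrix. A sketch of that loading step as a function, exercised on two tiny 2x2 images written to a temporary directory; label_from_filename and load_digit_dir are hypothetical names:

```python
import os
import tempfile

import numpy as np


def label_from_filename(file_name):
    """Digit label encoded in names like '0_13.txt' (digit 0, sample 13)."""
    stem = os.path.splitext(file_name)[0]  # '0_13.txt' -> '0_13'
    return int(stem.split('_')[0])         # '0_13' -> 0


def load_digit_dir(directory, size=32):
    """Read every size x size text image in `directory` into (matrix, labels).

    Files are processed in sorted order so the row order is reproducible
    (os.listdir returns names in arbitrary order).
    """
    names = sorted(os.listdir(directory))
    mat = np.zeros((len(names), size * size))
    labels = []
    for row, name in enumerate(names):
        labels.append(label_from_filename(name))
        with open(os.path.join(directory, name)) as f:
            for i in range(size):
                line = f.readline()
                for j in range(size):
                    mat[row, size * i + j] = int(line[j])
    return mat, labels


# Two made-up 2x2 images in a throwaway directory
with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, '1_0.txt'), 'w') as f:
        f.write('01\n01\n')
    with open(os.path.join(tmp, '0_3.txt'), 'w') as f:
        f.write('11\n11\n')
    mat, labels = load_digit_dir(tmp, size=2)
print(labels)        # [0, 1]
print(mat.tolist())  # [[1.0, 1.0, 1.0, 1.0], [0.0, 1.0, 0.0, 1.0]]
```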

I hope this article helps with your Python programming.


Author: 旭旭_哥
