A Simple Example of Classification Algorithms in Python with the sklearn Library


Posted in Python on July 09, 2018

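This article walks through a simple application of classification algorithms in Python using the sklearn library. It is shared here for reference; the details follow: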

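scikit-learn is already included in Anaconda. It can also be installed from the source package available on the official site. The code in this article wraps the machine learning algorithms listed below; by modifying the data-loading function, we can test all of them in one run: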

# coding=gbk
'''
Created on June 4, 2016
@author: bryan
'''
import time
from sklearn import metrics
import pickle
import pandas as pd
# Multinomial Naive Bayes Classifier
def naive_bayes_classifier(train_x, train_y):
  from sklearn.naive_bayes import MultinomialNB
  model = MultinomialNB(alpha=0.01)
  model.fit(train_x, train_y)
  return model
# KNN Classifier
def knn_classifier(train_x, train_y):
  from sklearn.neighbors import KNeighborsClassifier
  model = KNeighborsClassifier()
  model.fit(train_x, train_y)
  return model
# Logistic Regression Classifier
def logistic_regression_classifier(train_x, train_y):
  from sklearn.linear_model import LogisticRegression
  model = LogisticRegression(penalty='l2')
  model.fit(train_x, train_y)
  return model
# Random Forest Classifier
def random_forest_classifier(train_x, train_y):
  from sklearn.ensemble import RandomForestClassifier
  model = RandomForestClassifier(n_estimators=8)
  model.fit(train_x, train_y)
  return model
# Decision Tree Classifier
def decision_tree_classifier(train_x, train_y):
  from sklearn import tree
  model = tree.DecisionTreeClassifier()
  model.fit(train_x, train_y)
  return model
# GBDT(Gradient Boosting Decision Tree) Classifier
def gradient_boosting_classifier(train_x, train_y):
  from sklearn.ensemble import GradientBoostingClassifier
  model = GradientBoostingClassifier(n_estimators=200)
  model.fit(train_x, train_y)
  return model
# SVM Classifier
def svm_classifier(train_x, train_y):
  from sklearn.svm import SVC
  model = SVC(kernel='rbf', probability=True)
  model.fit(train_x, train_y)
  return model
# SVM Classifier using cross validation
def svm_cross_validation(train_x, train_y):
  # sklearn.grid_search was removed in scikit-learn 0.20; use model_selection instead
  from sklearn.model_selection import GridSearchCV
  from sklearn.svm import SVC
  model = SVC(kernel='rbf', probability=True)
  param_grid = {'C': [1e-3, 1e-2, 1e-1, 1, 10, 100, 1000], 'gamma': [0.001, 0.0001]}
  grid_search = GridSearchCV(model, param_grid, n_jobs = 1, verbose=1)
  grid_search.fit(train_x, train_y)
  best_parameters = grid_search.best_estimator_.get_params()
  for para, val in list(best_parameters.items()):
    print(para, val)
  model = SVC(kernel='rbf', C=best_parameters['C'], gamma=best_parameters['gamma'], probability=True)
  model.fit(train_x, train_y)
  return model
def read_data(data_file):
  data = pd.read_csv(data_file)
  train = data[:int(len(data)*0.9)]
  test = data[int(len(data)*0.9):]
  train_y = train.label
  train_x = train.drop('label', axis=1)
  test_y = test.label
  test_x = test.drop('label', axis=1)
  return train_x, train_y, test_x, test_y
if __name__ == '__main__':
  data_file = "H:\\Research\\data\\trainCG.csv"
  thresh = 0.5
  model_save_file = None
  model_save = {}
  test_classifiers = ['NB', 'KNN', 'LR', 'RF', 'DT', 'SVM','SVMCV', 'GBDT']
  classifiers = {'NB':naive_bayes_classifier,
         'KNN':knn_classifier,
          'LR':logistic_regression_classifier,
          'RF':random_forest_classifier,
          'DT':decision_tree_classifier,
         'SVM':svm_classifier,
        'SVMCV':svm_cross_validation,
         'GBDT':gradient_boosting_classifier
  }
  print('reading training and testing data...')
  train_x, train_y, test_x, test_y = read_data(data_file)
  for classifier in test_classifiers:
    print('******************* %s ********************' % classifier)
    start_time = time.time()
    model = classifiers[classifier](train_x, train_y)
    print('training took %fs!' % (time.time() - start_time))
    predict = model.predict(test_x)
    if model_save_file is not None:
      model_save[classifier] = model
    precision = metrics.precision_score(test_y, predict)
    recall = metrics.recall_score(test_y, predict)
    print('precision: %.2f%%, recall: %.2f%%' % (100 * precision, 100 * recall))
    accuracy = metrics.accuracy_score(test_y, predict)
    print('accuracy: %.2f%%' % (100 * accuracy))
  if model_save_file is not None:
    # Use a context manager so the file handle is closed after writing
    with open(model_save_file, 'wb') as f:
      pickle.dump(model_save, f)

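The article notes that only the data-loading function needs to change to test other data. As a minimal sketch of that idea, the function below is a hypothetical stand-in for read_data() that needs no CSV file. It assumes scikit-learn's bundled breast cancer dataset as a substitute for the author's trainCG.csv; the name read_data_builtin and its parameters are illustrative and not part of the original script.

```python
import pandas as pd
from sklearn.datasets import load_breast_cancer


def read_data_builtin(train_fraction=0.9, seed=0):
    """Hypothetical replacement for read_data() that uses a bundled dataset.

    Assumption: the breast cancer dataset stands in for the author's CSV.
    It keeps the same 'label' column convention and 90/10 positional split.
    """
    bunch = load_breast_cancer()
    data = pd.DataFrame(bunch.data, columns=bunch.feature_names)
    data['label'] = bunch.target
    # Shuffle first so the tail used for testing is not ordered by class.
    data = data.sample(frac=1.0, random_state=seed).reset_index(drop=True)
    split = int(len(data) * train_fraction)
    train, test = data[:split], data[split:]
    train_y = train['label']
    train_x = train.drop('label', axis=1)
    test_y = test['label']
    test_x = test.drop('label', axis=1)
    return train_x, train_y, test_x, test_y


train_x, train_y, test_x, test_y = read_data_builtin()
print(train_x.shape, test_x.shape)
```

Calling this function in place of read_data(data_file) in the main block should let the rest of the script run unchanged, since the features are non-negative (as MultinomialNB requires) and the labels are binary (as the default precision and recall settings expect).
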
Running the script on the author's data file produces the following output:

reading training and testing data...
******************* NB ********************
training took 0.004986s!
precision: 78.08%, recall: 71.25%
accuracy: 74.17%
******************* KNN ********************
training took 0.017545s!
precision: 97.56%, recall: 100.00%
accuracy: 98.68%
******************* LR ********************
training took 0.061161s!
precision: 89.16%, recall: 92.50%
accuracy: 90.07%
******************* RF ********************
training took 0.040111s!
precision: 96.39%, recall: 100.00%
accuracy: 98.01%
******************* DT ********************
training took 0.004513s!
precision: 96.20%, recall: 95.00%
accuracy: 95.36%
******************* SVM ********************
training took 0.242145s!
precision: 97.53%, recall: 98.75%
accuracy: 98.01%
******************* SVMCV ********************
Fitting 3 folds for each of 14 candidates, totalling 42 fits
[Parallel(n_jobs=1)]: Done  42 out of  42 | elapsed:    6.8s finished
probability True
verbose False
coef0 0.0
degree 3
tol 0.001
shrinking True
cache_size 200
gamma 0.001
max_iter -1
C 1000
decision_function_shape None
random_state None
class_weight None
kernel rbf
training took 7.434668s!
precision: 98.75%, recall: 98.75%
accuracy: 98.68%
******************* GBDT ********************
training took 0.521916s!
precision: 97.56%, recall: 100.00%
accuracy: 98.68%
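
The precision, recall, and accuracy figures in the log come from sklearn.metrics. To show how the three numbers relate, here is a small sketch on made-up labels (the y_true and y_pred lists are illustrative, not the author's data), using the same calls and print format as the script:

```python
from sklearn import metrics

# Made-up binary labels: 4 positives and 6 negatives.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 1, 0, 0, 0, 0]
# Counts by hand: TP=3, FN=1, FP=2, TN=4.

precision = metrics.precision_score(y_true, y_pred)  # TP / (TP + FP) = 3/5
recall = metrics.recall_score(y_true, y_pred)        # TP / (TP + FN) = 3/4
accuracy = metrics.accuracy_score(y_true, y_pred)    # (TP + TN) / total = 7/10

print('precision: %.2f%%, recall: %.2f%%' % (100 * precision, 100 * recall))
print('accuracy: %.2f%%' % (100 * accuracy))
```

With their default settings, precision_score and recall_score treat label 1 as the positive class, which is why the script expects a binary label column.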

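The SVMCV log line "Fitting 3 folds for each of 14 candidates, totalling 42 fits" follows directly from the parameter grid in svm_cross_validation. A quick sketch of the arithmetic, using ParameterGrid from sklearn.model_selection to enumerate the same grid (the fold count of 3 is taken from the log; newer scikit-learn releases default to 5 folds, so the total may differ there):

```python
from sklearn.model_selection import ParameterGrid

# Same grid as in svm_cross_validation: 7 values of C times 2 values of gamma.
param_grid = {'C': [1e-3, 1e-2, 1e-1, 1, 10, 100, 1000], 'gamma': [0.001, 0.0001]}
candidates = list(ParameterGrid(param_grid))

n_folds = 3  # fold count reported in the log, not read from scikit-learn
total_fits = len(candidates) * n_folds

print('%d candidates x %d folds = %d fits' % (len(candidates), n_folds, total_fits))
```

Each candidate is a dict such as {'C': 1000, 'gamma': 0.001}, which is the combination the log reports as the best one.
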
I hope this article helps with your Python programming.
