
Using the Python machine learning library xgboost

Posted in Python on January 20, 2020

1. Reading data

Reading libsvm data with the native xgboost library

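import xgboost as xgb
# libsvm_file: path to a libsvm-format file
# (newer xgboost releases may require a "?format=libsvm" suffix on the path)
data = xgb.DMatrix(libsvm_file)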

Reading libsvm data with sklearn

from sklearn.datasets import load_svmlight_file
X_train, y_train = load_svmlight_file(libsvm_file)  # X_train is a scipy sparse matrix, y_train a numpy array
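Both loaders above expect the libsvm (svmlight) text format, in which each line is a label followed by sparse `index:value` pairs. As a rough illustration of what that format contains, here is a minimal pure-Python parser. The function name `parse_libsvm_line` and the sample line are made up for this sketch, and it does not handle the optional `qid:` field; in practice the library loaders shown above are the better choice.

```python
def parse_libsvm_line(line):
    """Parse one libsvm-format line into (label, {feature_index: value})."""
    # Drop an optional trailing "# comment", then split on whitespace.
    parts = line.split("#", 1)[0].split()
    label = float(parts[0])
    features = {}
    for item in parts[1:]:
        index, value = item.split(":", 1)
        features[int(index)] = float(value)
    return label, features


# Hypothetical sample line: label 1, three non-zero features.
sample = "1 3:0.5 10:1 27:2.25"
label, features = parse_libsvm_line(sample)
print(label, features)
```

Features that do not appear on a line are implicitly zero, which is why the loaders return sparse matrices.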

Reading the data with pandas and then converting it to the standard form
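The article gives no code for the pandas route, so the following is only a sketch. It assumes that "standard form" means a feature matrix plus a label vector, and the column names (`f1`, `f2`, `label`) and the inline CSV are invented for the example.

```python
import io

import pandas as pd

# Hypothetical CSV content; in practice this would be a file path.
csv_text = """f1,f2,label
0.1,1.0,0
0.4,0.0,1
0.9,1.0,1
"""

df = pd.read_csv(io.StringIO(csv_text))

# Split the frame into a feature matrix X and a label vector y.
X = df.drop(columns=["label"]).to_numpy()
y = df["label"].to_numpy()
print(X.shape, y.shape)
```

Arrays like these can then be passed to `xgb.DMatrix(X, label=y)` or to `XGBClassifier.fit(X, y)`, in the same way `f_train` and `l_train` are used in the rest of the article.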

2. Model training

1. Untuned baseline model

Training with the native xgboost library

import xgboost as xgb
from sklearn.metrics import accuracy_score
from matplotlib import pyplot

# f_train/l_train and f_test/l_test are the training and test features/labels
dtrain = xgb.DMatrix(f_train, label=l_train)
dtest = xgb.DMatrix(f_test, label=l_test)
# 'verbosity' replaces the deprecated 'silent' flag
param = {'max_depth': 2, 'eta': 1, 'verbosity': 1, 'objective': 'binary:logistic'}
num_round = 2
bst = xgb.train(param, dtrain, num_round)
train_preds = bst.predict(dtrain)  # predicted probabilities
train_predictions = [round(value) for value in train_preds]  # round each probability to 0 or 1 (a 0.5 threshold)
train_accuracy = accuracy_score(l_train, train_predictions)  # compare with the true labels using sklearn
print("Train Accuracy: %.2f%%" % (train_accuracy * 100.0))

from xgboost import plot_importance  # feature importance plot
plot_importance(bst)  # draw the importance scores
pyplot.show()
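The native API returns probabilities for `binary:logistic`, and the code above turns them into class labels with `round`. One detail worth knowing is that Python's built-in `round` sends an exact 0.5 to the nearest even integer, which is 0. The sketch below, with made-up probabilities, compares `round` against an explicit threshold so the tie behaviour is visible; which rule is appropriate depends on the application.

```python
# Made-up predicted probabilities, including an exact tie at 0.5.
probs = [0.12, 0.5, 0.51, 0.97]

# Built-in round: ties go to the nearest even integer, so 0.5 -> 0.
rounded = [round(p) for p in probs]

# Explicit threshold: here a probability of exactly 0.5 counts as class 1.
thresholded = [int(p >= 0.5) for p in probs]

print(rounded)
print(thresholded)
```

Writing the threshold explicitly also makes it easy to pick a value other than 0.5 later.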

Training with XGBClassifier

# No early stopping; the data is not converted to a DMatrix
from xgboost import XGBClassifier
from sklearn.datasets import load_svmlight_file  # reads svmlight/libsvm files directly; otherwise use xgboost.DMatrix(filename) for this format
from sklearn.metrics import accuracy_score
from matplotlib import pyplot


num_round = 100
# With too few weak trees, fewer features show up in the importance plot
bst1 = XGBClassifier(max_depth=2, learning_rate=1, n_estimators=num_round,
                     verbosity=0, objective='binary:logistic')  # verbosity=0 replaces the deprecated silent=True
bst1.fit(f_train, l_train)

train_preds = bst1.predict(f_train)
train_accuracy = accuracy_score(l_train, train_preds)
print("Train Accuracy: %.2f%%" % (train_accuracy * 100.0))

preds = bst1.predict(f_test)
test_accuracy = accuracy_score(l_test, preds)
print("Test Accuracy: %.2f%%" % (test_accuracy * 100.0))

from xgboost import plot_importance  # feature importance plot
plot_importance(bst1)  # draw the importance scores
pyplot.show()

2. Two ways to cross-validate

Cross-validation with cross_val_score

# Cross-validation with sklearn.model_selection
from xgboost import XGBClassifier
from sklearn.model_selection import StratifiedKFold
from sklearn.model_selection import cross_val_score
from sklearn.metrics import accuracy_score
from matplotlib import pyplot

num_round = 100
bst2 = XGBClassifier(max_depth=2, learning_rate=0.1, n_estimators=num_round,
                     verbosity=0, objective='binary:logistic')  # verbosity=0 replaces the deprecated silent=True
bst2.fit(f_train, l_train)  # fitted here only so that the importance plot below has a trained model
# random_state only takes effect with shuffle=True (recent sklearn raises an error otherwise)
kfold = StratifiedKFold(n_splits=10, shuffle=True, random_state=7)
results = cross_val_score(bst2, f_train, l_train, cv=kfold)  # 10-fold cross-validation: 9 parts for training, 1 for testing
print(results)
print("CV Accuracy: %.2f%% (%.2f%%)" % (results.mean()*100, results.std()*100))

from xgboost import plot_importance  # feature importance plot
plot_importance(bst2)  # draw the importance scores
pyplot.show()
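To make the idea behind stratified folds concrete, here is a small pure-Python sketch that deals the samples of each class round-robin into the folds, so every fold keeps roughly the same class ratio. This is only an illustration of the concept: it is not how `StratifiedKFold` is implemented, it does no shuffling, and the function name `stratified_folds` and the toy labels are invented.

```python
from collections import Counter, defaultdict


def stratified_folds(labels, n_splits):
    """Assign sample indices to folds so class proportions stay balanced."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    folds = [[] for _ in range(n_splits)]
    for indices in by_class.values():
        # Deal the samples of one class round-robin across the folds.
        for position, index in enumerate(indices):
            folds[position % n_splits].append(index)
    return folds


# Toy labels: 8 negatives and 4 positives, split into 4 folds.
labels = [0] * 8 + [1] * 4
folds = stratified_folds(labels, n_splits=4)
for test_fold in folds:
    # Each fold serves once as the test set; the other folds form the training set.
    print(sorted(test_fold), Counter(labels[i] for i in test_fold))
```

With these toy labels each fold ends up with two negatives and one positive, matching the 2:1 ratio of the full data set.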


Grid search with GridSearchCV

# Use sklearn's grid search to find the best parameters, which are then used for the final model
from xgboost import XGBClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import accuracy_score
from matplotlib import pyplot

bst = XGBClassifier(max_depth=2, learning_rate=0.1, verbosity=0,
                    objective='binary:logistic')  # verbosity=0 replaces the deprecated silent=True
param_test = {
    'n_estimators': range(1, 51, 1)
}
clf = GridSearchCV(estimator=bst, param_grid=param_test, scoring='accuracy', cv=5)  # 5-fold cross-validation
clf.fit(f_train, l_train)  # after fitting, clf predicts with the best parameters found


preds = clf.predict(f_test)

test_accuracy = accuracy_score(l_test, preds)
print("Test Accuracy of gridsearchcv: %.2f%%" % (test_accuracy * 100.0))

print(clf.best_params_, clf.best_score_)  # clf.cv_results_ holds the scores of every candidate
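Conceptually, a grid search scores every parameter combination and keeps the best one. The sketch below shows that loop in plain Python. The `grid_search` helper and the `toy_score` function are hypothetical: `toy_score` stands in for the mean cross-validated accuracy that `GridSearchCV` would compute, and is simply defined to peak at 30 trees.

```python
from itertools import product


def grid_search(param_grid, score_fn):
    """Try every parameter combination and return the best (params, score)."""
    names = sorted(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_score(params):
    # Hypothetical stand-in for a cross-validated accuracy; peaks at 30 trees.
    return 1.0 - abs(params["n_estimators"] - 30) / 100.0


best_params, best_score = grid_search({"n_estimators": range(1, 51)}, toy_score)
print(best_params, best_score)
```

The cost grows with the product of the grid sizes, which is why the article's example searches a single parameter.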

3. Tuning with early stopping: early_stopping_rounds (monitors whether the loss is still changing)

# Stand-alone early stopping example
import xgboost as xgb
from xgboost import XGBClassifier
from sklearn.metrics import accuracy_score
from matplotlib import pyplot

num_round = 100
# early_stopping_rounds: stop once the metric has not improved for this many rounds.
# Recent xgboost releases take early_stopping_rounds and eval_metric in the constructor;
# older releases expect them as arguments to fit() instead.
bst = XGBClassifier(max_depth=2, learning_rate=0.1, n_estimators=num_round, verbosity=0,
                    objective='binary:logistic', early_stopping_rounds=10, eval_metric="error")
eval_set = [(f_test, l_test)]  # data on which the metric is evaluated
bst.fit(f_train, l_train, eval_set=eval_set, verbose=True)  # verbose: print the error at each round

# make prediction
preds = bst.predict(f_test)

test_accuracy = accuracy_score(l_test, preds)
print("Test Accuracy: %.2f%%" % (test_accuracy * 100.0))
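The stopping rule itself is simple: training continues only while the validation metric improves at least once within the given number of rounds. The following pure-Python sketch replays that rule over a made-up list of per-round error rates. The helper name `early_stopping_round` and the numbers are invented, and the sketch assumes a metric where lower is better.

```python
def early_stopping_round(scores, patience):
    """Return (best_round, stop_round) for a metric where lower is better."""
    best_score = float("inf")
    best_round = -1
    for round_index, score in enumerate(scores):
        if score < best_score:
            best_score, best_round = score, round_index
        elif round_index - best_round >= patience:
            # No improvement for `patience` consecutive rounds: stop here.
            return best_round, round_index
    return best_round, len(scores) - 1


# Made-up validation error per boosting round.
errors = [0.30, 0.25, 0.22, 0.22, 0.23, 0.22, 0.24, 0.21]
best, stopped = early_stopping_round(errors, patience=3)
print(best, stopped)
```

In this toy run training stops at round 5, so the lower error at round 7 is never reached. That is the trade-off of a small `early_stopping_rounds` value.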

4. Monitoring the training loss on multiple datasets

# Several evaluation sets and several metrics
import xgboost as xgb
from xgboost import XGBClassifier
from sklearn.metrics import accuracy_score
from matplotlib import pyplot

num_round = 100
# Recent xgboost releases take eval_metric in the constructor; older releases expect it in fit()
bst = XGBClassifier(max_depth=2, learning_rate=0.1, n_estimators=num_round, verbosity=0,
                    objective='binary:logistic', eval_metric=["error", "logloss"])
eval_set = [(f_train, l_train), (f_test, l_test)]  # metrics are reported for both the training and the test set
bst.fit(f_train, l_train, eval_set=eval_set, verbose=True)

# make prediction
preds = bst.predict(f_test)
test_accuracy = accuracy_score(l_test, preds)
print("Test Accuracy: %.2f%%" % (test_accuracy * 100.0))
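The two metrics requested above measure different things: `error` is the fraction of samples misclassified when probabilities above 0.5 count as positive, while `logloss` also penalises how confident each wrong (or right) probability is. The sketch below computes both by hand for a few made-up labels and probabilities. The function names and the clipping constant `eps` are choices made for this example.

```python
import math


def error_rate(labels, probs, threshold=0.5):
    """Fraction of samples whose thresholded prediction differs from the label."""
    wrong = sum(int(p > threshold) != y for y, p in zip(labels, probs))
    return wrong / len(labels)


def log_loss(labels, probs, eps=1e-15):
    """Mean binary cross-entropy; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(labels)


# Made-up labels and predicted probabilities.
labels = [1, 0, 1, 0]
probs = [0.9, 0.2, 0.4, 0.6]
print(error_rate(labels, probs), round(log_loss(labels, probs), 4))
```

Watching both on the training and test sets, as the article's `eval_set` does, helps spot overfitting: the training loss keeps falling while the test loss flattens or rises.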


5. Saving and loading the model

# Save the model
bst.save_model('demo.model')

# Load the model and predict
modelfile = 'demo.model'

# 1. Load the saved model into a Booster
bst = xgb.Booster({'nthread': 8}, model_file=modelfile)

# 2. Predict on the test set

f_test1 = xgb.DMatrix(f_test)  # prefer xgboost's own data matrix
ypred1 = bst.predict(f_test1)  # the Booster returns probabilities
test_predictions = [round(value) for value in ypred1]
test_accuracy1 = accuracy_score(l_test, test_predictions)
print("Test Accuracy: %.2f%%" % (test_accuracy1 * 100.0))

That is all for this article. I hope it helps with your studies, and I hope you will continue to support 三水点靠木.


- Author -

宋建国
