
Two solutions for TensorFlow model training that gets slower and slower

Posted in Python on February 07, 2020

1 Solutions

[Solution 1]

Instantiate tf.train.Saver() once, at global scope, i.e. outside the TensorFlow session and its training loop.

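import tensorflow as tf  # TensorFlow 1.x API

# The model (train_op_1, train_loss, train_acc, summary_op), feed_dict and STEPS are
# assumed to be defined above, so the Saver can find the variables it has to save.
# Create the Saver once, outside the session: the most important step
saver = tf.train.Saver()
# Open the session
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for i in range(STEPS):
        # Run one training step
        _, loss_1, acc, summary = sess.run([train_op_1, train_loss, train_acc, summary_op], feed_dict=feed_dict)
        # Save a checkpoint; global_step=i appends the step number to the file name
        saver.save(sess, "./model/path", global_step=i)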
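The effect of Solution 1 can be checked by counting the operations in the graph. The sketch below is an illustration, not the author's code: it assumes TensorFlow 1.15 or 2.x (using the `tensorflow.compat.v1` module), and the helper `op_counts` and its one-variable toy model are hypothetical. It records the graph's op count after each loop iteration, once with the Saver created inside the loop and once with it created beforehand.

```python
import tensorflow.compat.v1 as tf  # TF1-style graph API, available in TF 1.15 and 2.x


def op_counts(saver_inside_loop, steps=3):
    """Return the graph's op count after each loop iteration (hypothetical helper)."""
    graph = tf.Graph()
    counts = []
    with graph.as_default():
        tf.Variable(1.0, name="w")  # toy model: a single variable for the Saver to save
        if not saver_inside_loop:
            saver = tf.train.Saver()  # Solution 1: one Saver, created before the loop
        for _ in range(steps):
            if saver_inside_loop:
                saver = tf.train.Saver()  # anti-pattern: adds new save/restore ops each time
            counts.append(len(graph.get_operations()))
    return counts


print("Saver inside the loop:", op_counts(saver_inside_loop=True))
print("Saver created once:   ", op_counts(saver_inside_loop=False))
```

With the Saver inside the loop the count rises on every iteration, because each new Saver brings its own set of save and restore ops; with a single Saver it stays flat.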

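[Solution 2]

Building on Solution 1, also define the model structure (predictions, loss, optimizer, accuracy) outside the session, so the graph is complete before training starts.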

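import tensorflow as tf  # TensorFlow 1.x API

# network_model is the author's own module; inputs, labels and keep_prob are placeholders,
# and x_batch, y_batch, learning_rate and STEPS are assumed to be defined above.
# Predictions
train_logits = network_model.inference(inputs, keep_prob)
# Loss
train_loss = network_model.losses(train_logits)
# Optimization
train_op = network_model.train(train_loss, learning_rate)
# Accuracy
train_acc = network_model.evaluation(train_logits, labels)
# Merge the summaries (assumes network_model registers tf.summary ops)
summary_op = tf.summary.merge_all()
# Model inputs (a plain Python dict: rebuilding it each step adds no graph nodes)
feed_dict = {inputs: x_batch, labels: y_batch, keep_prob: 0.5}
# Create the Saver once, after the model is built
saver = tf.train.Saver()
# Open the session
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for i in range(STEPS):
        # Run one training step
        _, loss_1, acc, summary = sess.run([train_op, train_loss, train_acc, summary_op], feed_dict=feed_dict)
        # Save a checkpoint
        saver.save(sess, "./model/path", global_step=i)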
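A self-contained version of the Solution 2 layout may make the ordering clearer. This is a minimal sketch under stated assumptions: TensorFlow 1.15 or 2.x through `tensorflow.compat.v1`, and a hypothetical one-weight linear model standing in for the author's `network_model` module. Everything, including the Saver, is built before the session opens; the loop only calls `sess.run` and `saver.save`, and the function returns the op count after each step so you can see that it does not change.

```python
import os
import tempfile

import tensorflow.compat.v1 as tf  # TF1-style graph API, available in TF 1.15 and 2.x


def train_solution_two(steps=3):
    """Build the whole graph first, then train; returns (op counts, losses) per step."""
    graph = tf.Graph()
    with graph.as_default():
        # Hypothetical toy model standing in for network_model.inference/losses/train.
        x = tf.placeholder(tf.float32, shape=[None], name="inputs")
        w = tf.Variable(0.5, name="w")
        loss = tf.reduce_mean(tf.square(w * x - 1.0))
        train_op = tf.train.GradientDescentOptimizer(0.1).minimize(loss)
        init_op = tf.global_variables_initializer()
        saver = tf.train.Saver()  # created once, after all variables exist

    op_counts, losses = [], []
    with tempfile.TemporaryDirectory() as ckpt_dir, tf.Session(graph=graph) as sess:
        sess.run(init_op)
        for step in range(steps):
            # The loop only runs existing ops; it never adds new ones to the graph.
            _, loss_value = sess.run([train_op, loss], feed_dict={x: [1.0, 2.0]})
            saver.save(sess, os.path.join(ckpt_dir, "model"), global_step=step)
            op_counts.append(len(graph.get_operations()))
            losses.append(loss_value)
    return op_counts, losses


counts, losses = train_solution_two()
print("ops per step:", counts)
print("loss per step:", losses)
```

Checkpoints go to a temporary directory here only so the sketch cleans up after itself; a real script would pass its own path such as "./model/path".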

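2 Timing tests

Running the training program in different ways gives very different training times. If the graph is extended on every training step (for example by creating a new Saver or new ops inside the loop), each step takes a little longer than the one before; the more steps, the slower the later steps become, and eventually the graph can blow up and abort training. In the first log below the per-step time climbs from about 1.55 s to 1.76 s; in the second it stays at about 0.097 s after the first step.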

[Per-step time keeps growing]

2019-05-15 10:55:29.009205: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
step: 0, time cost: 1.8800880908966064
step: 1, time cost: 1.592250108718872
step: 2, time cost: 1.553826093673706
step: 3, time cost: 1.5687050819396973
step: 4, time cost: 1.5777575969696045
step: 5, time cost: 1.5908267498016357
step: 6, time cost: 1.5989274978637695
step: 7, time cost: 1.6078357696533203
step: 8, time cost: 1.6087186336517334
step: 9, time cost: 1.6123006343841553
step: 10, time cost: 1.6320762634277344
step: 11, time cost: 1.6317598819732666
step: 12, time cost: 1.6570467948913574
step: 13, time cost: 1.6584930419921875
step: 14, time cost: 1.6765813827514648
step: 15, time cost: 1.6751370429992676
step: 16, time cost: 1.7304580211639404
step: 17, time cost: 1.7583982944488525

[Per-step time stays flat]

2019-05-15 13:03:49.394354: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 7048 MB memory) -> physical GPU (device: 1, name: Tesla P4, pci bus id: 0000:00:0d.0, compute capability: 6.1)
step: 0, time cost: 1.9781079292297363
loss1:6.78, loss2:5.47, loss3:5.27, loss4:7.31, loss5:5.44, loss6:6.87, loss7: 6.84
Total loss: 43.98, accuracy: 0.04, steps: 0, time cost: 1.9781079292297363
step: 1, time cost: 0.09688425064086914
step: 2, time cost: 0.09693264961242676
step: 3, time cost: 0.09671926498413086
step: 4, time cost: 0.09688210487365723
step: 5, time cost: 0.09646058082580566
step: 6, time cost: 0.09669041633605957
step: 7, time cost: 0.09666872024536133
step: 8, time cost: 0.09651994705200195
step: 9, time cost: 0.09705543518066406
step: 10, time cost: 0.09690332412719727
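To turn logs like the two above into a single number, one option is to fit a least-squares line to per-step time and look at its slope. The sketch below uses hypothetical helpers in plain Python (standard library only); it assumes the `step: N, time cost: T` line format shown in the logs, skips step 0 because it includes start-up work, and uses the step times copied from the two logs as sample data.

```python
import re

STEP_RE = re.compile(r"^step: (\d+), time cost: ([0-9.]+)", re.MULTILINE)


def step_times(log_text):
    """Return (step, seconds) pairs for every 'step: N, time cost: T' line."""
    return [(int(s), float(t)) for s, t in STEP_RE.findall(log_text)]


def slope_per_step(pairs, skip_warmup=True):
    """Least-squares slope of time cost versus step number, in seconds per step."""
    if skip_warmup:
        pairs = [(s, t) for s, t in pairs if s > 0]  # step 0 includes start-up work
    n = len(pairs)
    mean_s = sum(s for s, _ in pairs) / n
    mean_t = sum(t for _, t in pairs) / n
    cov = sum((s - mean_s) * (t - mean_t) for s, t in pairs)
    var = sum((s - mean_s) ** 2 for s, _ in pairs)
    return cov / var


GROWING_LOG = """\
step: 0, time cost: 1.8800880908966064
step: 1, time cost: 1.592250108718872
step: 2, time cost: 1.553826093673706
step: 3, time cost: 1.5687050819396973
step: 4, time cost: 1.5777575969696045
step: 5, time cost: 1.5908267498016357
step: 6, time cost: 1.5989274978637695
step: 7, time cost: 1.6078357696533203
step: 8, time cost: 1.6087186336517334
step: 9, time cost: 1.6123006343841553
step: 10, time cost: 1.6320762634277344
step: 11, time cost: 1.6317598819732666
step: 12, time cost: 1.6570467948913574
step: 13, time cost: 1.6584930419921875
step: 14, time cost: 1.6765813827514648
step: 15, time cost: 1.6751370429992676
step: 16, time cost: 1.7304580211639404
step: 17, time cost: 1.7583982944488525
"""

FLAT_LOG = """\
step: 0, time cost: 1.9781079292297363
Total loss: 43.98, accuracy: 0.04, steps: 0, time cost: 1.9781079292297363
step: 1, time cost: 0.09688425064086914
step: 2, time cost: 0.09693264961242676
step: 3, time cost: 0.09671926498413086
step: 4, time cost: 0.09688210487365723
step: 5, time cost: 0.09646058082580566
step: 6, time cost: 0.09669041633605957
step: 7, time cost: 0.09666872024536133
step: 8, time cost: 0.09651994705200195
step: 9, time cost: 0.09705543518066406
step: 10, time cost: 0.09690332412719727
"""

print(f"growing run: {slope_per_step(step_times(GROWING_LOG)):+.5f} s/step")
print(f"flat run:    {slope_per_step(step_times(FLAT_LOG)):+.5f} s/step")
```

On these logs the first slope comes out at roughly +0.01 s per step, while the second is essentially zero; the `Total loss` line is ignored because it does not start with `step:`.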

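3 Cause analysis

(1) TensorFlow (1.x) represents a computation as a dataflow graph: the nodes are operations and the edges are the tensors passed between them. Every call that creates an operation (a tf.* function, a layer, tf.train.Saver()) adds nodes to the default graph, whereas sess.run only executes nodes that already exist;

(2) The graph is a double-edged sword: it speeds up computation and makes models easier to design, but a poorly structured program turns it into a liability and makes training slower and slower;

(3) Training slows down when code inside the loop adds nodes to the graph, or rebuilds the model, on every iteration. The number of nodes and edges then keeps growing (and if the model is rebuilt, so do its variables), leaving more and more graph to handle; for example, saver.save() writes the meta graph by default (write_meta_graph=True), so every checkpoint serializes the whole, ever larger graph;

(4) tf.train.Saver() does not load the graph: constructing it adds save and restore ops for the model's variables to the graph. If a new Saver is instantiated on every update, these ops pile up and training inevitably slows down;

(5) Therefore, create the Saver (and the rest of the model) once at global scope, so the graph is built only once and afterwards only its parameters are trained; this keeps the original training speed.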
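Points (1) and (3) can be checked in a few lines: `sess.run` executes existing ops, while writing an expression such as `x * 2.0` inside the loop creates new ones. This sketch assumes TensorFlow 1.15 or 2.x via `tensorflow.compat.v1`; the function name and the tiny graph are hypothetical.

```python
import tensorflow.compat.v1 as tf  # TF1-style graph API, available in TF 1.15 and 2.x


def op_counts_run_vs_build(steps=3):
    """Return op counts: at start, after only running, after building in the loop."""
    graph = tf.Graph()
    with graph.as_default():
        x = tf.placeholder(tf.float32, shape=[], name="x")
        y = x * 2.0  # built once, before the loop
        with tf.Session() as sess:
            start = len(graph.get_operations())
            for i in range(steps):
                sess.run(y, feed_dict={x: float(i)})  # executes existing ops only
            after_run = len(graph.get_operations())
            for i in range(steps):
                # Anti-pattern: `x * 2.0` creates new ops on every iteration.
                sess.run(x * 2.0, feed_dict={x: float(i)})
            after_build = len(graph.get_operations())
    return start, after_run, after_build


print(op_counts_run_vs_build())
```

The first two counts match, while the third is larger: the graph grew only because ops were created inside the loop, not because `sess.run` was called.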

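4 Summary

(1) When designing a training program, build the graph only once;

(2) tf.train.Saver() adds save and restore ops to the graph rather than loading it; instantiate it once at global scope, outside the session and the training loop, to stop training from getting slower and slower.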
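A cheap way to enforce rule (1) is to freeze the graph once it has been built. `tf.Graph.finalize()` makes a graph read-only, so any later attempt to add an op, including constructing another `tf.train.Saver()`, raises a `RuntimeError` instead of silently slowing training down. The sketch below assumes TensorFlow 1.15 or 2.x through `tensorflow.compat.v1`; the helper name and the one-variable model are hypothetical.

```python
import tensorflow.compat.v1 as tf  # TF1-style graph API, available in TF 1.15 and 2.x


def finalized_graph_rejects_new_ops():
    """Build a toy graph, freeze it, then try to add a second Saver (hypothetical helper)."""
    graph = tf.Graph()
    with graph.as_default():
        tf.Variable(1.0, name="w")  # toy model
        tf.train.Saver()  # the one Saver this graph should ever have
    graph.finalize()  # from now on the graph is read-only
    assert graph.finalized
    try:
        with graph.as_default():
            tf.train.Saver()  # would add new ops, so TensorFlow refuses
    except RuntimeError as err:
        print("Blocked:", err)
        return True
    return False


print(finalized_graph_rejects_new_ops())
```

In a real script the call would go right before the training loop, for example `tf.get_default_graph().finalize()`, so that a stray op or Saver created inside the loop fails on the first step rather than after hours of gradual slowdown.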



- Author -

xdq101

