pytorch 实现L2和L1正则化regularization的操作


Posted in Python onMarch 03, 2021

1.torch.optim优化器实现L2正则化

torch.optim集成了很多优化器,如SGD,Adadelta,Adam,Adagrad,RMSprop等,这些优化器自带的一个参数weight_decay,用于指定权值衰减率,相当于L2正则化中的λ参数,注意torch.optim集成的优化器只有L2正则化方法,你可以查看注释,参数weight_decay 的解析是:

weight_decay (float, optional): weight decay (L2 penalty) (default: 0)

使用torch.optim的优化器,可如下设置L2正则化

optimizer = optim.Adam(model.parameters(),lr=learning_rate,weight_decay=0.01)

pytorch 实现L2和L1正则化regularization的操作

但是这种方法存在几个问题,

(1)一般正则化,只是对模型的权重W参数进行惩罚,而偏置参数b是不进行惩罚的,而torch.optim的优化器weight_decay参数指定的权值衰减是对网络中的所有参数,包括权值w和偏置b同时进行惩罚。很多时候如果对b 进行L2正则化将会导致严重的欠拟合,因此这个时候一般只需要对权值w进行正则即可。(PS:这个我真不确定,源码解析是 weight decay (L2 penalty) ,但有些网友说这种方法会对参数偏置b也进行惩罚,可解惑的网友给个明确的答复)

(2)缺点:torch.optim的优化器固定实现L2正则化,不能实现L1正则化。如果需要L1正则化,可如下实现:

pytorch 实现L2和L1正则化regularization的操作

(3)根据正则化的公式,加入正则化后,loss会变原来大,比如weight_decay=1的loss为10,那么weight_decay=100时,loss输出应该也提高100倍左右。而采用torch.optim的优化器的方法,如果你依然采用loss_fun= nn.CrossEntropyLoss()进行计算loss,你会发现,不管你怎么改变weight_decay的大小,loss会跟之前没有加正则化的大小差不多。这是因为你的loss_fun损失函数没有把权重W的损失加上。

(4)采用torch.optim的优化器实现正则化的方法,是没问题的!只不过很容易让人产生误解,对鄙人而言,我更喜欢TensorFlow的正则化实现方法,只需要tf.get_collection(tf.GraphKeys.REGULARIZATION_LOSSES),实现过程几乎跟正则化的公式对应的上。

(5)Github项目源码:点击进入

为了,解决这些问题,我特定自定义正则化的方法,类似于TensorFlow正则化实现方法。

2. 如何判断正则化作用了模型?

一般来说,正则化的主要作用是避免模型产生过拟合,当然啦,过拟合问题,有时候是难以判断的。但是,要判断正则化是否作用了模型,还是很容易的。下面我给出两组训练时产生的loss和Accuracy的log信息,一组是未加入正则化的,一组是加入正则化:

2.1 未加入正则化loss和Accuracy

优化器采用Adam,并且设置参数weight_decay=0.0,即无正则化的方法

optimizer = optim.Adam(model.parameters(),lr=learning_rate,weight_decay=0.0)

训练时输出的 loss和Accuracy信息

step/epoch:0/0,Train Loss: 2.418065, Acc: [0.15625]
step/epoch:10/0,Train Loss: 5.194936, Acc: [0.34375]
step/epoch:20/0,Train Loss: 0.973226, Acc: [0.8125]
step/epoch:30/0,Train Loss: 1.215165, Acc: [0.65625]
step/epoch:40/0,Train Loss: 1.808068, Acc: [0.65625]
step/epoch:50/0,Train Loss: 1.661446, Acc: [0.625]
step/epoch:60/0,Train Loss: 1.552345, Acc: [0.6875]
step/epoch:70/0,Train Loss: 1.052912, Acc: [0.71875]
step/epoch:80/0,Train Loss: 0.910738, Acc: [0.75]
step/epoch:90/0,Train Loss: 1.142454, Acc: [0.6875]
step/epoch:100/0,Train Loss: 0.546968, Acc: [0.84375]
step/epoch:110/0,Train Loss: 0.415631, Acc: [0.9375]
step/epoch:120/0,Train Loss: 0.533164, Acc: [0.78125]
step/epoch:130/0,Train Loss: 0.956079, Acc: [0.6875]
step/epoch:140/0,Train Loss: 0.711397, Acc: [0.8125]

2.1 加入正则化loss和Accuracy

优化器采用Adam,并且设置参数weight_decay=10.0,即正则化的权重lambda =10.0

optimizer = optim.Adam(model.parameters(),lr=learning_rate,weight_decay=10.0)

这时,训练时输出的 loss和Accuracy信息:

step/epoch:0/0,Train Loss: 2.467985, Acc: [0.09375]
step/epoch:10/0,Train Loss: 5.435320, Acc: [0.40625]
step/epoch:20/0,Train Loss: 1.395482, Acc: [0.625]
step/epoch:30/0,Train Loss: 1.128281, Acc: [0.6875]
step/epoch:40/0,Train Loss: 1.135289, Acc: [0.6875]
step/epoch:50/0,Train Loss: 1.455040, Acc: [0.5625]
step/epoch:60/0,Train Loss: 1.023273, Acc: [0.65625]
step/epoch:70/0,Train Loss: 0.855008, Acc: [0.65625]
step/epoch:80/0,Train Loss: 1.006449, Acc: [0.71875]
step/epoch:90/0,Train Loss: 0.939148, Acc: [0.625]
step/epoch:100/0,Train Loss: 0.851593, Acc: [0.6875]
step/epoch:110/0,Train Loss: 1.093970, Acc: [0.59375]
step/epoch:120/0,Train Loss: 1.699520, Acc: [0.625]
step/epoch:130/0,Train Loss: 0.861444, Acc: [0.75]
step/epoch:140/0,Train Loss: 0.927656, Acc: [0.625]

当weight_decay=10000.0

step/epoch:0/0,Train Loss: 2.337354, Acc: [0.15625]
step/epoch:10/0,Train Loss: 2.222203, Acc: [0.125]
step/epoch:20/0,Train Loss: 2.184257, Acc: [0.3125]
step/epoch:30/0,Train Loss: 2.116977, Acc: [0.5]
step/epoch:40/0,Train Loss: 2.168895, Acc: [0.375]
step/epoch:50/0,Train Loss: 2.221143, Acc: [0.1875]
step/epoch:60/0,Train Loss: 2.189801, Acc: [0.25]
step/epoch:70/0,Train Loss: 2.209837, Acc: [0.125]
step/epoch:80/0,Train Loss: 2.202038, Acc: [0.34375]
step/epoch:90/0,Train Loss: 2.192546, Acc: [0.25]
step/epoch:100/0,Train Loss: 2.215488, Acc: [0.25]
step/epoch:110/0,Train Loss: 2.169323, Acc: [0.15625]
step/epoch:120/0,Train Loss: 2.166457, Acc: [0.3125]
step/epoch:130/0,Train Loss: 2.144773, Acc: [0.40625]
step/epoch:140/0,Train Loss: 2.173397, Acc: [0.28125]

2.3 正则化说明

就整体而言,对比加入正则化和未加入正则化的模型,训练输出的loss和Accuracy信息,我们可以发现,加入正则化后,loss下降的速度会变慢,准确率Accuracy的上升速度会变慢,并且未加入正则化模型的loss和Accuracy的浮动比较大(或者方差比较大),而加入正则化的模型训练loss和Accuracy,表现的比较平滑。

并且随着正则化的权重lambda越大,表现的更加平滑。这其实就是正则化的对模型的惩罚作用,通过正则化可以使得模型表现的更加平滑,即通过正则化可以有效解决模型过拟合的问题。

3.自定义正则化的方法

为了解决torch.optim优化器只能实现L2正则化以及惩罚网络中的所有参数的缺陷,这里实现类似于TensorFlow正则化的方法。

3.1 自定义正则化Regularization类

这里封装成一个实现正则化的Regularization类,各个方法都给出了注释,自己慢慢看吧,有问题再留言吧

# 检查GPU是否可用
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
# device='cuda'
print("-----device:{}".format(device))
print("-----Pytorch version:{}".format(torch.__version__))
 
class Regularization(torch.nn.Module):
 def __init__(self,model,weight_decay,p=2):
  '''
  :param model 模型
  :param weight_decay:正则化参数
  :param p: 范数计算中的幂指数值,默认求2范数,
     当p=0为L2正则化,p=1为L1正则化
  '''
  super(Regularization, self).__init__()
  if weight_decay <= 0:
   print("param weight_decay can not <=0")
   exit(0)
  self.model=model
  self.weight_decay=weight_decay
  self.p=p
  self.weight_list=self.get_weight(model)
  self.weight_info(self.weight_list)
 
 def to(self,device):
  '''
  指定运行模式
  :param device: cude or cpu
  :return:
  '''
  self.device=device
  super().to(device)
  return self
 
 def forward(self, model):
  self.weight_list=self.get_weight(model)#获得最新的权重
  reg_loss = self.regularization_loss(self.weight_list, self.weight_decay, p=self.p)
  return reg_loss
 
 def get_weight(self,model):
  '''
  获得模型的权重列表
  :param model:
  :return:
  '''
  weight_list = []
  for name, param in model.named_parameters():
   if 'weight' in name:
    weight = (name, param)
    weight_list.append(weight)
  return weight_list
 
 def regularization_loss(self,weight_list, weight_decay, p=2):
  '''
  计算张量范数
  :param weight_list:
  :param p: 范数计算中的幂指数值,默认求2范数
  :param weight_decay:
  :return:
  '''
  # weight_decay=Variable(torch.FloatTensor([weight_decay]).to(self.device),requires_grad=True)
  # reg_loss=Variable(torch.FloatTensor([0.]).to(self.device),requires_grad=True)
  # weight_decay=torch.FloatTensor([weight_decay]).to(self.device)
  # reg_loss=torch.FloatTensor([0.]).to(self.device)
  reg_loss=0
  for name, w in weight_list:
   l2_reg = torch.norm(w, p=p)
   reg_loss = reg_loss + l2_reg
 
  reg_loss=weight_decay*reg_loss
  return reg_loss
 
 def weight_info(self,weight_list):
  '''
  打印权重列表信息
  :param weight_list:
  :return:
  '''
  print("---------------regularization weight---------------")
  for name ,w in weight_list:
   print(name)
  print("---------------------------------------------------")

3.2 Regularization使用方法

使用方法很简单,就当一个普通Pytorch模块来使用:例如

# 检查GPU是否可用
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
 
print("-----device:{}".format(device))
print("-----Pytorch version:{}".format(torch.__version__))
 
weight_decay=100.0 # 正则化参数
 
model = my_net().to(device)
# 初始化正则化
if weight_decay>0:
 reg_loss=Regularization(model, weight_decay, p=2).to(device)
else:
 print("no regularization")
 
criterion= nn.CrossEntropyLoss().to(device) # CrossEntropyLoss=softmax+cross entropy
optimizer = optim.Adam(model.parameters(),lr=learning_rate)#不需要指定参数weight_decay
 
# train
batch_train_data=...
batch_train_label=...
 
out = model(batch_train_data)
 
# loss and regularization
loss = criterion(input=out, target=batch_train_label)
if weight_decay > 0:
 loss = loss + reg_loss(model)
total_loss = loss.item()
 
# backprop
optimizer.zero_grad()#清除当前所有的累积梯度
total_loss.backward()
optimizer.step()

训练时输出的 loss和Accuracy信息:

(1)当weight_decay=0.0时,未使用正则化

step/epoch:0/0,Train Loss: 2.379627, Acc: [0.09375]
step/epoch:10/0,Train Loss: 1.473092, Acc: [0.6875]
step/epoch:20/0,Train Loss: 0.931847, Acc: [0.8125]
step/epoch:30/0,Train Loss: 0.625494, Acc: [0.875]
step/epoch:40/0,Train Loss: 2.241885, Acc: [0.53125]
step/epoch:50/0,Train Loss: 1.132131, Acc: [0.6875]
step/epoch:60/0,Train Loss: 0.493038, Acc: [0.8125]
step/epoch:70/0,Train Loss: 0.819410, Acc: [0.78125]
step/epoch:80/0,Train Loss: 0.996497, Acc: [0.71875]
step/epoch:90/0,Train Loss: 0.474205, Acc: [0.8125]
step/epoch:100/0,Train Loss: 0.744587, Acc: [0.8125]
step/epoch:110/0,Train Loss: 0.502217, Acc: [0.78125]
step/epoch:120/0,Train Loss: 0.531865, Acc: [0.8125]
step/epoch:130/0,Train Loss: 1.016807, Acc: [0.875]
step/epoch:140/0,Train Loss: 0.411701, Acc: [0.84375]

(2)当weight_decay=10.0时,使用正则化

---------------------------------------------------
step/epoch:0/0,Train Loss: 1563.402832, Acc: [0.09375]
step/epoch:10/0,Train Loss: 1530.002686, Acc: [0.53125]
step/epoch:20/0,Train Loss: 1495.115234, Acc: [0.71875]
step/epoch:30/0,Train Loss: 1461.114136, Acc: [0.78125]
step/epoch:40/0,Train Loss: 1427.868164, Acc: [0.6875]
step/epoch:50/0,Train Loss: 1395.430054, Acc: [0.6875]
step/epoch:60/0,Train Loss: 1363.358154, Acc: [0.5625]
step/epoch:70/0,Train Loss: 1331.439697, Acc: [0.75]
step/epoch:80/0,Train Loss: 1301.334106, Acc: [0.625]
step/epoch:90/0,Train Loss: 1271.505005, Acc: [0.6875]
step/epoch:100/0,Train Loss: 1242.488647, Acc: [0.75]
step/epoch:110/0,Train Loss: 1214.184204, Acc: [0.59375]
step/epoch:120/0,Train Loss: 1186.174561, Acc: [0.71875]
step/epoch:130/0,Train Loss: 1159.148438, Acc: [0.78125]
step/epoch:140/0,Train Loss: 1133.020020, Acc: [0.65625]

(3)当weight_decay=10000.0时,使用正则化

step/epoch:0/0,Train Loss: 1570211.500000, Acc: [0.09375]
step/epoch:10/0,Train Loss: 1522952.125000, Acc: [0.3125]
step/epoch:20/0,Train Loss: 1486256.125000, Acc: [0.125]
step/epoch:30/0,Train Loss: 1451671.500000, Acc: [0.25]
step/epoch:40/0,Train Loss: 1418959.750000, Acc: [0.15625]
step/epoch:50/0,Train Loss: 1387154.000000, Acc: [0.125]
step/epoch:60/0,Train Loss: 1355917.500000, Acc: [0.125]
step/epoch:70/0,Train Loss: 1325379.500000, Acc: [0.125]
step/epoch:80/0,Train Loss: 1295454.125000, Acc: [0.3125]
step/epoch:90/0,Train Loss: 1266115.375000, Acc: [0.15625]
step/epoch:100/0,Train Loss: 1237341.000000, Acc: [0.0625]
step/epoch:110/0,Train Loss: 1209186.500000, Acc: [0.125]
step/epoch:120/0,Train Loss: 1181584.250000, Acc: [0.125]
step/epoch:130/0,Train Loss: 1154600.125000, Acc: [0.1875]
step/epoch:140/0,Train Loss: 1128239.875000, Acc: [0.125]

对比torch.optim优化器的实现L2正则化方法,这种Regularization类的方法也同样达到正则化的效果,并且与TensorFlow类似,loss把正则化的损失也计算了。

此外更改参数p,如当p=0表示L2正则化,p=1表示L1正则化。

4. Github项目源码下载

《Github项目源码》点击进入

以上为个人经验,希望能给大家一个参考,也希望大家多多支持三水点靠木。如有错误或未考虑完全的地方,望不吝赐教。

Python 相关文章推荐
php使用递归与迭代实现快速排序示例
Jan 23 Python
Python的Flask框架中@app.route的用法教程
Mar 31 Python
Python中元组,列表,字典的区别
May 21 Python
利用标准库fractions模块让Python支持分数类型的方法详解
Aug 11 Python
100行python代码实现跳一跳辅助程序
Jan 15 Python
pandas筛选某列出现编码错误的解决方法
Nov 07 Python
Python基于百度云文字识别API
Dec 13 Python
Python魔法方法功能与用法简介
Apr 04 Python
python代理工具mitmproxy使用指南
Jul 04 Python
python中dict()的高级用法实现
Nov 13 Python
python如何支持并发方法详解
Jul 25 Python
python保存大型 .mat 数据文件报错超出 IO 限制的操作
May 10 Python
Pytorch自定义Dataset和DataLoader去除不存在和空数据的操作
Mar 03 #Python
python爬取youtube视频的示例代码
Mar 03 #Python
pytorch Dataset,DataLoader产生自定义的训练数据案例
Mar 03 #Python
解决pytorch 数据类型报错的问题
Mar 03 #Python
python反编译教程之2048小游戏实例
Mar 03 #Python
python 如何读、写、解析CSV文件
Mar 03 #Python
聊聊python在linux下与windows下导入模块的区别说明
Mar 03 #Python
You might like
Laravel 5框架学习之表单验证
2015/04/08 PHP
PHP中的常见魔术方法功能作用及用法实例
2015/07/01 PHP
js 链式延迟执行DOME
2012/01/04 Javascript
如何在父窗口中得知window.open()出的子窗口关闭事件
2013/10/15 Javascript
js实现可控制左右方向的无缝滚动效果
2016/05/29 Javascript
jQuery插件扩展extend的简单实现原理
2016/06/24 Javascript
AnjularJS中$scope和$rootScope的区别小结
2016/09/18 Javascript
详解使用JS如何制作简单的ASCII图与单极图
2017/03/31 Javascript
JavaScript数据结构中串的表示与应用实例
2017/04/12 Javascript
javaScript字符串工具类StringUtils详解
2017/12/08 Javascript
使用Angular CLI进行单元测试和E2E测试的方法
2018/03/24 Javascript
js实现二级菜单点击显示当前内容效果
2018/04/28 Javascript
React中阻止事件冒泡的问题详析
2019/04/12 Javascript
Vue源码解析之数据响应系统的使用
2019/04/24 Javascript
vue发送websocket请求和http post请求的实例代码
2019/07/11 Javascript
原生js基于canvas实现一个简单的前端截图工具代码实例
2019/09/10 Javascript
VUE-ElementUI 自定义Loading图操作
2020/11/11 Javascript
浅谈Python中列表生成式和生成器的区别
2015/08/03 Python
Python机器学习算法之k均值聚类(k-means)
2018/02/23 Python
django ajax json的实例代码
2018/05/29 Python
使用Python实现企业微信的自动打卡功能
2019/04/30 Python
Python绘制股票移动均线的实例
2019/08/24 Python
python中property属性的介绍及其应用详解
2019/08/29 Python
使用Pytorch来拟合函数方式
2020/01/14 Python
Python list运算操作代码实例解析
2020/01/20 Python
解决pip install psycopg2出错问题
2020/07/09 Python
Java基础知识面试题
2014/03/25 面试题
澳大利亚商务邀请函
2014/01/17 职场文书
个人安全生产承诺书
2014/05/22 职场文书
延安红色之旅心得体会
2014/10/07 职场文书
父亲节感言
2015/08/03 职场文书
创业计划书之青年旅馆
2019/09/23 职场文书
pytorch显存一直变大的解决方案
2021/04/08 Python
Golang 空map和未初始化map的注意事项说明
2021/04/29 Golang
如何用vue实现网页截图你知道吗
2021/11/17 Vue.js
漫画《催眠麦克风-Dawn Of Divisions》第二卷PV公开
2022/04/05 日漫