Tensorflow实现将标签变为one-hot形式


Posted in Python onMay 22, 2020

将数据标签变为类似MNIST的one-hot编码形式

def one_hot(indices, 
 depth, 
 on_value=None, 
 off_value=None, 
 axis=None, 
 dtype=None, 
 name=None):
 """Returns a one-hot tensor.
 
 The locations represented by indices in `indices` take value 
 `on_value`,
 while all other locations take value `off_value`.
 
 `on_value` and `off_value` must have matching data types. If 
 `dtype` is also
 provided, they must be the same data type as specified by 
 `dtype`.
 
 If `on_value` is not provided, it will default to the value `1` with 
 type
 `dtype`
 
 If `off_value` is not provided, it will default to the value `0` with 
 type
 `dtype`
 
 If the input `indices` is rank `N`, the output will have rank 
 `N+1`. The
 new axis is created at dimension `axis` (default: the new axis is 
 appended
 at the end).
 
 If `indices` is a scalar the output shape will be a vector of 
 length `depth`
 
 If `indices` is a vector of length `features`, the output shape will 
 be:
 
 ```
 features x depth if axis == -1
 depth x features if axis == 0
 ```
 
 If `indices` is a matrix (batch) with shape `[batch, features]`, the 
 output
 shape will be:
 
 ```
 batch x features x depth if axis == -1
 batch x depth x features if axis == 1
 depth x batch x features if axis == 0
 ```
 
 If `dtype` is not provided, it will attempt to assume the data 
 type of
 `on_value` or `off_value`, if one or both are passed in. If none 
 of
 `on_value`, `off_value`, or `dtype` are provided, `dtype` will 
 default to the
 value `tf.float32`.
 
 Note: If a non-numeric data type output is desired (`tf.string`, 
 `tf.bool`,
 etc.), both `on_value` and `off_value` _must_ be provided to 
 `one_hot`.
 
 For example:
 
 ```python
 indices = [0, 1, 2]
 depth = 3
 tf.one_hot(indices, depth) # output: [3 x 3]
 # [[1., 0., 0.],
 # [0., 1., 0.],
 # [0., 0., 1.]]
 
 indices = [0, 2, -1, 1]
 depth = 3
 tf.one_hot(indices, depth,
 on_value=5.0, off_value=0.0,
 axis=-1) # output: [4 x 3]
 # [[5.0, 0.0, 0.0], # one_hot(0)
 # [0.0, 0.0, 5.0], # one_hot(2)
 # [0.0, 0.0, 0.0], # one_hot(-1)
 # [0.0, 5.0, 0.0]] # one_hot(1)
 
 indices = [[0, 2], [1, -1]]
 depth = 3
 tf.one_hot(indices, depth,
 on_value=1.0, off_value=0.0,
 axis=-1) # output: [2 x 2 x 3]
 # [[[1.0, 0.0, 0.0], # one_hot(0)
 # [0.0, 0.0, 1.0]], # one_hot(2)
 # [[0.0, 1.0, 0.0], # one_hot(1)
 # [0.0, 0.0, 0.0]]] # one_hot(-1)
 ```
 
 Args:
 indices: A `Tensor` of indices.
 depth: A scalar defining the depth of the one hot dimension.
 on_value: A scalar defining the value to fill in output when 
 `indices[j]
 = i`. (default: 1)
 off_value: A scalar defining the value to fill in output when 
 `indices[j]
 != i`. (default: 0)
 axis: The axis to fill (default: -1, a new inner-most axis).
 dtype: The data type of the output tensor.
 
 Returns:
 output: The one-hot tensor.
 
 Raises:
 TypeError: If dtype of either `on_value` or `off_value` don't 
 match `dtype`
 TypeError: If dtype of `on_value` and `off_value` don't match 
 one another
 """
 with ops.name_scope(name, "one_hot", 
 [indices, depth, on_value, off_value, axis, 
  dtype]) as name:
 on_exists = on_value is not None
 off_exists = off_value is not None
 on_dtype = ops.convert_to_tensor(on_value).dtype.base_dtype 
  if on_exists else None
 off_dtype = ops.convert_to_tensor(off_value).dtype.
  base_dtype if off_exists else None
 if on_exists or off_exists:
  if dtype is not None:
  # Ensure provided on_value and/or off_value match dtype
  if (on_exists and on_dtype != dtype):
   raise TypeError("dtype {0} of on_value does not match "
   "dtype parameter {1}".format(on_dtype, dtype))
  if (off_exists and off_dtype != dtype):
   raise TypeError("dtype {0} of off_value does not match "
   "dtype parameter {1}".format(off_dtype, dtype))
  else:
  # dtype not provided: automatically assign it
  dtype = on_dtype if on_exists else off_dtype
 elif dtype is None:
  # None of on_value, off_value, or dtype provided. Default 
  dtype to float32
  dtype = dtypes.float32
 if not on_exists:
  # on_value not provided: assign to value 1 of type dtype
  on_value = ops.convert_to_tensor(1, dtype, name="
  on_value")
  on_dtype = dtype
 if not off_exists:
  # off_value not provided: assign to value 0 of type dtype
  off_value = ops.convert_to_tensor(0, dtype, name="
  off_value")
  off_dtype = dtype
 if on_dtype != off_dtype:
  raise TypeError("dtype {0} of on_value does not match "
  "dtype {1} of off_value".format(on_dtype, off_dtype))
 return gen_array_ops._one_hot(indices, depth, on_value, 
  off_value, axis, 
  name)
 
 
Enter: apply completion.
 + Ctrl: remove arguments and replace current word (no Pop-
 up focus).
 + Shift: remove arguments (requires Pop-up focus).
import tensorflow as tf
import numpy as np
data = np.linspace(0,9,10)
label = tf.one_hot(data,10)
with tf.Session() as sess:
 print(data)
 print(sess.run(label))

Tensorflow实现将标签变为one-hot形式

补充知识:数据清洗—制作one-hot

使用pandas进行one-hot编码

pandas.get_dummies(data, prefix=None, prefix_sep='_', dummy_na=False, columns=None, sparse=False, drop_first=False, dtype=None)

pandas中get_dummies()函数可以将字段进行编码,转换为01形式,其中prefix可以为每个新展开的列名添加前缀。

但是,笔者发现它较易使用在数据为每一列为单独的字符:

Tensorflow实现将标签变为one-hot形式

df = pd.DataFrame({'A': ['a', 'b', 'a'], 'B': ['b', 'a', 'c'], 'C': [1, 2, 3]})

## one-hot
df_dumm = pd.get_dummies(df)

Tensorflow实现将标签变为one-hot形式

my_one_hot

但是对于数据为下面形式的可就不能直接转换了,需要先预处理一下,之后转换为one-hot形式:

Tensorflow实现将标签变为one-hot形式

我的做法是:

## tqdm_notebook可以导入tqdm包来使用
def one_hot_my(dataframe, attri):
 sample_attri_list = []
 sample_attri_loc_dic = {}
 loc = 0
 dataframe[attri] = dataframe[attri].astype(str)
 for attri_id in tqdm_notebook(dataframe[attri]):
  attri_id_pro = attri_id.strip().split(',')
  for key in attri_id_pro:
   if key not in sample_attri_loc_dic.keys():
    sample_attri_loc_dic[key] = loc
    loc+=1
  sample_attri_list.append(attri_id_pro)
 print("开始完成one-hot.......")  
 one_hot_attri = []
 for attri_id in tqdm_notebook(sample_attri_list):
  array = [0 for _ in range(len(sample_attri_loc_dic.keys()))]
  for key in attri_id:
   array[sample_attri_loc_dic[key]] = 1
  one_hot_attri.append(array)
 print("封装成dataframe.......") 
 ## 封装成dataframe
 columns = [attri+x for x in sample_attri_loc_dic.keys()]
 one_hot_rig_id_df = pd.DataFrame(one_hot_attri,columns=columns)
 return one_hot_rig_id_df

对属性二值化可以采用:

## 对属性进行二值化
def binary_apply(key, attri, dataframe):
 key_modify = 'is_' + ''.join(lazy_pinyin(key)) + '_' + attri
 print(key_modify)
 dataframe[key_modify] = dataframe.apply(lambda x:1 if x[attri]== key else 0, axis=1)
 return dataframe

对字符进行编码,将字符转换为0,1,2…:

## 对字符进行编码
# columns = ['job', 'marital', 'education','default','housing' ,'loan','contact', 'poutcome']
def encode_info(dataframe, columns):
 for col in columns:
  print(col)
  dataframe[col] = pd.factorize(dataframe[col])[0]
 return dataframe

Tensorflow实现将标签变为one-hot形式

以上这篇Tensorflow实现将标签变为one-hot形式就是小编分享给大家的全部内容了,希望能给大家一个参考,也希望大家多多支持三水点靠木。

Python 相关文章推荐
Python入门篇之数字
Oct 20 Python
线程和进程的区别及Python代码实例
Feb 04 Python
CentOS 6.5下安装Python 3.5.2(与Python2并存)
Jun 05 Python
Python实现将数据写入netCDF4中的方法示例
Aug 30 Python
python使用参数对嵌套字典进行取值的方法
Apr 26 Python
Python3的高阶函数map,reduce,filter的示例详解
Jul 23 Python
在Tensorflow中实现梯度下降法更新参数值
Jan 23 Python
Python利用 utf-8-sig 编码格式解决写入 csv 文件乱码问题
Feb 21 Python
Python中的xlrd模块使用原理解析
May 21 Python
如何基于Python爬取隐秘的角落评论
Jul 02 Python
python线程优先级队列知识点总结
Feb 28 Python
Django项目配置Memcached和Redis, 缓存选择哪个更有优势
Apr 06 Python
Python selenium爬取微博数据代码实例
May 22 #Python
python实现文法左递归的消除方法
May 22 #Python
使用Django搭建网站实现商品分页功能
May 22 #Python
Tensorflow卷积实现原理+手写python代码实现卷积教程
May 22 #Python
Python实现发票自动校核微信机器人的方法
May 22 #Python
基于django micro搭建网站实现加水印功能
May 22 #Python
基于Tensorflow一维卷积用法详解
May 22 #Python
You might like
生成卡号php代码
2008/04/09 PHP
php 分页类 扩展代码
2009/06/11 PHP
PHP 杂谈《重构-改善既有代码的设计》之一 重新组织你的函数
2012/04/09 PHP
php自动给文章加关键词链接的函数代码
2012/11/29 PHP
用JQuery模仿淘宝的图片放大镜显示效果
2011/09/15 Javascript
jQuery学习笔记之jQuery动画效果
2013/09/09 Javascript
开发中可能会用到的jQuery小技巧
2014/03/07 Javascript
JavaScript加入收藏夹功能(兼容IE、firefox、chrome)
2014/05/05 Javascript
JS随机调用指定函数的方法
2015/07/01 Javascript
javascript实现框架高度随内容改变的方法
2015/07/23 Javascript
深入学习JavaScript对象
2015/10/13 Javascript
javascript实现简易计算器的代码
2016/05/31 Javascript
js 截取或者替换字符串中的数字实现方法
2016/06/13 Javascript
bootstrap常用组件之头部导航实现代码
2017/04/20 Javascript
vue学习笔记之vue1.0和vue2.0的区别介绍
2017/05/17 Javascript
vue快捷键与基础指令详解
2017/06/01 Javascript
利用Vue.js实现求职在线之职位查询功能
2017/07/03 Javascript
Vue2.0用 watch 观察 prop 变化(不触发)
2017/09/08 Javascript
浅谈Vuex的状态管理(全家桶)
2017/11/04 Javascript
JS复杂判断的更优雅写法代码详解
2018/11/07 Javascript
es6数据变更同步到视图层的方法
2019/03/04 Javascript
详解javascript void(0)
2020/07/13 Javascript
[03:37]2015国际邀请赛第四日现场精彩集锦
2015/08/08 DOTA
Python中文字符串截取问题
2015/06/15 Python
Python 文件管理实例详解
2015/11/10 Python
让python 3支持mysqldb的解决方法
2017/02/14 Python
python接口自动化测试之接口数据依赖的实现方法
2019/04/26 Python
python中的时区问题
2021/01/14 Python
最新自我评价范文
2013/11/16 职场文书
通用求职信范文模板分享
2013/12/27 职场文书
企业新年寄语
2014/04/04 职场文书
家长通知书教师评语
2014/04/17 职场文书
员工保密协议书
2014/09/27 职场文书
社区党务工作总结2015
2015/05/19 职场文书
《别在吃苦的年纪选择安逸》读后感3篇
2019/11/30 职场文书
Vue自定义铃声提示音组件的实现
2022/01/22 Vue.js