Tensorflow实现将标签变为one-hot形式


Posted in Python onMay 22, 2020

将数据标签变为类似MNIST的one-hot编码形式

def one_hot(indices, 
 depth, 
 on_value=None, 
 off_value=None, 
 axis=None, 
 dtype=None, 
 name=None):
 """Returns a one-hot tensor.
 
 The locations represented by indices in `indices` take value 
 `on_value`,
 while all other locations take value `off_value`.
 
 `on_value` and `off_value` must have matching data types. If 
 `dtype` is also
 provided, they must be the same data type as specified by 
 `dtype`.
 
 If `on_value` is not provided, it will default to the value `1` with 
 type
 `dtype`
 
 If `off_value` is not provided, it will default to the value `0` with 
 type
 `dtype`
 
 If the input `indices` is rank `N`, the output will have rank 
 `N+1`. The
 new axis is created at dimension `axis` (default: the new axis is 
 appended
 at the end).
 
 If `indices` is a scalar the output shape will be a vector of 
 length `depth`
 
 If `indices` is a vector of length `features`, the output shape will 
 be:
 
 ```
 features x depth if axis == -1
 depth x features if axis == 0
 ```
 
 If `indices` is a matrix (batch) with shape `[batch, features]`, the 
 output
 shape will be:
 
 ```
 batch x features x depth if axis == -1
 batch x depth x features if axis == 1
 depth x batch x features if axis == 0
 ```
 
 If `dtype` is not provided, it will attempt to assume the data 
 type of
 `on_value` or `off_value`, if one or both are passed in. If none 
 of
 `on_value`, `off_value`, or `dtype` are provided, `dtype` will 
 default to the
 value `tf.float32`.
 
 Note: If a non-numeric data type output is desired (`tf.string`, 
 `tf.bool`,
 etc.), both `on_value` and `off_value` _must_ be provided to 
 `one_hot`.
 
 For example:
 
 ```python
 indices = [0, 1, 2]
 depth = 3
 tf.one_hot(indices, depth) # output: [3 x 3]
 # [[1., 0., 0.],
 # [0., 1., 0.],
 # [0., 0., 1.]]
 
 indices = [0, 2, -1, 1]
 depth = 3
 tf.one_hot(indices, depth,
 on_value=5.0, off_value=0.0,
 axis=-1) # output: [4 x 3]
 # [[5.0, 0.0, 0.0], # one_hot(0)
 # [0.0, 0.0, 5.0], # one_hot(2)
 # [0.0, 0.0, 0.0], # one_hot(-1)
 # [0.0, 5.0, 0.0]] # one_hot(1)
 
 indices = [[0, 2], [1, -1]]
 depth = 3
 tf.one_hot(indices, depth,
 on_value=1.0, off_value=0.0,
 axis=-1) # output: [2 x 2 x 3]
 # [[[1.0, 0.0, 0.0], # one_hot(0)
 # [0.0, 0.0, 1.0]], # one_hot(2)
 # [[0.0, 1.0, 0.0], # one_hot(1)
 # [0.0, 0.0, 0.0]]] # one_hot(-1)
 ```
 
 Args:
 indices: A `Tensor` of indices.
 depth: A scalar defining the depth of the one hot dimension.
 on_value: A scalar defining the value to fill in output when 
 `indices[j]
 = i`. (default: 1)
 off_value: A scalar defining the value to fill in output when 
 `indices[j]
 != i`. (default: 0)
 axis: The axis to fill (default: -1, a new inner-most axis).
 dtype: The data type of the output tensor.
 
 Returns:
 output: The one-hot tensor.
 
 Raises:
 TypeError: If dtype of either `on_value` or `off_value` don't 
 match `dtype`
 TypeError: If dtype of `on_value` and `off_value` don't match 
 one another
 """
 with ops.name_scope(name, "one_hot", 
 [indices, depth, on_value, off_value, axis, 
  dtype]) as name:
 on_exists = on_value is not None
 off_exists = off_value is not None
 on_dtype = ops.convert_to_tensor(on_value).dtype.base_dtype 
  if on_exists else None
 off_dtype = ops.convert_to_tensor(off_value).dtype.
  base_dtype if off_exists else None
 if on_exists or off_exists:
  if dtype is not None:
  # Ensure provided on_value and/or off_value match dtype
  if (on_exists and on_dtype != dtype):
   raise TypeError("dtype {0} of on_value does not match "
   "dtype parameter {1}".format(on_dtype, dtype))
  if (off_exists and off_dtype != dtype):
   raise TypeError("dtype {0} of off_value does not match "
   "dtype parameter {1}".format(off_dtype, dtype))
  else:
  # dtype not provided: automatically assign it
  dtype = on_dtype if on_exists else off_dtype
 elif dtype is None:
  # None of on_value, off_value, or dtype provided. Default 
  dtype to float32
  dtype = dtypes.float32
 if not on_exists:
  # on_value not provided: assign to value 1 of type dtype
  on_value = ops.convert_to_tensor(1, dtype, name="
  on_value")
  on_dtype = dtype
 if not off_exists:
  # off_value not provided: assign to value 0 of type dtype
  off_value = ops.convert_to_tensor(0, dtype, name="
  off_value")
  off_dtype = dtype
 if on_dtype != off_dtype:
  raise TypeError("dtype {0} of on_value does not match "
  "dtype {1} of off_value".format(on_dtype, off_dtype))
 return gen_array_ops._one_hot(indices, depth, on_value, 
  off_value, axis, 
  name)
 
 
Enter: apply completion.
 + Ctrl: remove arguments and replace current word (no Pop-
 up focus).
 + Shift: remove arguments (requires Pop-up focus).
import tensorflow as tf
import numpy as np
data = np.linspace(0,9,10)
label = tf.one_hot(data,10)
with tf.Session() as sess:
 print(data)
 print(sess.run(label))

Tensorflow实现将标签变为one-hot形式

补充知识:数据清洗—制作one-hot

使用pandas进行one-hot编码

pandas.get_dummies(data, prefix=None, prefix_sep='_', dummy_na=False, columns=None, sparse=False, drop_first=False, dtype=None)

pandas中get_dummies()函数可以将字段进行编码,转换为01形式,其中prefix可以为每个新展开的列名添加前缀。

但是,笔者发现它较易使用在数据为每一列为单独的字符:

Tensorflow实现将标签变为one-hot形式

df = pd.DataFrame({'A': ['a', 'b', 'a'], 'B': ['b', 'a', 'c'], 'C': [1, 2, 3]})

## one-hot
df_dumm = pd.get_dummies(df)

Tensorflow实现将标签变为one-hot形式

my_one_hot

但是对于数据为下面形式的可就不能直接转换了,需要先预处理一下,之后转换为one-hot形式:

Tensorflow实现将标签变为one-hot形式

我的做法是:

## tqdm_notebook可以导入tqdm包来使用
def one_hot_my(dataframe, attri):
 sample_attri_list = []
 sample_attri_loc_dic = {}
 loc = 0
 dataframe[attri] = dataframe[attri].astype(str)
 for attri_id in tqdm_notebook(dataframe[attri]):
  attri_id_pro = attri_id.strip().split(',')
  for key in attri_id_pro:
   if key not in sample_attri_loc_dic.keys():
    sample_attri_loc_dic[key] = loc
    loc+=1
  sample_attri_list.append(attri_id_pro)
 print("开始完成one-hot.......")  
 one_hot_attri = []
 for attri_id in tqdm_notebook(sample_attri_list):
  array = [0 for _ in range(len(sample_attri_loc_dic.keys()))]
  for key in attri_id:
   array[sample_attri_loc_dic[key]] = 1
  one_hot_attri.append(array)
 print("封装成dataframe.......") 
 ## 封装成dataframe
 columns = [attri+x for x in sample_attri_loc_dic.keys()]
 one_hot_rig_id_df = pd.DataFrame(one_hot_attri,columns=columns)
 return one_hot_rig_id_df

对属性二值化可以采用:

## 对属性进行二值化
def binary_apply(key, attri, dataframe):
 key_modify = 'is_' + ''.join(lazy_pinyin(key)) + '_' + attri
 print(key_modify)
 dataframe[key_modify] = dataframe.apply(lambda x:1 if x[attri]== key else 0, axis=1)
 return dataframe

对字符进行编码,将字符转换为0,1,2…:

## 对字符进行编码
# columns = ['job', 'marital', 'education','default','housing' ,'loan','contact', 'poutcome']
def encode_info(dataframe, columns):
 for col in columns:
  print(col)
  dataframe[col] = pd.factorize(dataframe[col])[0]
 return dataframe

Tensorflow实现将标签变为one-hot形式

以上这篇Tensorflow实现将标签变为one-hot形式就是小编分享给大家的全部内容了,希望能给大家一个参考,也希望大家多多支持三水点靠木。

Python 相关文章推荐
python 将字符串转换成字典dict
Mar 24 Python
一个简单的python程序实例(通讯录)
Nov 29 Python
对Python中的@classmethod用法详解
Apr 21 Python
Python3.6连接Oracle数据库的方法详解
May 18 Python
Python魔法方法功能与用法简介
Apr 04 Python
flask框架路由常用定义方式总结
Jul 23 Python
微信小程序python用户认证的实现
Jul 29 Python
vscode配置anaconda3的方法步骤
Aug 08 Python
python 列表推导和生成器表达式的使用
Feb 01 Python
pytorch 中forward 的用法与解释说明
Feb 26 Python
python Protobuf定义消息类型知识点讲解
Mar 02 Python
如何利用Python实现n*n螺旋矩阵
Jan 18 Python
Python selenium爬取微博数据代码实例
May 22 #Python
python实现文法左递归的消除方法
May 22 #Python
使用Django搭建网站实现商品分页功能
May 22 #Python
Tensorflow卷积实现原理+手写python代码实现卷积教程
May 22 #Python
Python实现发票自动校核微信机器人的方法
May 22 #Python
基于django micro搭建网站实现加水印功能
May 22 #Python
基于Tensorflow一维卷积用法详解
May 22 #Python
You might like
Php做的端口嗅探器--可以指定网站和端口
2006/10/09 PHP
dedecms中显示数字验证码的修改方法
2007/03/21 PHP
php实现rc4加密算法代码
2012/04/25 PHP
thinkphp普通查询与表达式查询实例分析
2014/11/24 PHP
PHP+MySQL之Insert Into数据插入用法分析
2015/09/27 PHP
修复ShopNC使用QQ 互联时提示100010 错误
2015/11/08 PHP
php提高脚本性能的4个技巧
2020/08/18 PHP
prototype.js的Ajax对象
2006/09/23 Javascript
javascript标签在页面中的位置探讨
2013/04/11 Javascript
JS获取图片实际宽高及根据图片大小进行自适应
2013/08/11 Javascript
JS实现自动变换的菜单效果代码
2015/09/09 Javascript
canvas绘制多边形
2017/02/24 Javascript
vue.js从安装到搭建过程详解
2017/03/17 Javascript
angular.js实现列表orderby排序的方法
2018/10/02 Javascript
tracking.js页面人脸识别插件使用方法
2020/04/16 Javascript
Node.js之删除文件夹(含递归删除)代码实例
2019/09/09 Javascript
用JS实现选项卡
2020/03/23 Javascript
Jquery滑动门/tab切换实现方法完整示例
2020/06/05 jQuery
vue结合el-upload实现腾讯云视频上传功能
2020/07/01 Javascript
Python入门篇之正则表达式
2014/10/20 Python
python中for语句简单遍历数据的方法
2015/05/07 Python
Python实现读取文件最后n行的方法
2017/02/23 Python
python用户管理系统
2018/03/13 Python
python ddt实现数据驱动
2018/03/14 Python
python numpy和list查询其中某个数的个数及定位方法
2018/06/27 Python
简单了解python字符串前面加r,u的含义
2019/12/26 Python
阿迪达斯比利时官方商城:adidas比利时
2016/10/10 全球购物
视图的作用
2014/12/19 面试题
护理专科毕业生自荐书范文
2014/02/19 职场文书
高中军训感想800字
2014/02/23 职场文书
中学生评语大全
2014/04/18 职场文书
兽医医药专业求职信
2014/07/27 职场文书
上班离岗检讨书
2014/09/10 职场文书
社区节水倡议书
2015/04/29 职场文书
2016年第104个国际护士节活动总结
2016/04/06 职场文书
Java实现房屋出租系统详解
2021/10/05 Java/Android