Converting Labels to One-Hot Form with TensorFlow


Posted in Python on May 22, 2020

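Convert data labels to a one-hot encoding similar to the one used for MNIST. The source of tf.one_hot in TensorFlow 1.x, together with its docstring, is reproduced below: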

def one_hot(indices,
            depth,
            on_value=None,
            off_value=None,
            axis=None,
            dtype=None,
            name=None):
  """Returns a one-hot tensor.

  The locations represented by indices in `indices` take value `on_value`,
  while all other locations take value `off_value`.

  `on_value` and `off_value` must have matching data types. If `dtype` is also
  provided, they must be the same data type as specified by `dtype`.

  If `on_value` is not provided, it will default to the value `1` with type
  `dtype`

  If `off_value` is not provided, it will default to the value `0` with type
  `dtype`

  If the input `indices` is rank `N`, the output will have rank `N+1`. The
  new axis is created at dimension `axis` (default: the new axis is appended
  at the end).

  If `indices` is a scalar the output shape will be a vector of length `depth`

  If `indices` is a vector of length `features`, the output shape will be:

  ```
    features x depth if axis == -1
    depth x features if axis == 0
  ```

  If `indices` is a matrix (batch) with shape `[batch, features]`, the output
  shape will be:

  ```
    batch x features x depth if axis == -1
    batch x depth x features if axis == 1
    depth x batch x features if axis == 0
  ```

  If `dtype` is not provided, it will attempt to assume the data type of
  `on_value` or `off_value`, if one or both are passed in. If none of
  `on_value`, `off_value`, or `dtype` are provided, `dtype` will default to the
  value `tf.float32`.

  Note: If a non-numeric data type output is desired (`tf.string`, `tf.bool`,
  etc.), both `on_value` and `off_value` _must_ be provided to `one_hot`.

  For example:

  ```python
    indices = [0, 1, 2]
    depth = 3
    tf.one_hot(indices, depth)  # output: [3 x 3]
    # [[1., 0., 0.],
    #  [0., 1., 0.],
    #  [0., 0., 1.]]

    indices = [0, 2, -1, 1]
    depth = 3
    tf.one_hot(indices, depth,
               on_value=5.0, off_value=0.0,
               axis=-1)  # output: [4 x 3]
    # [[5.0, 0.0, 0.0],  # one_hot(0)
    #  [0.0, 0.0, 5.0],  # one_hot(2)
    #  [0.0, 0.0, 0.0],  # one_hot(-1)
    #  [0.0, 5.0, 0.0]]  # one_hot(1)

    indices = [[0, 2], [1, -1]]
    depth = 3
    tf.one_hot(indices, depth,
               on_value=1.0, off_value=0.0,
               axis=-1)  # output: [2 x 2 x 3]
    # [[[1.0, 0.0, 0.0],   # one_hot(0)
    #   [0.0, 0.0, 1.0]],  # one_hot(2)
    #  [[0.0, 1.0, 0.0],   # one_hot(1)
    #   [0.0, 0.0, 0.0]]]  # one_hot(-1)
  ```

  Args:
    indices: A `Tensor` of indices.
    depth: A scalar defining the depth of the one hot dimension.
    on_value: A scalar defining the value to fill in output when `indices[j]
      = i`. (default: 1)
    off_value: A scalar defining the value to fill in output when `indices[j]
      != i`. (default: 0)
    axis: The axis to fill (default: -1, a new inner-most axis).
    dtype: The data type of the output tensor.

  Returns:
    output: The one-hot tensor.

  Raises:
    TypeError: If dtype of either `on_value` or `off_value` don't match `dtype`
    TypeError: If dtype of `on_value` and `off_value` don't match one another
  """
  with ops.name_scope(name, "one_hot",
                      [indices, depth, on_value, off_value, axis,
                       dtype]) as name:
    on_exists = on_value is not None
    off_exists = off_value is not None

    on_dtype = (ops.convert_to_tensor(on_value).dtype.base_dtype
                if on_exists else None)
    off_dtype = (ops.convert_to_tensor(off_value).dtype.base_dtype
                 if off_exists else None)

    if on_exists or off_exists:
      if dtype is not None:
        # Ensure provided on_value and/or off_value match dtype
        if (on_exists and on_dtype != dtype):
          raise TypeError("dtype {0} of on_value does not match "
                          "dtype parameter {1}".format(on_dtype, dtype))
        if (off_exists and off_dtype != dtype):
          raise TypeError("dtype {0} of off_value does not match "
                          "dtype parameter {1}".format(off_dtype, dtype))
      else:
        # dtype not provided: automatically assign it
        dtype = on_dtype if on_exists else off_dtype
    elif dtype is None:
      # None of on_value, off_value, or dtype provided. Default dtype to float32
      dtype = dtypes.float32

    if not on_exists:
      # on_value not provided: assign to value 1 of type dtype
      on_value = ops.convert_to_tensor(1, dtype, name="on_value")
      on_dtype = dtype
    if not off_exists:
      # off_value not provided: assign to value 0 of type dtype
      off_value = ops.convert_to_tensor(0, dtype, name="off_value")
      off_dtype = dtype

    if on_dtype != off_dtype:
      raise TypeError("dtype {0} of on_value does not match "
                      "dtype {1} of off_value".format(on_dtype, off_dtype))

    return gen_array_ops._one_hot(indices, depth, on_value, off_value, axis,
                                  name)
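To make the documented behaviour concrete, here is a minimal NumPy sketch of the same semantics for the default `axis=-1` case only. The helper name `one_hot_np` is hypothetical and not part of TensorFlow; it simply mirrors what the docstring above describes, including the all-`off_value` row produced by an out-of-range index such as `-1`.

```python
import numpy as np


def one_hot_np(indices, depth, on_value=1.0, off_value=0.0):
    """Hypothetical NumPy re-implementation of one-hot for axis=-1 only."""
    indices = np.asarray(indices)
    # Compare every index against 0..depth-1 along a new trailing axis.
    # An index outside [0, depth) matches nothing, so its row stays all off.
    mask = indices[..., np.newaxis] == np.arange(depth)
    return np.where(mask, on_value, off_value)


a = one_hot_np([0, 1, 2], 3)                # shape (3, 3)
b = one_hot_np([0, 2, -1, 1], 3, 5.0, 0.0)  # shape (4, 3)
c = one_hot_np([[0, 2], [1, -1]], 3)        # shape (2, 2, 3)
print(a)
print(b)
print(c.shape)
```

The three calls correspond to the three examples in the docstring, so their printed values can be compared against it directly.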
 
 
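import tensorflow as tf
import numpy as np

# tf.one_hot expects integer indices, so build the labels 0..9 as int32
# (np.linspace returns floats).
data = np.linspace(0, 9, 10).astype(np.int32)
label = tf.one_hot(data, 10)

# tf.Session is the TensorFlow 1.x API.
with tf.Session() as sess:
    print(data)
    print(sess.run(label))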


Supplementary notes: data cleaning and building one-hot encodings

One-hot encoding with pandas

pandas.get_dummies(data, prefix=None, prefix_sep='_', dummy_na=False, columns=None, sparse=False, drop_first=False, dtype=None)

The pandas function get_dummies() encodes fields into 0/1 indicator columns. The prefix argument adds a prefix to the name of each newly expanded column.

However, I found that it is easiest to use when every cell of a column holds a single value:

import pandas as pd

df = pd.DataFrame({'A': ['a', 'b', 'a'], 'B': ['b', 'a', 'c'], 'C': [1, 2, 3]})

# One-hot encode the string columns A and B; the numeric column C is kept as-is
df_dumm = pd.get_dummies(df)
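As a small illustration of the `prefix` and `columns` arguments, the sketch below encodes the same frame twice: once in full, and once for column `A` only with a custom prefix. The prefix string `cat` is an arbitrary choice made for this example.

```python
import pandas as pd

df = pd.DataFrame({'A': ['a', 'b', 'a'], 'B': ['b', 'a', 'c'], 'C': [1, 2, 3]})

# Encode every string column; the numeric column C is left untouched.
all_dummies = pd.get_dummies(df)
print(list(all_dummies.columns))

# Encode only column A, prefixing the new columns with "cat".
only_a = pd.get_dummies(df, columns=['A'], prefix='cat')
print(list(only_a.columns))
```

Depending on the pandas version, the indicator columns come back as booleans or as small integers. Passing `dtype=int` to `get_dummies` should give plain 0/1 integers either way.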



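However, data in which a single cell holds several comma-separated values cannot be converted directly. It has to be preprocessed first and then converted to one-hot form.

My approach is: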

import pandas as pd
from tqdm import tqdm_notebook  # progress bar from the tqdm package


def one_hot_my(dataframe, attri):
    sample_attri_list = []
    sample_attri_loc_dic = {}  # maps each distinct value to its column position
    loc = 0
    dataframe[attri] = dataframe[attri].astype(str)
    # First pass: split each cell on commas and give every new value a position
    for attri_id in tqdm_notebook(dataframe[attri]):
        attri_id_pro = attri_id.strip().split(',')
        for key in attri_id_pro:
            if key not in sample_attri_loc_dic:
                sample_attri_loc_dic[key] = loc
                loc += 1
        sample_attri_list.append(attri_id_pro)
    print("Building one-hot vectors...")
    # Second pass: set a 1 at the position of every value present in the row
    one_hot_attri = []
    for attri_id in tqdm_notebook(sample_attri_list):
        array = [0] * len(sample_attri_loc_dic)
        for key in attri_id:
            array[sample_attri_loc_dic[key]] = 1
        one_hot_attri.append(array)
    print("Wrapping the result in a DataFrame...")
    # Wrap the result in a DataFrame; each column is named attri + value
    columns = [attri + x for x in sample_attri_loc_dic.keys()]
    one_hot_rig_id_df = pd.DataFrame(one_hot_attri, columns=columns)
    return one_hot_rig_id_df
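For comparison, pandas also provides `Series.str.get_dummies`, which splits each string on a separator and returns one indicator column per distinct token. The sketch below is an alternative to the hand-written function, not the author's method. The column name `tags` and the sample values are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'tags': ['a,b', 'b', 'a,c']})

# Split each cell on "," and build one 0/1 column per distinct token.
multi_hot = df['tags'].str.get_dummies(sep=',')

# Prefix the column names with the attribute name, as one_hot_my does.
multi_hot.columns = ['tags' + c for c in multi_hot.columns]
print(multi_hot)
```

One difference to keep in mind: here the columns are ordered alphabetically by token, whereas one_hot_my orders them by first appearance in the data.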

To binarize an attribute, you can use:

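# lazy_pinyin is assumed to come from the pypinyin package;
# it converts Chinese characters to pinyin
from pypinyin import lazy_pinyin


# Binarize an attribute: add a 0/1 column marking rows where dataframe[attri] == key
def binary_apply(key, attri, dataframe):
    key_modify = 'is_' + ''.join(lazy_pinyin(key)) + '_' + attri
    print(key_modify)
    dataframe[key_modify] = dataframe.apply(lambda x: 1 if x[attri] == key else 0, axis=1)
    return dataframe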
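A row-wise `apply` calls the lambda once per row, which can be slow on large frames. A vectorized comparison produces the same 0/1 column. The sketch below assumes a hypothetical `job` column and an ASCII key, so it skips the pinyin step.

```python
import pandas as pd

df = pd.DataFrame({'job': ['admin', 'student', 'admin']})

key, attri = 'admin', 'job'
# Compare the whole column at once, then cast True/False to 1/0.
df['is_' + key + '_' + attri] = (df[attri] == key).astype(int)
print(df)
```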

To encode strings as integers 0, 1, 2, ...:

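import pandas as pd


# Encode string columns as integer codes
# columns = ['job', 'marital', 'education','default','housing' ,'loan','contact', 'poutcome']
def encode_info(dataframe, columns):
    for col in columns:
        print(col)
        # pd.factorize returns (codes, uniques); keep only the integer codes
        dataframe[col] = pd.factorize(dataframe[col])[0]
    return dataframe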
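To see what `pd.factorize` returns, the short sketch below encodes a made-up series and then maps the integer codes back to the original strings. The sample values are illustrative only.

```python
import pandas as pd

s = pd.Series(['married', 'single', 'married', 'divorced'])

# codes: one integer per row; uniques: distinct values in order of first appearance.
codes, uniques = pd.factorize(s)
print(codes)
print(list(uniques))

# Indexing uniques with the codes maps the integers back to the strings.
restored = list(uniques[codes])
```

Since encode_info keeps only the codes, it may be worth storing `uniques` per column if the encoding has to be reversed later. Missing values are coded as -1 by default.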


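That is everything for this article on converting labels to one-hot form with TensorFlow. I hope it serves as a useful reference, and I hope you will continue to support 三水点靠木.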
