Tensorflow实现将标签变为one-hot形式


Posted in Python onMay 22, 2020

将数据标签变为类似MNIST的one-hot编码形式

def one_hot(indices, 
 depth, 
 on_value=None, 
 off_value=None, 
 axis=None, 
 dtype=None, 
 name=None):
 """Returns a one-hot tensor.
 
 The locations represented by indices in `indices` take value 
 `on_value`,
 while all other locations take value `off_value`.
 
 `on_value` and `off_value` must have matching data types. If 
 `dtype` is also
 provided, they must be the same data type as specified by 
 `dtype`.
 
 If `on_value` is not provided, it will default to the value `1` with 
 type
 `dtype`
 
 If `off_value` is not provided, it will default to the value `0` with 
 type
 `dtype`
 
 If the input `indices` is rank `N`, the output will have rank 
 `N+1`. The
 new axis is created at dimension `axis` (default: the new axis is 
 appended
 at the end).
 
 If `indices` is a scalar the output shape will be a vector of 
 length `depth`
 
 If `indices` is a vector of length `features`, the output shape will 
 be:
 
 ```
 features x depth if axis == -1
 depth x features if axis == 0
 ```
 
 If `indices` is a matrix (batch) with shape `[batch, features]`, the 
 output
 shape will be:
 
 ```
 batch x features x depth if axis == -1
 batch x depth x features if axis == 1
 depth x batch x features if axis == 0
 ```
 
 If `dtype` is not provided, it will attempt to assume the data 
 type of
 `on_value` or `off_value`, if one or both are passed in. If none 
 of
 `on_value`, `off_value`, or `dtype` are provided, `dtype` will 
 default to the
 value `tf.float32`.
 
 Note: If a non-numeric data type output is desired (`tf.string`, 
 `tf.bool`,
 etc.), both `on_value` and `off_value` _must_ be provided to 
 `one_hot`.
 
 For example:
 
 ```python
 indices = [0, 1, 2]
 depth = 3
 tf.one_hot(indices, depth) # output: [3 x 3]
 # [[1., 0., 0.],
 # [0., 1., 0.],
 # [0., 0., 1.]]
 
 indices = [0, 2, -1, 1]
 depth = 3
 tf.one_hot(indices, depth,
 on_value=5.0, off_value=0.0,
 axis=-1) # output: [4 x 3]
 # [[5.0, 0.0, 0.0], # one_hot(0)
 # [0.0, 0.0, 5.0], # one_hot(2)
 # [0.0, 0.0, 0.0], # one_hot(-1)
 # [0.0, 5.0, 0.0]] # one_hot(1)
 
 indices = [[0, 2], [1, -1]]
 depth = 3
 tf.one_hot(indices, depth,
 on_value=1.0, off_value=0.0,
 axis=-1) # output: [2 x 2 x 3]
 # [[[1.0, 0.0, 0.0], # one_hot(0)
 # [0.0, 0.0, 1.0]], # one_hot(2)
 # [[0.0, 1.0, 0.0], # one_hot(1)
 # [0.0, 0.0, 0.0]]] # one_hot(-1)
 ```
 
 Args:
 indices: A `Tensor` of indices.
 depth: A scalar defining the depth of the one hot dimension.
 on_value: A scalar defining the value to fill in output when 
 `indices[j]
 = i`. (default: 1)
 off_value: A scalar defining the value to fill in output when 
 `indices[j]
 != i`. (default: 0)
 axis: The axis to fill (default: -1, a new inner-most axis).
 dtype: The data type of the output tensor.
 
 Returns:
 output: The one-hot tensor.
 
 Raises:
 TypeError: If dtype of either `on_value` or `off_value` don't 
 match `dtype`
 TypeError: If dtype of `on_value` and `off_value` don't match 
 one another
 """
 with ops.name_scope(name, "one_hot", 
 [indices, depth, on_value, off_value, axis, 
  dtype]) as name:
 on_exists = on_value is not None
 off_exists = off_value is not None
 on_dtype = ops.convert_to_tensor(on_value).dtype.base_dtype 
  if on_exists else None
 off_dtype = ops.convert_to_tensor(off_value).dtype.
  base_dtype if off_exists else None
 if on_exists or off_exists:
  if dtype is not None:
  # Ensure provided on_value and/or off_value match dtype
  if (on_exists and on_dtype != dtype):
   raise TypeError("dtype {0} of on_value does not match "
   "dtype parameter {1}".format(on_dtype, dtype))
  if (off_exists and off_dtype != dtype):
   raise TypeError("dtype {0} of off_value does not match "
   "dtype parameter {1}".format(off_dtype, dtype))
  else:
  # dtype not provided: automatically assign it
  dtype = on_dtype if on_exists else off_dtype
 elif dtype is None:
  # None of on_value, off_value, or dtype provided. Default 
  dtype to float32
  dtype = dtypes.float32
 if not on_exists:
  # on_value not provided: assign to value 1 of type dtype
  on_value = ops.convert_to_tensor(1, dtype, name="
  on_value")
  on_dtype = dtype
 if not off_exists:
  # off_value not provided: assign to value 0 of type dtype
  off_value = ops.convert_to_tensor(0, dtype, name="
  off_value")
  off_dtype = dtype
 if on_dtype != off_dtype:
  raise TypeError("dtype {0} of on_value does not match "
  "dtype {1} of off_value".format(on_dtype, off_dtype))
 return gen_array_ops._one_hot(indices, depth, on_value, 
  off_value, axis, 
  name)
 
 
Enter: apply completion.
 + Ctrl: remove arguments and replace current word (no Pop-
 up focus).
 + Shift: remove arguments (requires Pop-up focus).
import tensorflow as tf
import numpy as np
data = np.linspace(0,9,10)
label = tf.one_hot(data,10)
with tf.Session() as sess:
 print(data)
 print(sess.run(label))

Tensorflow实现将标签变为one-hot形式

补充知识:数据清洗—制作one-hot

使用pandas进行one-hot编码

pandas.get_dummies(data, prefix=None, prefix_sep='_', dummy_na=False, columns=None, sparse=False, drop_first=False, dtype=None)

pandas中get_dummies()函数可以将字段进行编码,转换为01形式,其中prefix可以为每个新展开的列名添加前缀。

但是,笔者发现它较易使用在数据为每一列为单独的字符:

Tensorflow实现将标签变为one-hot形式

df = pd.DataFrame({'A': ['a', 'b', 'a'], 'B': ['b', 'a', 'c'], 'C': [1, 2, 3]})

## one-hot
df_dumm = pd.get_dummies(df)

Tensorflow实现将标签变为one-hot形式

my_one_hot

但是对于数据为下面形式的可就不能直接转换了,需要先预处理一下,之后转换为one-hot形式:

Tensorflow实现将标签变为one-hot形式

我的做法是:

## tqdm_notebook可以导入tqdm包来使用
def one_hot_my(dataframe, attri):
 sample_attri_list = []
 sample_attri_loc_dic = {}
 loc = 0
 dataframe[attri] = dataframe[attri].astype(str)
 for attri_id in tqdm_notebook(dataframe[attri]):
  attri_id_pro = attri_id.strip().split(',')
  for key in attri_id_pro:
   if key not in sample_attri_loc_dic.keys():
    sample_attri_loc_dic[key] = loc
    loc+=1
  sample_attri_list.append(attri_id_pro)
 print("开始完成one-hot.......")  
 one_hot_attri = []
 for attri_id in tqdm_notebook(sample_attri_list):
  array = [0 for _ in range(len(sample_attri_loc_dic.keys()))]
  for key in attri_id:
   array[sample_attri_loc_dic[key]] = 1
  one_hot_attri.append(array)
 print("封装成dataframe.......") 
 ## 封装成dataframe
 columns = [attri+x for x in sample_attri_loc_dic.keys()]
 one_hot_rig_id_df = pd.DataFrame(one_hot_attri,columns=columns)
 return one_hot_rig_id_df

对属性二值化可以采用:

## 对属性进行二值化
def binary_apply(key, attri, dataframe):
 key_modify = 'is_' + ''.join(lazy_pinyin(key)) + '_' + attri
 print(key_modify)
 dataframe[key_modify] = dataframe.apply(lambda x:1 if x[attri]== key else 0, axis=1)
 return dataframe

对字符进行编码,将字符转换为0,1,2…:

## 对字符进行编码
# columns = ['job', 'marital', 'education','default','housing' ,'loan','contact', 'poutcome']
def encode_info(dataframe, columns):
 for col in columns:
  print(col)
  dataframe[col] = pd.factorize(dataframe[col])[0]
 return dataframe

Tensorflow实现将标签变为one-hot形式

以上这篇Tensorflow实现将标签变为one-hot形式就是小编分享给大家的全部内容了,希望能给大家一个参考,也希望大家多多支持三水点靠木。

Python 相关文章推荐
python控制台英汉汉英电子词典
Apr 23 Python
Python深入学习之上下文管理器
Aug 31 Python
TensorFlow实现MLP多层感知机模型
Mar 09 Python
PyQt5每天必学之拖放事件
Aug 27 Python
Python unittest模块用法实例分析
May 25 Python
在Python中居然可以定义两个同名通参数的函数
Jan 31 Python
Python实现定期检查源目录与备份目录的差异并进行备份功能示例
Feb 27 Python
python 发送json数据操作实例分析
Oct 15 Python
Python实现FLV视频拼接功能
Jan 21 Python
Python实现括号匹配方法详解
Feb 10 Python
使用Python实现NBA球员数据查询小程序功能
Nov 09 Python
python中的被动信息搜集
Apr 29 Python
Python selenium爬取微博数据代码实例
May 22 #Python
python实现文法左递归的消除方法
May 22 #Python
使用Django搭建网站实现商品分页功能
May 22 #Python
Tensorflow卷积实现原理+手写python代码实现卷积教程
May 22 #Python
Python实现发票自动校核微信机器人的方法
May 22 #Python
基于django micro搭建网站实现加水印功能
May 22 #Python
基于Tensorflow一维卷积用法详解
May 22 #Python
You might like
Codeigniter中mkdir创建目录遇到权限问题和解决方法
2014/07/25 PHP
php array_pop 删除数组最后一个元素实例
2016/11/02 PHP
Laravel 5.5 异常处理 & 错误日志的解决
2019/10/17 PHP
Javascript模块化编程(一)AMD规范(规范使用模块)
2013/01/17 Javascript
如何将一个String和多个String值进行比较思路分析
2013/04/22 Javascript
jQuery使用fadeout实现元素渐隐效果的方法
2015/03/27 Javascript
使用AmplifyJS组件配合JavaScript进行编程的指南
2015/07/28 Javascript
jquery取消事件冒泡的三种方法(推荐)
2016/05/28 Javascript
jquery加载单文件vue组件的方法
2017/06/20 jQuery
Angularjs渲染的 using 指令的星级评分系统示例
2017/11/09 Javascript
jquery写出PC端轮播图实例
2018/01/26 jQuery
Vue 中使用 CSS Modules优雅方法
2018/04/09 Javascript
vue axios请求频繁时取消上一次请求的方法
2018/11/10 Javascript
小程序红包雨的实现示例
2019/02/19 Javascript
详解基于React.js和Node.js的SSR实现方案
2019/03/21 Javascript
Vue中computed及watch区别实例解析
2020/08/01 Javascript
Python 安装setuptools和pip工具操作方法(必看)
2017/05/22 Python
python+splinter实现12306网站刷票并自动购票流程
2018/09/25 Python
python 利用for循环 保存多个图像或者文件的实例
2018/11/09 Python
Python如何将图像音视频等资源文件隐藏在代码中(小技巧)
2020/02/16 Python
python调用私有属性的方法总结
2020/07/24 Python
用Python制作mini翻译器的实现示例
2020/08/17 Python
Python 删除List元素的三种方法remove、pop、del
2020/11/16 Python
50个强大璀璨的CSS3/JS技术运用实例
2010/02/27 HTML / CSS
css3使网页、图片变成灰色兼容大多数浏览器
2014/07/02 HTML / CSS
美国五金商店:Ace Hardware
2018/03/27 全球购物
历史学专业推荐信
2013/11/06 职场文书
爱耳日宣传活动总结
2014/07/05 职场文书
会议欢迎词
2015/01/23 职场文书
社区法制宣传日活动总结
2015/05/05 职场文书
银行资信证明
2015/06/17 职场文书
2015年新农村建设指导员工作总结
2015/07/24 职场文书
毕业生自我鉴定范文
2019/05/13 职场文书
nginx配置文件使用环境变量的操作方法
2021/06/02 Servers
使用Redis实现实时排行榜功能
2021/07/02 Redis
WebRTC记录音视频流(web技术分享)
2022/02/24 Javascript