
Converting Labels to One-Hot Form with TensorFlow

Posted in Python on May 22, 2020

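This post converts data labels into a one-hot encoding like the one used for MNIST labels. TensorFlow provides tf.one_hot for this; its source is: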

def one_hot(indices,
            depth,
            on_value=None,
            off_value=None,
            axis=None,
            dtype=None,
            name=None):
  """Returns a one-hot tensor.

  The locations represented by indices in `indices` take value `on_value`,
  while all other locations take value `off_value`.

  `on_value` and `off_value` must have matching data types. If `dtype` is also
  provided, they must be the same data type as specified by `dtype`.

  If `on_value` is not provided, it will default to the value `1` with type
  `dtype`

  If `off_value` is not provided, it will default to the value `0` with type
  `dtype`

  If the input `indices` is rank `N`, the output will have rank `N+1`. The
  new axis is created at dimension `axis` (default: the new axis is appended
  at the end).

  If `indices` is a scalar the output shape will be a vector of length `depth`

  If `indices` is a vector of length `features`, the output shape will be:

  ```
    features x depth if axis == -1
    depth x features if axis == 0
  ```

  If `indices` is a matrix (batch) with shape `[batch, features]`, the output
  shape will be:

  ```
    batch x features x depth if axis == -1
    batch x depth x features if axis == 1
    depth x batch x features if axis == 0
  ```

  If `dtype` is not provided, it will attempt to assume the data type of
  `on_value` or `off_value`, if one or both are passed in. If none of
  `on_value`, `off_value`, or `dtype` are provided, `dtype` will default to the
  value `tf.float32`.

  Note: If a non-numeric data type output is desired (`tf.string`, `tf.bool`,
  etc.), both `on_value` and `off_value` _must_ be provided to `one_hot`.

  For example:

  ```python
  indices = [0, 1, 2]
  depth = 3
  tf.one_hot(indices, depth)  # output: [3 x 3]
  # [[1., 0., 0.],
  #  [0., 1., 0.],
  #  [0., 0., 1.]]

  indices = [0, 2, -1, 1]
  depth = 3
  tf.one_hot(indices, depth,
             on_value=5.0, off_value=0.0,
             axis=-1)  # output: [4 x 3]
  # [[5.0, 0.0, 0.0],  # one_hot(0)
  #  [0.0, 0.0, 5.0],  # one_hot(2)
  #  [0.0, 0.0, 0.0],  # one_hot(-1)
  #  [0.0, 5.0, 0.0]]  # one_hot(1)

  indices = [[0, 2], [1, -1]]
  depth = 3
  tf.one_hot(indices, depth,
             on_value=1.0, off_value=0.0,
             axis=-1)  # output: [2 x 2 x 3]
  # [[[1.0, 0.0, 0.0],   # one_hot(0)
  #   [0.0, 0.0, 1.0]],  # one_hot(2)
  #  [[0.0, 1.0, 0.0],   # one_hot(1)
  #   [0.0, 0.0, 0.0]]]  # one_hot(-1)
  ```

  Args:
    indices: A `Tensor` of indices.
    depth: A scalar defining the depth of the one hot dimension.
    on_value: A scalar defining the value to fill in output when `indices[j]
      = i`. (default: 1)
    off_value: A scalar defining the value to fill in output when `indices[j]
      != i`. (default: 0)
    axis: The axis to fill (default: -1, a new inner-most axis).
    dtype: The data type of the output tensor.

  Returns:
    output: The one-hot tensor.

  Raises:
    TypeError: If dtype of either `on_value` or `off_value` don't match `dtype`
    TypeError: If dtype of `on_value` and `off_value` don't match one another
  """
  with ops.name_scope(name, "one_hot",
                      [indices, depth, on_value, off_value, axis,
                       dtype]) as name:
    on_exists = on_value is not None
    off_exists = off_value is not None
    on_dtype = (ops.convert_to_tensor(on_value).dtype.base_dtype
                if on_exists else None)
    off_dtype = (ops.convert_to_tensor(off_value).dtype.base_dtype
                 if off_exists else None)
    if on_exists or off_exists:
      if dtype is not None:
        # Ensure provided on_value and/or off_value match dtype
        if (on_exists and on_dtype != dtype):
          raise TypeError("dtype {0} of on_value does not match "
                          "dtype parameter {1}".format(on_dtype, dtype))
        if (off_exists and off_dtype != dtype):
          raise TypeError("dtype {0} of off_value does not match "
                          "dtype parameter {1}".format(off_dtype, dtype))
      else:
        # dtype not provided: automatically assign it
        dtype = on_dtype if on_exists else off_dtype
    elif dtype is None:
      # None of on_value, off_value, or dtype provided. Default dtype to float32
      dtype = dtypes.float32
    if not on_exists:
      # on_value not provided: assign to value 1 of type dtype
      on_value = ops.convert_to_tensor(1, dtype, name="on_value")
      on_dtype = dtype
    if not off_exists:
      # off_value not provided: assign to value 0 of type dtype
      off_value = ops.convert_to_tensor(0, dtype, name="off_value")
      off_dtype = dtype
    if on_dtype != off_dtype:
      raise TypeError("dtype {0} of on_value does not match "
                      "dtype {1} of off_value".format(on_dtype, off_dtype))
    return gen_array_ops._one_hot(indices, depth, on_value, off_value, axis,
                                  name)
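
The fill rule described in the docstring above can be illustrated without TensorFlow. The following is only a plain-Python sketch for a 1-D list of integer indices; the helper name `one_hot_sketch` is made up for this illustration and it is not how the library implements the op. It shows two behaviours from the docstring: an index outside `[0, depth)` (such as -1) produces a row filled with `off_value`, and `axis=0` puts the depth dimension first.

```python
def one_hot_sketch(indices, depth, on_value=1.0, off_value=0.0, axis=-1):
    """Plain-Python illustration of one-hot encoding for a 1-D list of ints."""
    # One row per index; an index outside [0, depth) never matches any
    # position, so its row contains only off_value.
    rows = [[on_value if i == idx else off_value for i in range(depth)]
            for idx in indices]
    if axis == -1:
        return rows                               # features x depth
    if axis == 0:
        return [list(col) for col in zip(*rows)]  # depth x features
    raise ValueError("this sketch only handles axis -1 or 0 for vector input")


print(one_hot_sketch([0, 2, -1, 1], 3, on_value=5.0))
# [[5.0, 0.0, 0.0], [0.0, 0.0, 5.0], [0.0, 0.0, 0.0], [0.0, 5.0, 0.0]]
print(one_hot_sketch([0, 2], 3, axis=0))
# [[1.0, 0.0], [0.0, 0.0], [0.0, 1.0]]
```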
 
 

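import numpy as np
import tensorflow as tf

# Integer class labels 0..9; tf.one_hot expects integer indices.
data = np.arange(10)
label = tf.one_hot(data, 10)

# tf.Session is the TensorFlow 1.x graph-mode API.
with tf.Session() as sess:
    print(data)
    print(sess.run(label))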
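
For a quick check that does not need a TensorFlow session, the same 10 x 10 matrix can be built with NumPy by indexing an identity matrix. This NumPy route is not part of the original post; it is a minimal sketch. One difference worth noting: with NumPy indexing, -1 selects the last row instead of producing an all-zero row as `tf.one_hot` does.

```python
import numpy as np

labels = np.arange(10)                          # integer class ids 0..9
one_hot = np.eye(10, dtype=np.float32)[labels]  # row i of the identity is one_hot(i)

# Taking argmax along the last axis recovers the original labels.
recovered = one_hot.argmax(axis=-1)
print(one_hot.shape)                      # (10, 10)
print(np.array_equal(recovered, labels))  # True
```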

Supplementary notes: building one-hot encodings during data cleaning

One-hot encoding with pandas

pandas.get_dummies(data, prefix=None, prefix_sep='_', dummy_na=False, columns=None, sparse=False, drop_first=False, dtype=None)

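The pandas get_dummies() function encodes columns as 0/1 indicator columns. The prefix argument adds a prefix to the name of each newly expanded column.

However, I found that it is easiest to use when every cell of a column holds a single value: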


import pandas as pd

df = pd.DataFrame({'A': ['a', 'b', 'a'], 'B': ['b', 'a', 'c'], 'C': [1, 2, 3]})

# One-hot encode the string columns A and B; the numeric column C is left as-is.
df_dumm = pd.get_dummies(df)
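
To make the effect of `prefix` and `prefix_sep` concrete, here is a small sketch on the same DataFrame. The prefix `grp` and the separator `-` are arbitrary values chosen for illustration. The values are cast to `int` before printing because the dtype of the indicator columns differs between pandas versions.

```python
import pandas as pd

df = pd.DataFrame({'A': ['a', 'b', 'a'], 'B': ['b', 'a', 'c'], 'C': [1, 2, 3]})

# Default naming: <original column>_<value>; untouched columns come first.
print(list(pd.get_dummies(df).columns))
# ['C', 'A_a', 'A_b', 'B_a', 'B_b', 'B_c']

# Encode only column A, with a custom prefix and separator.
encoded = pd.get_dummies(df, columns=['A'], prefix='grp', prefix_sep='-')
print(list(encoded.columns))
# ['B', 'C', 'grp-a', 'grp-b']
print(encoded['grp-a'].astype(int).tolist())
# [1, 0, 1]
```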


my_one_hot

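Data in which a single cell holds several comma-separated values, however, cannot be converted directly. It has to be preprocessed first and can then be converted to one-hot form: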


My approach is:

import pandas as pd
from tqdm import tqdm_notebook  # progress bar, provided by the tqdm package


def one_hot_my(dataframe, attri):
    sample_attri_list = []
    sample_attri_loc_dic = {}  # value -> column position, in first-seen order
    loc = 0
    dataframe[attri] = dataframe[attri].astype(str)
    # First pass: split each cell on commas and build the vocabulary.
    for attri_id in tqdm_notebook(dataframe[attri]):
        attri_id_pro = attri_id.strip().split(',')
        for key in attri_id_pro:
            if key not in sample_attri_loc_dic:
                sample_attri_loc_dic[key] = loc
                loc += 1
        sample_attri_list.append(attri_id_pro)
    print("Building one-hot vectors...")
    # Second pass: set a 1 for every value present in the row.
    one_hot_attri = []
    for attri_id in tqdm_notebook(sample_attri_list):
        array = [0] * len(sample_attri_loc_dic)
        for key in attri_id:
            array[sample_attri_loc_dic[key]] = 1
        one_hot_attri.append(array)
    print("Wrapping into a DataFrame...")
    # Column names are the attribute name followed by each value.
    columns = [attri + x for x in sample_attri_loc_dic.keys()]
    one_hot_rig_id_df = pd.DataFrame(one_hot_attri, columns=columns)
    return one_hot_rig_id_df
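
As an alternative to the hand-written loops, pandas also offers `Series.str.get_dummies`, which splits each string on a separator and returns indicator columns. The sketch below uses made-up toy data and a made-up column name `tags`. It is not the author's method, and it differs in one detail: the resulting columns are sorted by value rather than kept in first-seen order.

```python
import pandas as pd

# Toy data (made up for illustration): each cell holds comma-separated values.
df = pd.DataFrame({'tags': ['a,b', 'b', 'a,c']})

# Split on ',' and build one indicator column per distinct value, then
# prepend the attribute name to mirror the naming used by one_hot_my.
multi_hot = df['tags'].str.get_dummies(sep=',').add_prefix('tags')
print(list(multi_hot.columns))               # ['tagsa', 'tagsb', 'tagsc']
print(multi_hot.astype(int).values.tolist()) # [[1, 1, 0], [0, 1, 0], [1, 0, 1]]
```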

To binarize an attribute, you can use:

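from pypinyin import lazy_pinyin  # converts Chinese characters to pinyin


# Binarize an attribute: 1 where the column equals key, 0 otherwise.
def binary_apply(key, attri, dataframe):
    key_modify = 'is_' + ''.join(lazy_pinyin(key)) + '_' + attri
    print(key_modify)
    dataframe[key_modify] = dataframe.apply(lambda x: 1 if x[attri] == key else 0, axis=1)
    return dataframe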
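
The row-wise `apply` above can be slow on large tables. A vectorized comparison gives the same 0/1 column; the sketch below assumes made-up toy data and writes the new column name by hand instead of deriving it with `lazy_pinyin`.

```python
import pandas as pd

# Toy data and column names are made up for illustration.
df = pd.DataFrame({'city': ['北京', '上海', '北京']})

# Element-wise comparison yields booleans; astype(int) maps them to 1/0.
df['is_beijing_city'] = (df['city'] == '北京').astype(int)
print(df['is_beijing_city'].tolist())  # [1, 0, 1]
```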

To encode strings as integer codes 0, 1, 2, ...:

import pandas as pd


# Encode string columns as integer codes.
# columns = ['job', 'marital', 'education','default','housing' ,'loan','contact', 'poutcome']
def encode_info(dataframe, columns):
    for col in columns:
        print(col)
        # factorize returns (codes, uniques); keep only the integer codes
        dataframe[col] = pd.factorize(dataframe[col])[0]
    return dataframe
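
A short sketch of what `pd.factorize` returns may help when reading `encode_info`. The sample values are made up. Codes are assigned in order of first appearance, and missing values receive the code -1. Keeping `uniques` makes it possible to map codes back to the original strings, which `encode_info` discards.

```python
import pandas as pd

s = pd.Series(['blue', 'red', 'blue', None, 'green'])  # made-up sample values
codes, uniques = pd.factorize(s)

print(codes.tolist())   # [0, 1, 0, -1, 2]
print(list(uniques))    # ['blue', 'red', 'green']

# Map a code back to its original value.
print(uniques[codes[1]])  # red
```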


- Author -

星夜孤帆
