Tensorflow实现将标签变为one-hot形式


Posted in Python onMay 22, 2020

将数据标签变为类似MNIST的one-hot编码形式

def one_hot(indices, 
 depth, 
 on_value=None, 
 off_value=None, 
 axis=None, 
 dtype=None, 
 name=None):
 """Returns a one-hot tensor.
 
 The locations represented by indices in `indices` take value 
 `on_value`,
 while all other locations take value `off_value`.
 
 `on_value` and `off_value` must have matching data types. If 
 `dtype` is also
 provided, they must be the same data type as specified by 
 `dtype`.
 
 If `on_value` is not provided, it will default to the value `1` with 
 type
 `dtype`
 
 If `off_value` is not provided, it will default to the value `0` with 
 type
 `dtype`
 
 If the input `indices` is rank `N`, the output will have rank 
 `N+1`. The
 new axis is created at dimension `axis` (default: the new axis is 
 appended
 at the end).
 
 If `indices` is a scalar the output shape will be a vector of 
 length `depth`
 
 If `indices` is a vector of length `features`, the output shape will 
 be:
 
 ```
 features x depth if axis == -1
 depth x features if axis == 0
 ```
 
 If `indices` is a matrix (batch) with shape `[batch, features]`, the 
 output
 shape will be:
 
 ```
 batch x features x depth if axis == -1
 batch x depth x features if axis == 1
 depth x batch x features if axis == 0
 ```
 
 If `dtype` is not provided, it will attempt to assume the data 
 type of
 `on_value` or `off_value`, if one or both are passed in. If none 
 of
 `on_value`, `off_value`, or `dtype` are provided, `dtype` will 
 default to the
 value `tf.float32`.
 
 Note: If a non-numeric data type output is desired (`tf.string`, 
 `tf.bool`,
 etc.), both `on_value` and `off_value` _must_ be provided to 
 `one_hot`.
 
 For example:
 
 ```python
 indices = [0, 1, 2]
 depth = 3
 tf.one_hot(indices, depth) # output: [3 x 3]
 # [[1., 0., 0.],
 # [0., 1., 0.],
 # [0., 0., 1.]]
 
 indices = [0, 2, -1, 1]
 depth = 3
 tf.one_hot(indices, depth,
 on_value=5.0, off_value=0.0,
 axis=-1) # output: [4 x 3]
 # [[5.0, 0.0, 0.0], # one_hot(0)
 # [0.0, 0.0, 5.0], # one_hot(2)
 # [0.0, 0.0, 0.0], # one_hot(-1)
 # [0.0, 5.0, 0.0]] # one_hot(1)
 
 indices = [[0, 2], [1, -1]]
 depth = 3
 tf.one_hot(indices, depth,
 on_value=1.0, off_value=0.0,
 axis=-1) # output: [2 x 2 x 3]
 # [[[1.0, 0.0, 0.0], # one_hot(0)
 # [0.0, 0.0, 1.0]], # one_hot(2)
 # [[0.0, 1.0, 0.0], # one_hot(1)
 # [0.0, 0.0, 0.0]]] # one_hot(-1)
 ```
 
 Args:
 indices: A `Tensor` of indices.
 depth: A scalar defining the depth of the one hot dimension.
 on_value: A scalar defining the value to fill in output when 
 `indices[j]
 = i`. (default: 1)
 off_value: A scalar defining the value to fill in output when 
 `indices[j]
 != i`. (default: 0)
 axis: The axis to fill (default: -1, a new inner-most axis).
 dtype: The data type of the output tensor.
 
 Returns:
 output: The one-hot tensor.
 
 Raises:
 TypeError: If dtype of either `on_value` or `off_value` don't 
 match `dtype`
 TypeError: If dtype of `on_value` and `off_value` don't match 
 one another
 """
 with ops.name_scope(name, "one_hot", 
 [indices, depth, on_value, off_value, axis, 
  dtype]) as name:
 on_exists = on_value is not None
 off_exists = off_value is not None
 on_dtype = ops.convert_to_tensor(on_value).dtype.base_dtype 
  if on_exists else None
 off_dtype = ops.convert_to_tensor(off_value).dtype.
  base_dtype if off_exists else None
 if on_exists or off_exists:
  if dtype is not None:
  # Ensure provided on_value and/or off_value match dtype
  if (on_exists and on_dtype != dtype):
   raise TypeError("dtype {0} of on_value does not match "
   "dtype parameter {1}".format(on_dtype, dtype))
  if (off_exists and off_dtype != dtype):
   raise TypeError("dtype {0} of off_value does not match "
   "dtype parameter {1}".format(off_dtype, dtype))
  else:
  # dtype not provided: automatically assign it
  dtype = on_dtype if on_exists else off_dtype
 elif dtype is None:
  # None of on_value, off_value, or dtype provided. Default 
  dtype to float32
  dtype = dtypes.float32
 if not on_exists:
  # on_value not provided: assign to value 1 of type dtype
  on_value = ops.convert_to_tensor(1, dtype, name="
  on_value")
  on_dtype = dtype
 if not off_exists:
  # off_value not provided: assign to value 0 of type dtype
  off_value = ops.convert_to_tensor(0, dtype, name="
  off_value")
  off_dtype = dtype
 if on_dtype != off_dtype:
  raise TypeError("dtype {0} of on_value does not match "
  "dtype {1} of off_value".format(on_dtype, off_dtype))
 return gen_array_ops._one_hot(indices, depth, on_value, 
  off_value, axis, 
  name)
 
 
Enter: apply completion.
 + Ctrl: remove arguments and replace current word (no Pop-
 up focus).
 + Shift: remove arguments (requires Pop-up focus).
import tensorflow as tf
import numpy as np
data = np.linspace(0,9,10)
label = tf.one_hot(data,10)
with tf.Session() as sess:
 print(data)
 print(sess.run(label))

Tensorflow实现将标签变为one-hot形式

补充知识:数据清洗—制作one-hot

使用pandas进行one-hot编码

pandas.get_dummies(data, prefix=None, prefix_sep='_', dummy_na=False, columns=None, sparse=False, drop_first=False, dtype=None)

pandas中get_dummies()函数可以将字段进行编码,转换为01形式,其中prefix可以为每个新展开的列名添加前缀。

但是,笔者发现它较易使用在数据为每一列为单独的字符:

Tensorflow实现将标签变为one-hot形式

df = pd.DataFrame({'A': ['a', 'b', 'a'], 'B': ['b', 'a', 'c'], 'C': [1, 2, 3]})

## one-hot
df_dumm = pd.get_dummies(df)

Tensorflow实现将标签变为one-hot形式

my_one_hot

但是对于数据为下面形式的可就不能直接转换了,需要先预处理一下,之后转换为one-hot形式:

Tensorflow实现将标签变为one-hot形式

我的做法是:

## tqdm_notebook可以导入tqdm包来使用
def one_hot_my(dataframe, attri):
 sample_attri_list = []
 sample_attri_loc_dic = {}
 loc = 0
 dataframe[attri] = dataframe[attri].astype(str)
 for attri_id in tqdm_notebook(dataframe[attri]):
  attri_id_pro = attri_id.strip().split(',')
  for key in attri_id_pro:
   if key not in sample_attri_loc_dic.keys():
    sample_attri_loc_dic[key] = loc
    loc+=1
  sample_attri_list.append(attri_id_pro)
 print("开始完成one-hot.......")  
 one_hot_attri = []
 for attri_id in tqdm_notebook(sample_attri_list):
  array = [0 for _ in range(len(sample_attri_loc_dic.keys()))]
  for key in attri_id:
   array[sample_attri_loc_dic[key]] = 1
  one_hot_attri.append(array)
 print("封装成dataframe.......") 
 ## 封装成dataframe
 columns = [attri+x for x in sample_attri_loc_dic.keys()]
 one_hot_rig_id_df = pd.DataFrame(one_hot_attri,columns=columns)
 return one_hot_rig_id_df

对属性二值化可以采用:

## 对属性进行二值化
def binary_apply(key, attri, dataframe):
 key_modify = 'is_' + ''.join(lazy_pinyin(key)) + '_' + attri
 print(key_modify)
 dataframe[key_modify] = dataframe.apply(lambda x:1 if x[attri]== key else 0, axis=1)
 return dataframe

对字符进行编码,将字符转换为0,1,2…:

## 对字符进行编码
# columns = ['job', 'marital', 'education','default','housing' ,'loan','contact', 'poutcome']
def encode_info(dataframe, columns):
 for col in columns:
  print(col)
  dataframe[col] = pd.factorize(dataframe[col])[0]
 return dataframe

Tensorflow实现将标签变为one-hot形式

以上这篇Tensorflow实现将标签变为one-hot形式就是小编分享给大家的全部内容了,希望能给大家一个参考,也希望大家多多支持三水点靠木。

Python 相关文章推荐
详解Python的Django框架中的模版相关知识
Jul 15 Python
python递归查询菜单并转换成json实例
Mar 27 Python
Python与R语言的简要对比
Nov 14 Python
Python进阶学习之特殊方法实例详析
Dec 01 Python
python实现识别手写数字 python图像识别算法
Mar 23 Python
Python实现的求解最大公约数算法示例
May 03 Python
pycharm重置设置,恢复默认设置的方法
Oct 22 Python
python numpy之np.random的随机数函数使用介绍
Oct 06 Python
python中的split()函数和os.path.split()函数使用详解
Dec 21 Python
在pytorch 中计算精度、回归率、F1 score等指标的实例
Jan 18 Python
Python实现密钥密码(加解密)实例详解
Apr 26 Python
Python OpenCV实现图形检测示例详解
Apr 08 Python
Python selenium爬取微博数据代码实例
May 22 #Python
python实现文法左递归的消除方法
May 22 #Python
使用Django搭建网站实现商品分页功能
May 22 #Python
Tensorflow卷积实现原理+手写python代码实现卷积教程
May 22 #Python
Python实现发票自动校核微信机器人的方法
May 22 #Python
基于django micro搭建网站实现加水印功能
May 22 #Python
基于Tensorflow一维卷积用法详解
May 22 #Python
You might like
很好用的PHP数据库类
2009/05/27 PHP
浅谈PHP 闭包特性在实际应用中的问题
2009/10/30 PHP
PHP url 加密解密函数代码
2011/08/26 PHP
php+curl 发送图片处理代码分享
2015/07/09 PHP
ThinkPHP框架分布式数据库连接方法详解
2017/03/14 PHP
浅谈Laravel中的一个后期静态绑定
2017/08/11 PHP
jQuery下扩展插件和拓展函数的写法(匿名函数使用的典型例子)
2010/10/20 Javascript
javascript 学习笔记(六)浏览器类型及版本信息检测代码
2011/04/08 Javascript
模拟用户点击弹出新页面不会被浏览器拦截
2014/04/08 Javascript
2014最热门的JavaScript代码高亮插件推荐
2014/11/25 Javascript
EasyUI中combobox默认值注意事项
2015/03/01 Javascript
javascript中replace( )方法的使用
2015/04/24 Javascript
基于JavaScript实现根据手机定位获取当前具体位置(X省X市X县X街道X号)
2015/12/29 Javascript
JavaScript实现瀑布流以及加载效果
2017/02/11 Javascript
详解基于Node.js的微信JS-SDK后端接口实现代码
2017/07/15 Javascript
中高级前端必须了解的JS中的内存管理(推荐)
2019/07/04 Javascript
Python下的Mysql模块MySQLdb安装详解
2014/04/09 Python
利用python微信库itchat实现微信自动回复功能
2017/05/18 Python
Python3学习笔记之列表方法示例详解
2017/10/06 Python
安装python3的时候就是输入python3死活没有反应的解决方法
2018/01/24 Python
python爬虫中get和post方法介绍以及cookie作用
2018/02/08 Python
用Python3创建httpServer的简单方法
2018/06/04 Python
vue.js实现输入框输入值内容实时响应变化示例
2018/07/07 Python
网易2016研发工程师编程题 奖学金(python)
2019/06/19 Python
vscode写python时的代码错误提醒和自动格式化的方法
2020/05/07 Python
CSS3盒子模型详解
2013/04/24 HTML / CSS
Homestay中文官网:全球寄宿家庭
2018/10/18 全球购物
申报职称专业技术个人的自我评价
2013/12/12 职场文书
应届毕业生如何写求职信
2014/02/16 职场文书
荷叶母亲教学反思
2014/04/30 职场文书
建筑学专业自荐书
2014/07/09 职场文书
婚庆公司计划书
2014/09/15 职场文书
2015年档案室工作总结
2015/05/23 职场文书
2015年安全生产管理工作总结
2015/05/25 职场文书
2019年自助餐厅创业计划书模板
2019/08/22 职场文书
Mysql数据库命令大全
2021/05/26 MySQL