基于h5py的使用及数据封装代码


Posted in Python onDecember 26, 2019

1. h5py简单介绍

h5py文件是存放两类对象的容器,数据集(dataset)和组(group),dataset类似数组类的数据集合,和numpy的数组差不多。group是像文件夹一样的容器,它好比python中的字典,有键(key)和值(value)。group中可以存放dataset或者其他的group。”键”就是组成员的名称,”值”就是组成员对象本身(组或者数据集),下面来看下如何创建组和数据集。

1.1 创建一个h5py文件

import h5py
#要是读取文件的话,就把w换成r
f=h5py.File("myh5py.hdf5","w")

在当前目录下会生成一个myh5py.hdf5文件。

2. 创建dataset数据集

import h5py
f=h5py.File("myh5py.hdf5","w")
#deset1是数据集的name,(20,)代表数据集的shape,i代表的是数据集的元素类型
d1=f.create_dataset("dset1", (20,), 'i')
for key in f.keys():
 print(key)
 print(f[key].name)
 print(f[key].shape)
 print(f[key].value)

输出:

dset1
/dset1
(20,)
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
import h5py
import numpy as np
f=h5py.File("myh5py.hdf5","w")
a=np.arange(20)
d1=f.create_dataset("dset1",data=a)
for key in f.keys():
 print(f[key].name)
 print(f[key].value)

输出:

/dset1
[ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19]
2. hpf5用于封装训练集和测试集
#============================================================
# This prepare the hdf5 datasets of the DRIVE database
#============================================================
 
import os
import h5py
import numpy as np
from PIL import Image
 
def write_hdf5(arr,outfile):
 with h5py.File(outfile,"w") as f:
 f.create_dataset("image", data=arr, dtype=arr.dtype)
 
#------------Path of the images --------------------------------------------------------------
#train
original_imgs_train = "./DRIVE/training/images/"
groundTruth_imgs_train = "./DRIVE/training/1st_manual/"
borderMasks_imgs_train = "./DRIVE/training/mask/"
#test
original_imgs_test = "./DRIVE/test/images/"
groundTruth_imgs_test = "./DRIVE/test/1st_manual/"
borderMasks_imgs_test = "./DRIVE/test/mask/"
#---------------------------------------------------------------------------------------------
 
Nimgs = 20
channels = 3
height = 584
width = 565
dataset_path = "./DRIVE_datasets_training_testing/"
 
def get_datasets(imgs_dir,groundTruth_dir,borderMasks_dir,train_test="null"):
 imgs = np.empty((Nimgs,height,width,channels))
 groundTruth = np.empty((Nimgs,height,width))
 border_masks = np.empty((Nimgs,height,width))
 for path, subdirs, files in os.walk(imgs_dir): #list all files, directories in the path
  for i in range(len(files)):
   #original
   print "original image: " +files[i]
   img = Image.open(imgs_dir+files[i])
   imgs[i] = np.asarray(img)
   #corresponding ground truth
   groundTruth_name = files[i][0:2] + "_manual1.gif"
   print "ground truth name: " + groundTruth_name
   g_truth = Image.open(groundTruth_dir + groundTruth_name)
   groundTruth[i] = np.asarray(g_truth)
   #corresponding border masks
   border_masks_name = ""
   if train_test=="train":
    border_masks_name = files[i][0:2] + "_training_mask.gif"
   elif train_test=="test":
    border_masks_name = files[i][0:2] + "_test_mask.gif"
   else:
    print "specify if train or test!!"
    exit()
   print "border masks name: " + border_masks_name
   b_mask = Image.open(borderMasks_dir + border_masks_name)
   border_masks[i] = np.asarray(b_mask)
 
 print "imgs max: " +str(np.max(imgs))
 print "imgs min: " +str(np.min(imgs))
 assert(np.max(groundTruth)==255 and np.max(border_masks)==255)
 assert(np.min(groundTruth)==0 and np.min(border_masks)==0)
 print "ground truth and border masks are correctly withih pixel value range 0-255 (black-white)"
 #reshaping for my standard tensors
 imgs = np.transpose(imgs,(0,3,1,2))
 assert(imgs.shape == (Nimgs,channels,height,width))
 groundTruth = np.reshape(groundTruth,(Nimgs,1,height,width))
 border_masks = np.reshape(border_masks,(Nimgs,1,height,width))
 assert(groundTruth.shape == (Nimgs,1,height,width))
 assert(border_masks.shape == (Nimgs,1,height,width))
 return imgs, groundTruth, border_masks
 
if not os.path.exists(dataset_path):
 os.makedirs(dataset_path)
#getting the training datasets
imgs_train, groundTruth_train, border_masks_train = get_datasets(original_imgs_train,groundTruth_imgs_train,borderMasks_imgs_train,"train")
print "saving train datasets"
write_hdf5(imgs_train, dataset_path + "DRIVE_dataset_imgs_train.hdf5")
write_hdf5(groundTruth_train, dataset_path + "DRIVE_dataset_groundTruth_train.hdf5")
write_hdf5(border_masks_train,dataset_path + "DRIVE_dataset_borderMasks_train.hdf5")
 
#getting the testing datasets
imgs_test, groundTruth_test, border_masks_test = get_datasets(original_imgs_test,groundTruth_imgs_test,borderMasks_imgs_test,"test")
print "saving test datasets"
write_hdf5(imgs_test,dataset_path + "DRIVE_dataset_imgs_test.hdf5")
write_hdf5(groundTruth_test, dataset_path + "DRIVE_dataset_groundTruth_test.hdf5")
write_hdf5(border_masks_test,dataset_path + "DRIVE_dataset_borderMasks_test.hdf5")

遍历文件夹下的所有文件 os.walk( dir )

for parent, dir_names, file_names in os.walk(parent_dir): 
 for i in file_names: 
  print file_name

parent: 父路径

dir_names: 子文件夹

file_names: 文件名

以上这篇基于h5py的使用及数据封装代码就是小编分享给大家的全部内容了,希望能给大家一个参考,也希望大家多多支持三水点靠木。

Python 相关文章推荐
python处理cookie详解
Feb 07 Python
简单介绍Python的轻便web框架Bottle
Apr 08 Python
浅谈python中的__init__、__new__和__call__方法
Jul 18 Python
Python中的pygal安装和绘制直方图代码分享
Dec 08 Python
Python cookbook(数据结构与算法)保存最后N个元素的方法
Feb 13 Python
python实现输入三角形边长自动作图求面积案例
Apr 12 Python
Python figure参数及subplot子图绘制代码
Apr 18 Python
基于python实现数组格式参数加密计算
Apr 21 Python
numpy的Fancy Indexing和array比较详解
Jun 11 Python
详解Python中第三方库Faker
Sep 25 Python
详解python的super()的作用和原理
Oct 29 Python
Python time库的时间时钟处理
May 02 Python
python深copy和浅copy区别对比解析
Dec 26 #Python
详解python opencv、scikit-image和PIL图像处理库比较
Dec 26 #Python
torch 中各种图像格式转换的实现方法
Dec 26 #Python
python两个_多个字典合并相加的实例代码
Dec 26 #Python
Python时间差中seconds和total_seconds的区别详解
Dec 26 #Python
python requests模拟登陆github的实现方法
Dec 26 #Python
python 实现按对象传值
Dec 26 #Python
You might like
php木马攻击防御之道
2008/03/24 PHP
php SQL防注入代码集合
2008/04/25 PHP
php实现信用卡校验位算法THE LUHN MOD-10示例
2014/05/07 PHP
在Windows XP下安装Apache+MySQL+PHP环境
2015/02/22 PHP
php有效防止图片盗用、盗链的两种方法
2016/11/01 PHP
php die()与exit()的区别实例详解
2016/12/03 PHP
$.ajax json数据传递方法
2008/11/19 Javascript
javascript 循环读取JSON数据的代码
2010/07/17 Javascript
Javascript的setTimeout()使用闭包特性时需要注意的问题
2014/09/23 Javascript
Javascript控制div属性动态变化实例分析
2015/10/08 Javascript
微信jssdk用法汇总
2016/07/16 Javascript
Javascript 创建类并动态添加属性及方法的简单实现
2016/10/20 Javascript
JavaScript登录验证码的实现
2016/10/27 Javascript
快速解决js中window.location.href不工作的问题
2016/11/02 Javascript
javascript编程开发中取色器及封装$函数用法示例
2017/08/09 Javascript
ActiveX控件的使用-js实现打印超市小票功能代码详解
2017/11/22 Javascript
Vue-cli3项目配置Vue.config.js实战记录
2018/07/29 Javascript
Angular2实现的秒表及改良版示例
2019/05/10 Javascript
layui实现根据table数据判断按钮显示情况的方法
2019/09/26 Javascript
[48:31]完美世界DOTA2联赛PWL S3 DLG vs Phoenix 第二场 12.17
2020/12/19 DOTA
Python中的装饰器用法详解
2015/01/14 Python
Python人脸识别初探
2017/12/21 Python
Pandas 数据框增、删、改、查、去重、抽样基本操作方法
2018/04/12 Python
通过python检测字符串的字母
2020/02/18 Python
Python通过文本和图片生成词云图
2020/05/21 Python
CSS3制作ajax loader icon实现思路及代码
2013/08/25 HTML / CSS
Canvas实现放大镜效果完整案例分析(附代码)
2020/11/26 HTML / CSS
斯洛伐克最大的婴儿食品和用品网上商店:Feedo.sk
2020/12/21 全球购物
求职简历自我评价范例
2014/03/12 职场文书
《三亚落日》教学反思
2014/04/26 职场文书
个人担保书范文
2014/05/20 职场文书
暑期培训班策划方案
2014/08/26 职场文书
土木工程专业本科生求职信
2014/10/01 职场文书
一般基层干部群众路线教育实践活动个人对照检查材料
2014/11/04 职场文书
2014年乡镇安全生产工作总结
2014/12/02 职场文书
2015年学校禁毒工作总结
2015/05/27 职场文书