pandas read_excel() and to_excel() Explained


Posted in Python on September 19, 2019

Preface

Data analysis constantly requires loading and storing data. This article covers how pandas reads from and writes to Excel files.

read_excel()

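The loading function is read_excel(). The signature below is the one from pandas 0.20, which this article was written against:

read_excel(io, sheetname=0, header=0, skiprows=None, skip_footer=0, index_col=None, names=None, parse_cols=None, parse_dates=False, date_parser=None, na_values=None, thousands=None, convert_float=True, has_index_names=None, converters=None, dtype=None, true_values=None, false_values=None, engine=None, squeeze=False, **kwds)

Later releases renamed several parameters: sheetname became sheet_name and parse_cols became usecols (0.21), and skip_footer became skipfooter (0.23). The old spellings no longer work in current pandas, so the examples below use the new names.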

Commonly used parameters:

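  • io : string, path object; path to the Excel file.
  • sheet_name (sheetname in pandas 0.20) : string, int, mixed list of strings/ints, or None, default 0. Use sheet_name=[0,1] to return several sheets, or sheet_name=None to return all sheets. Note: an int or string returns a DataFrame, while None or a list returns a dict of DataFrames.
  • header : int, list of ints, default 0. The row holding the column names; the default 0 means the first row, and the data is everything below that row. If the data has no header row, set header=None.
  • skiprows : list-like or int. Rows to skip at the beginning: a list gives 0-indexed row numbers to drop, an int drops that many rows from the top.
  • skipfooter (skip_footer in pandas 0.20) : int, default 0. Number of rows to omit, counted from the end.
  • index_col : int, list of ints, default None. Column(s) to use as the row index; a column name string also works.
  • names : array-like, default None. Column names to use.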

Source data:

sheet1:
ID     NUM-1  NUM-2  NUM-3
36901  142    168    661
36902  78     521    602
36903  144    600    521
36904  95     457    468
36905  69     596    695

sheet2:
ID     NUM-1  NUM-2  NUM-3
36906  190    527    691
36907  101    403    470
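To try the read_excel() examples without the author's test.xls, here is a minimal sketch that rebuilds the two sheets above as an in-memory .xlsx workbook and reads it back. It assumes a recent pandas with the openpyxl package installed; the sheet names "sheet1" and "sheet2" are assumptions taken from the labels above.

```python
import io

import pandas as pd

# Hypothetical in-memory copy of the two sheets listed above.
# Sheet names are assumed; requires the openpyxl package.
COLUMNS = ["ID", "NUM-1", "NUM-2", "NUM-3"]
sheets = {
    "sheet1": pd.DataFrame(
        [[36901, 142, 168, 661], [36902, 78, 521, 602], [36903, 144, 600, 521],
         [36904, 95, 457, 468], [36905, 69, 596, 695]],
        columns=COLUMNS,
    ),
    "sheet2": pd.DataFrame(
        [[36906, 190, 527, 691], [36907, 101, 403, 470]],
        columns=COLUMNS,
    ),
}

buf = io.BytesIO()
with pd.ExcelWriter(buf, engine="openpyxl") as writer:
    for name, frame in sheets.items():
        # index=False: the sheet holds only the four data columns
        frame.to_excel(writer, sheet_name=name, index=False)
raw = buf.getvalue()  # bytes of a complete .xlsx file

# Same as pd.read_excel(basestation): first sheet, first row as header
data = pd.read_excel(io.BytesIO(raw), engine="openpyxl")
print(data)
```

Saving raw to disk gives a file that can stand in for basestation in the examples.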

(1) Basic call

import pandas as pd

basestation = "F://pythonBook_PyPDAM/data/test.xls"
data = pd.read_excel(basestation)
print(data)

Output: a DataFrame

      ID  NUM-1  NUM-2  NUM-3
0  36901    142    168    661
1  36902     78    521    602
2  36903    144    600    521
3  36904     95    457    468
4  36905     69    596    695

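(2) sheet_name (sheetname in pandas 0.20): use sheet_name=[0,1] to return several sheets, or sheet_name=None to return every sheet. Note: an int or string returns a DataFrame, while None or a list returns a dict of DataFrames.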

data_1 = pd.read_excel(basestation, sheet_name=[0, 1])
print(data_1)
print(type(data_1))

Output: a dict of DataFrames (printed as an OrderedDict by the pandas version used here):

OrderedDict([(0,       ID  NUM-1  NUM-2  NUM-3
0  36901    142    168    661
1  36902     78    521    602
2  36903    144    600    521
3  36904     95    457    468
4  36905     69    596    695),
(1,       ID  NUM-1  NUM-2  NUM-3
0  36906    190    527    691
1  36907    101    403    470)])
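The return-type rule above can be checked directly. A minimal sketch, assuming current pandas (the sheet_name spelling) with openpyxl, and an in-memory copy of the two sample sheets whose names "sheet1"/"sheet2" are assumed:

```python
import io

import pandas as pd

# Hypothetical in-memory workbook with the two sample sheets (requires openpyxl)
COLUMNS = ["ID", "NUM-1", "NUM-2", "NUM-3"]
sheets = {
    "sheet1": pd.DataFrame(
        [[36901, 142, 168, 661], [36902, 78, 521, 602], [36903, 144, 600, 521],
         [36904, 95, 457, 468], [36905, 69, 596, 695]],
        columns=COLUMNS,
    ),
    "sheet2": pd.DataFrame(
        [[36906, 190, 527, 691], [36907, 101, 403, 470]], columns=COLUMNS
    ),
}
buf = io.BytesIO()
with pd.ExcelWriter(buf, engine="openpyxl") as writer:
    for name, frame in sheets.items():
        frame.to_excel(writer, sheet_name=name, index=False)
raw = buf.getvalue()


def read(**kwargs):
    """Run pd.read_excel on a fresh buffer over the in-memory workbook."""
    return pd.read_excel(io.BytesIO(raw), engine="openpyxl", **kwargs)


first = read(sheet_name=0)              # int: a DataFrame
second = read(sheet_name="sheet2")      # str: a DataFrame
mixed = read(sheet_name=[0, "sheet2"])  # list: dict keyed by the requested items
every = read(sheet_name=None)           # None: dict keyed by sheet name

print(type(first).__name__, type(second).__name__)
print(list(mixed))
print(list(every))
```

Note that a list may mix positions and names, and the dict keys are exactly what was asked for.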

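(3) header: the row holding the column names, default 0 (the first row); the data is everything below that row. If the data has no header row, set header=None. Note that with header=None the original column-name row is read in as an ordinary data row.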

data = pd.read_excel(basestation, header=None)
print(data)
Output:
       0      1      2      3
0     ID  NUM-1  NUM-2  NUM-3
1  36901    142    168    661
2  36902     78    521    602
3  36903    144    600    521
4  36904     95    457    468
5  36905     69    596    695

header=[3] uses 0-indexed row 3 (the 36903 row) as the header; the rows above it are dropped:

data = pd.read_excel(basestation, header=[3])
print(data)
Output:
   36903  144  600  521
0  36904   95  457  468
1  36905   69  596  695
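A self-contained way to reproduce both header cases, including a side effect the printed output hides: with header=None the text "ID" lands in the same column as the numbers. A minimal sketch, assuming pandas with openpyxl and a hypothetical in-memory copy of sheet1:

```python
import io

import pandas as pd

# Hypothetical in-memory copy of sheet1 (requires openpyxl)
sheet1 = pd.DataFrame(
    [[36901, 142, 168, 661], [36902, 78, 521, 602], [36903, 144, 600, 521],
     [36904, 95, 457, 468], [36905, 69, 596, 695]],
    columns=["ID", "NUM-1", "NUM-2", "NUM-3"],
)
buf = io.BytesIO()
sheet1.to_excel(buf, index=False, engine="openpyxl")
raw = buf.getvalue()

# header=None: the column-name row becomes data row 0, columns are 0..3
no_header = pd.read_excel(io.BytesIO(raw), engine="openpyxl", header=None)
print(no_header.columns.tolist())
# "ID" now shares a column with numbers, so that column has object dtype
print(no_header[0].dtype)

# header=3: 0-indexed row 3 (the 36903 row) supplies the labels;
# the rows above it are dropped and only the rows below it are data
from_row3 = pd.read_excel(io.BytesIO(raw), engine="openpyxl", header=3)
print(from_row3)
```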

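(4) skiprows: skip the specified rows. skiprows=[1] drops 0-indexed row 1, which is the first data row (36901); row 0, the header, is kept.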

data = pd.read_excel(basestation, skiprows=[1])
print(data)
Output:
      ID  NUM-1  NUM-2  NUM-3
0  36902     78    521    602
1  36903    144    600    521
2  36904     95    457    468
3  36905     69    596    695
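In current pandas, skiprows behaves differently for a list and an integer: a list names 0-indexed rows to drop, while an int drops that many rows from the top, header row included. A minimal sketch, assuming pandas with openpyxl and a hypothetical in-memory copy of sheet1:

```python
import io

import pandas as pd

# Hypothetical in-memory copy of sheet1 (requires openpyxl)
sheet1 = pd.DataFrame(
    [[36901, 142, 168, 661], [36902, 78, 521, 602], [36903, 144, 600, 521],
     [36904, 95, 457, 468], [36905, 69, 596, 695]],
    columns=["ID", "NUM-1", "NUM-2", "NUM-3"],
)
buf = io.BytesIO()
sheet1.to_excel(buf, index=False, engine="openpyxl")
raw = buf.getvalue()


def read(**kwargs):
    """Run pd.read_excel on a fresh buffer over the in-memory workbook."""
    return pd.read_excel(io.BytesIO(raw), engine="openpyxl", **kwargs)


# skiprows=[1]: skip 0-indexed row 1 (the 36901 row); header row 0 is kept
by_list = read(skiprows=[1])
# skiprows=1: skip the first row, which here is the header row itself
by_int = read(skiprows=1)
print(by_list.columns.tolist())  # still the original column names
print(by_int.columns.tolist())   # labels now come from the 36901 row
```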

(5) skipfooter (skip_footer in pandas 0.20): omit the given number of rows, counted from the end.

data = pd.read_excel(basestation, skipfooter=3)
print(data)
Output:
      ID  NUM-1  NUM-2  NUM-3
0  36901    142    168    661
1  36902     78    521    602
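Counting from the bottom is not always convenient. Newer pandas versions of read_excel() also accept nrows, which counts data rows from the top; on this five-row sheet, skipfooter=3 and nrows=2 select the same rows. A minimal sketch, assuming pandas with openpyxl and a hypothetical in-memory copy of sheet1:

```python
import io

import pandas as pd

# Hypothetical in-memory copy of sheet1 (requires openpyxl)
sheet1 = pd.DataFrame(
    [[36901, 142, 168, 661], [36902, 78, 521, 602], [36903, 144, 600, 521],
     [36904, 95, 457, 468], [36905, 69, 596, 695]],
    columns=["ID", "NUM-1", "NUM-2", "NUM-3"],
)
buf = io.BytesIO()
sheet1.to_excel(buf, index=False, engine="openpyxl")
raw = buf.getvalue()

tail_cut = pd.read_excel(io.BytesIO(raw), engine="openpyxl", skipfooter=3)  # drop 3 rows from the bottom
head_kept = pd.read_excel(io.BytesIO(raw), engine="openpyxl", nrows=2)      # parse only 2 data rows
print(tail_cut.values.tolist())
print(head_kept.values.tolist())
```

skipfooter needs to know where the sheet ends, whereas nrows lets pandas stop early, which can matter on large sheets.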

(6) index_col: the column to use as the row index; a column name string works as well as a position.

data = pd.read_excel(basestation, index_col="NUM-3")
print(data)
Output:
          ID  NUM-1  NUM-2
NUM-3
661    36901    142    168
602    36902     78    521
521    36903    144    600
468    36904     95    457
695    36905     69    596
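As the parameter list says, index_col accepts either a position or a column name. A minimal sketch checking that index_col=3 and index_col="NUM-3" select the same column, assuming pandas with openpyxl and a hypothetical in-memory copy of sheet1:

```python
import io

import pandas as pd

# Hypothetical in-memory copy of sheet1 (requires openpyxl)
sheet1 = pd.DataFrame(
    [[36901, 142, 168, 661], [36902, 78, 521, 602], [36903, 144, 600, 521],
     [36904, 95, 457, 468], [36905, 69, 596, 695]],
    columns=["ID", "NUM-1", "NUM-2", "NUM-3"],
)
buf = io.BytesIO()
sheet1.to_excel(buf, index=False, engine="openpyxl")
raw = buf.getvalue()

by_position = pd.read_excel(io.BytesIO(raw), engine="openpyxl", index_col=3)  # 0-indexed column 3
by_name = pd.read_excel(io.BytesIO(raw), engine="openpyxl", index_col="NUM-3")  # same column by header
print(by_position.index.name, by_position.index.tolist())
print(by_name.columns.tolist())
```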

(7) names: set the column names. With the default header=0, the sheet's own header row is replaced by these names.

data = pd.read_excel(basestation, names=["a", "b", "c", "e"])
print(data)
Output:
       a    b    c    e
0  36901  142  168  661
1  36902   78  521  602
2  36903  144  600  521
3  36904   95  457  468
4  36905   69  596  695
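The names example above runs on a sheet that already has a header row, which names then replaces. For a sheet with no header row, the parameter documentation below says to pass header=None as well; otherwise the first data row is consumed as a header. A minimal sketch of the difference, assuming pandas with openpyxl and a hypothetical headerless copy of sheet1:

```python
import io

import pandas as pd

# Hypothetical headerless sheet: the five data rows, no column-name row
rows = pd.DataFrame(
    [[36901, 142, 168, 661], [36902, 78, 521, 602], [36903, 144, 600, 521],
     [36904, 95, 457, 468], [36905, 69, 596, 695]]
)
buf = io.BytesIO()
rows.to_excel(buf, index=False, header=False, engine="openpyxl")
raw = buf.getvalue()

names = ["a", "b", "c", "d"]
# header=None: every row is data and names supplies the labels
good = pd.read_excel(io.BytesIO(raw), engine="openpyxl", header=None, names=names)
# Default header=0: the 36901 row is treated as a header and replaced
bad = pd.read_excel(io.BytesIO(raw), engine="openpyxl", names=names)
print(len(good), len(bad))
```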

The full parameter documentation:

>>> help(pandas.read_excel)
Help on function read_excel in module pandas.io.excel:

read_excel(io, sheetname=0, header=0, skiprows=None, skip_footer=0, index_col=None, names=None, parse_cols=None, parse_dates=False, date_parser=None, na_values=None, thousands=None, convert_float=True, has_index_names=None, converters=None, dtype=None, true_values=None, false_values=None, engine=None, squeeze=False, **kwds)
  Read an Excel table into a pandas DataFrame

  Parameters
  ----------
  io : string, path object (pathlib.Path or py._path.local.LocalPath),
    file-like object, pandas ExcelFile, or xlrd workbook.
    The string could be a URL. Valid URL schemes include http, ftp, s3,
    and file. For file URLs, a host is expected. For instance, a local
    file could be file://localhost/path/to/workbook.xlsx
  sheetname : string, int, mixed list of strings/ints, or None, default 0

    Strings are used for sheet names, Integers are used in zero-indexed
    sheet positions.

    Lists of strings/integers are used to request multiple sheets.

    Specify None to get all sheets.

    str|int -> DataFrame is returned.
    list|None -> Dict of DataFrames is returned, with keys representing
    sheets.

    Available Cases

    * Defaults to 0 -> 1st sheet as a DataFrame
    * 1 -> 2nd sheet as a DataFrame
    * "Sheet1" -> 1st sheet as a DataFrame
    * [0,1,"Sheet5"] -> 1st, 2nd & 5th sheet as a dictionary of DataFrames
    * None -> All sheets as a dictionary of DataFrames

  header : int, list of ints, default 0
    Row (0-indexed) to use for the column labels of the parsed
    DataFrame. If a list of integers is passed those row positions will
    be combined into a ``MultiIndex``
  skiprows : list-like
    Rows to skip at the beginning (0-indexed)
  skip_footer : int, default 0
    Rows at the end to skip (0-indexed)
  index_col : int, list of ints, default None
    Column (0-indexed) to use as the row labels of the DataFrame.
    Pass None if there is no such column. If a list is passed,
    those columns will be combined into a ``MultiIndex``. If a
    subset of data is selected with ``parse_cols``, index_col
    is based on the subset.
  names : array-like, default None
    List of column names to use. If file contains no header row,
    then you should explicitly pass header=None
  converters : dict, default None
    Dict of functions for converting values in certain columns. Keys can
    either be integers or column labels, values are functions that take one
    input argument, the Excel cell content, and return the transformed
    content.
  dtype : Type name or dict of column -> type, default None
    Data type for data or columns. E.g. {'a': np.float64, 'b': np.int32}
    Use `object` to preserve data as stored in Excel and not interpret dtype.
    If converters are specified, they will be applied INSTEAD
    of dtype conversion.

    .. versionadded:: 0.20.0

  true_values : list, default None
    Values to consider as True

    .. versionadded:: 0.19.0

  false_values : list, default None
    Values to consider as False

    .. versionadded:: 0.19.0

  parse_cols : int or list, default None
    * If None then parse all columns,
    * If int then indicates last column to be parsed
    * If list of ints then indicates list of column numbers to be parsed
    * If string then indicates comma separated list of Excel column letters and
     column ranges (e.g. "A:E" or "A,C,E:F"). Ranges are inclusive of
     both sides.
  squeeze : boolean, default False
    If the parsed data only contains one column then return a Series
  na_values : scalar, str, list-like, or dict, default None
    Additional strings to recognize as NA/NaN. If dict passed, specific
    per-column NA values. By default the following values are interpreted
    as NaN: '', '#N/A', '#N/A N/A', '#NA', '-1.#IND', '-1.#QNAN', '-NaN', '-nan',
    '1.#IND', '1.#QNAN', 'N/A', 'NA', 'NULL', 'NaN', 'nan'.
  thousands : str, default None
    Thousands separator for parsing string columns to numeric. Note that
    this parameter is only necessary for columns stored as TEXT in Excel,
    any numeric columns will automatically be parsed, regardless of display
    format.
  keep_default_na : bool, default True
    If na_values are specified and keep_default_na is False the default NaN
    values are overridden, otherwise they're appended to.
  verbose : boolean, default False
    Indicate number of NA values placed in non-numeric columns
  engine: string, default None
    If io is not a buffer or path, this must be set to identify io.
    Acceptable values are None or xlrd
  convert_float : boolean, default True
    convert integral floats to int (i.e., 1.0 --> 1). If False, all numeric
    data will be read in as floats: Excel stores all numbers as floats
    internally
  has_index_names : boolean, default None
    DEPRECATED: for version 0.17+ index names will be automatically
    inferred based on index_col. To read Excel output from 0.16.2 and
    prior that had saved index names, use True.

  Returns

to_excel()

The storage function is DataFrame.to_excel(), which writes a DataFrame to an Excel sheet ("Write DataFrame to an excel sheet"); Series objects have a to_excel() method as well. Its signature is:

to_excel(self, excel_writer, sheet_name='Sheet1', na_rep='', float_format=None, columns=None, header=True, index=True, index_label=None, startrow=0, startcol=0, engine=None, merge_cells=True, encoding=None, inf_rep='inf', verbose=True, freeze_panes=None)

Commonly used parameters:

  • excel_writer : string or ExcelWriter object. File path or existing ExcelWriter; the target to write to.
  • sheet_name : string, default 'Sheet1'. Name of the sheet which will contain the DataFrame.
  • na_rep : string, default ''. Missing data representation: the string written in place of missing values.
  • float_format : string, default None. Format string for floating point numbers.
  • columns : sequence, optional. Columns to write; selects which columns are output.
  • header : boolean or list of string, default True. Write out column names. If a list of strings is given it is assumed to be aliases for the column names.
  • index : boolean, default True. Write row names (index).
  • index_label : string or sequence, default None. Column label for index column(s) if desired. If None is given, and header and index are True, then the index names are used. A sequence should be given if the DataFrame uses MultiIndex.
  • startrow : upper left cell row to dump data frame (0-based, default 0).
  • startcol : upper left cell column to dump data frame (0-based, default 0).
  • engine : string, default None. Write engine to use; you can also set this via the options io.excel.xlsx.writer, io.excel.xls.writer, and io.excel.xlsm.writer.
  • merge_cells : boolean, default True. Write MultiIndex and Hierarchical Rows as merged cells.
  • encoding : string, default None. Encoding of the resulting excel file. Only necessary for xlwt; other writers support unicode natively.
  • inf_rep : string, default 'inf'. Representation for infinity (there is no native representation for infinity in Excel).
  • freeze_panes : tuple of integer (length 2), default None. Specifies the one-based bottommost row and rightmost column that is to be frozen.
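startrow and startcol are not exercised in the examples below. A minimal sketch, assuming pandas with openpyxl installed, that writes a small hypothetical frame at startrow=2, startcol=1 and inspects the cells directly with openpyxl; the 0-based offsets should place the header at Excel cell B3:

```python
import io

import openpyxl
import pandas as pd

# Hypothetical two-row frame
data = pd.DataFrame({"ID": [36901, 36902], "NUM-1": [142, 78]})

buf = io.BytesIO()
# startrow=2, startcol=1 are 0-based: the block starts at Excel row 3, column B
data.to_excel(buf, index=False, startrow=2, startcol=1, engine="openpyxl")

ws = openpyxl.load_workbook(io.BytesIO(buf.getvalue())).active
print(ws["B3"].value, ws["C3"].value)  # header cells
print(ws["B4"].value, ws["C4"].value)  # first data row
print(ws["A1"].value)                  # cells above/left of the block stay empty
```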

Source data:

   ID     NUM-1  NUM-2  NUM-3
0  36901  142    168    661
1  36902  78     521    602
2  36903  144    600    521
3  36904  95     457    468
4  36905  69     596    695
5  36906  165    453

Row 5 has no NUM-3 value (a missing value).

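Load the data:

import pandas as pd

basestation = "F://python/data/test.xls"
# Writing .xls relied on the xlwt package, which pandas 2.0 no longer supports; write .xlsx instead
basestation_end = "F://python/data/test_end.xlsx"
data = pd.read_excel(basestation)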

(1) excel_writer: the output path.

data.to_excel(basestation_end)
Output:
   ID     NUM-1  NUM-2  NUM-3
0  36901  142    168    661
1  36902  78     521    602
2  36903  144    600    521
3  36904  95     457    468
4  36905  69     596    695
5  36906  165    453
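As the output shows, the default index=True writes the row index as an extra first column with an empty header cell. A minimal sketch of what that means when the file is read back, assuming pandas with openpyxl and a hypothetical in-memory copy of the source data:

```python
import io

import pandas as pd

# Hypothetical copy of the source data; row 5 has no NUM-3 value
data = pd.DataFrame({
    "ID": [36901, 36902, 36903, 36904, 36905, 36906],
    "NUM-1": [142, 78, 144, 95, 69, 165],
    "NUM-2": [168, 521, 600, 457, 596, 453],
    "NUM-3": [661, 602, 521, 468, 695, float("nan")],
})

buf = io.BytesIO()
data.to_excel(buf, engine="openpyxl")  # index=True by default
raw = buf.getvalue()

# Read back naively: the unnamed index column shows up as "Unnamed: 0"
naive = pd.read_excel(io.BytesIO(raw), engine="openpyxl")
print(naive.columns.tolist())

# index_col=0 turns that first column back into the row index
restored = pd.read_excel(io.BytesIO(raw), engine="openpyxl", index_col=0)
print(restored.columns.tolist())
print(restored.index.tolist())
```

Passing index=False when writing, or index_col=0 when reading, avoids the stray "Unnamed: 0" column.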

(2) sheet_name: which sheet of the workbook to store the data in.

data.to_excel(basestation_end,sheet_name="sheet2")
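Calling to_excel() with a file path creates a new workbook each time, so two calls with different sheet_name values leave only the last sheet. A minimal sketch, assuming pandas with openpyxl, that writes to a temporary test_end.xlsx and then uses pd.ExcelWriter to keep several sheets in one file (the small frame is hypothetical):

```python
import os
import tempfile

import pandas as pd

# Hypothetical small frame standing in for the article's data
data = pd.DataFrame({"ID": [36901, 36902], "NUM-1": [142, 78]})


def sheet_names(path):
    """Return the sheet names stored in an .xlsx file."""
    with pd.ExcelFile(path, engine="openpyxl") as xl:
        return xl.sheet_names


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "test_end.xlsx")

    # Two separate calls: the second rewrites the file, so sheet1 is lost
    data.to_excel(path, sheet_name="sheet1", engine="openpyxl")
    data.to_excel(path, sheet_name="sheet2", engine="openpyxl")
    after_two_calls = sheet_names(path)

    # One ExcelWriter shared by both calls keeps both sheets
    with pd.ExcelWriter(path, engine="openpyxl") as writer:
        data.to_excel(writer, sheet_name="sheet1")
        data.to_excel(writer, sheet_name="sheet2")
    with_writer = sheet_names(path)

print(after_two_calls)
print(with_writer)
```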

(3) na_rep: the string written in place of missing values.

data.to_excel(basestation_end,na_rep="NULL")
Output:
   ID     NUM-1  NUM-2  NUM-3
0  36901  142    168    661
1  36902  78     521    602
2  36903  144    600    521
3  36904  95     457    468
4  36905  69     596    695
5  36906  165    453    NULL
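One thing to watch when reading such a file back: "NULL" is among the strings read_excel() treats as NaN by default (see the na_values list in the parameter documentation above), so the placeholder silently turns back into a missing value. A minimal sketch, assuming pandas with openpyxl and a hypothetical two-row excerpt of the data:

```python
import io

import pandas as pd

# Hypothetical two-row excerpt: NUM-3 is missing in the second row
data = pd.DataFrame({"ID": [36905, 36906], "NUM-3": [695, float("nan")]})

buf = io.BytesIO()
data.to_excel(buf, index=False, na_rep="NULL", engine="openpyxl")
raw = buf.getvalue()

# By default "NULL" is one of read_excel's NA strings, so it comes back as NaN
default = pd.read_excel(io.BytesIO(raw), engine="openpyxl")
# keep_default_na=False keeps the literal text that na_rep wrote
literal = pd.read_excel(io.BytesIO(raw), engine="openpyxl", keep_default_na=False)
print(default["NUM-3"].tolist())
print(literal["NUM-3"].tolist())
```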

(4) columns: sequence, optional. Columns to write; selects which columns are output.

data.to_excel(basestation_end,columns=["ID"])
Output:
   ID
0  36901
1  36902
2  36903
3  36904
4  36905
5  36906
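columns also fixes the order in which the selected columns are written. A minimal sketch, assuming pandas with openpyxl and a hypothetical three-column frame, that writes NUM-3 before ID and reads the result back:

```python
import io

import pandas as pd

# Hypothetical frame with three of the article's columns
data = pd.DataFrame({"ID": [36901, 36902], "NUM-1": [142, 78], "NUM-3": [661, 602]})

buf = io.BytesIO()
# Only NUM-3 and ID are written, in the order listed
data.to_excel(buf, columns=["NUM-3", "ID"], index=False, engine="openpyxl")

back = pd.read_excel(io.BytesIO(buf.getvalue()), engine="openpyxl")
print(back.columns.tolist())
```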

(5) header: boolean or list of strings, default True. A list renames the columns in the output; header=False writes no header row.

data.to_excel(basestation_end,header=["a","b","c","d"])
Output:
   a      b      c      d
0  36901  142    168    661
1  36902  78     521    602
2  36903  144    600    521
3  36904  95     457    468
4  36905  69     596    695
5  36906  165    453


data.to_excel(basestation_end, header=False, columns=["ID"])
With header=False, no header row is written.
Output:
0  36901
1  36902
2  36903
3  36904
4  36905
5  36906
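When header is a list, it must contain one alias per written column, which also applies when columns narrows the output. A minimal sketch, assuming pandas with openpyxl and a hypothetical two-column frame, showing a matching alias list and a mismatched one that pandas rejects with a ValueError:

```python
import io

import pandas as pd

data = pd.DataFrame({"ID": [36901, 36902], "NUM-1": [142, 78]})  # hypothetical

# One written column, one alias: the header cell reads "id"
buf = io.BytesIO()
data.to_excel(buf, columns=["ID"], header=["id"], index=False, engine="openpyxl")
renamed = pd.read_excel(io.BytesIO(buf.getvalue()), engine="openpyxl")
print(renamed.columns.tolist())

# One written column, two aliases: pandas rejects the mismatch
try:
    data.to_excel(io.BytesIO(), columns=["ID"], header=["a", "b"], engine="openpyxl")
    error = None
except ValueError as exc:
    error = str(exc)
print(error)
```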

(6) index : boolean, default True. Write row names (index).

With the default True the row index is written; index=False omits it.

index_label : string or sequence, default None

Sets the header text of the index column.

data.to_excel(basestation_end,index=False)
Output:
ID     NUM-1  NUM-2  NUM-3
36901  142    168    661
36902  78     521    602
36903  144    600    521
36904  95     457    468
36905  69     596    695
36906  165    453

data.to_excel(basestation_end, index_label=["f"])
Output:
f  ID     NUM-1  NUM-2  NUM-3
0  36901  142    168    661
1  36902  78     521    602
2  36903  144    600    521
3  36904  95     457    468
4  36905  69     596    695
5  36906  165    453
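Both options matter most when the file is read back. A minimal sketch, assuming pandas with openpyxl and a hypothetical two-column frame: index=False yields exactly the data columns, and an index_label header lets index_col=0 restore a named index.

```python
import io

import pandas as pd

data = pd.DataFrame({"ID": [36901, 36902], "NUM-1": [142, 78]})  # hypothetical


def to_xlsx_bytes(**to_excel_kwargs):
    """Write data to an in-memory .xlsx and return the raw bytes."""
    buf = io.BytesIO()
    data.to_excel(buf, engine="openpyxl", **to_excel_kwargs)
    return buf.getvalue()


# index=False: only the data columns reach the sheet
plain = pd.read_excel(io.BytesIO(to_xlsx_bytes(index=False)), engine="openpyxl")
print(plain.columns.tolist())

# index_label="f": the index column gets "f" as its header cell
raw = to_xlsx_bytes(index_label="f")
labelled = pd.read_excel(io.BytesIO(raw), engine="openpyxl", index_col=0)
print(labelled.index.name, labelled.columns.tolist())
```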

That's all for this article. I hope it helps with your studies.
