
A Brief Look at DataFrame Selection Methods in pandas ([], loc, iloc, at, iat, ix)

Posted in Python on April 10, 2018

pandas offers several ways to slice a DataFrame, and they are easy to confuse if you don't know how they differ. The examples below walk through each one.

The data

First, generate a set of random data:

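In [5]: import random
  ...: import datetime as dt
  ...: import pandas as pd
  ...: 
  ...: # Three columns of 1000 random integers between 1 and 19
  ...: rnd_1 = [random.randrange(1, 20) for x in range(1000)]
  ...: rnd_2 = [random.randrange(1, 20) for x in range(1000)]
  ...: rnd_3 = [random.randrange(1, 20) for x in range(1000)]
  ...: # 1000 consecutive daily dates
  ...: fecha = pd.date_range('2012-4-10', '2015-1-4')
  ...: 
  ...: data = pd.DataFrame({'fecha':fecha, 'rnd_1': rnd_1, 'rnd_2': rnd_2, 'rnd_3': rnd_3})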
In [6]: data.describe()
Out[6]: 
             rnd_1        rnd_2        rnd_3
count  1000.000000  1000.000000  1000.000000
mean      9.946000     9.825000     9.894000
std       5.553911     5.559432     5.423484
min       1.000000     1.000000     1.000000
25%       5.000000     5.000000     5.000000
50%      10.000000    10.000000    10.000000
75%      15.000000    15.000000    14.000000
max      19.000000    19.000000    19.000000
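
For readers who want to reproduce the setup as a plain script rather than an IPython session, the following is a minimal sketch. The fixed seed and the `n_rows` name are assumptions added for repeatability; they are not part of the original session, so the random values will differ from the output shown in this article.

```python
import random

import pandas as pd

random.seed(0)  # assumption: fixed seed so repeated runs give the same numbers

n_rows = 1000  # hypothetical name; the article hard-codes 1000

# 1000 consecutive days starting 2012-04-10 (ends on 2015-01-04)
dates = pd.date_range("2012-04-10", periods=n_rows, freq="D")

data = pd.DataFrame({
    "fecha": dates,
    "rnd_1": [random.randrange(1, 20) for _ in range(n_rows)],
    "rnd_2": [random.randrange(1, 20) for _ in range(n_rows)],
    "rnd_3": [random.randrange(1, 20) for _ in range(n_rows)],
})

print(data.shape)
print(data["fecha"].iloc[-1])
```

Using `periods=` instead of an explicit end date makes it harder for the date column and the random columns to end up with different lengths.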

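Slicing with []

Square brackets slice a DataFrame much like Python list slicing. Depending on what goes inside the brackets, they select rows (a slice), columns (a column name or a list of names), or a block (a row slice followed by a column list).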

# Row selection
In [7]: data[1:5]
Out[7]: 
    fecha rnd_1 rnd_2 rnd_3
1 2012-04-11   1   16   3
2 2012-04-12   7   6   1
3 2012-04-13   2   16   7
4 2012-04-14   4   17   7
# Column selection
In [10]: data[['rnd_1', 'rnd_3']]
Out[10]: 
   rnd_1 rnd_3
0    8   12
1    1   3
2    7   1
3    2   7
4    4   7
5    12   8
6    2   12
7    9   8
8    13   17
9    4   7
10   14   14
11   19   16
12    2   12
13   15   18
14   13   18
15   13   11
16   17   7
17   14   10
18    9   6
19   11   15
20   16   13
21   18   9
22    1   18
23    4   3
24    6   11
25    2   13
26    7   17
27   11   8
28    3   12
29    4   2
..   ...  ...
970   8   14
971   19   5
972   13   2
973   8   10
974   8   17
975   6   16
976   3   2
977   12   6
978   12   10
979   15   13
980   8   4
981   17   3
982   1   17
983   11   5
984   7   7
985   13   14
986   6   19
987   13   9
988   3   15
989   19   6
990   7   11
991   11   7
992   19   12
993   2   15
994   10   4
995   14   13
996   12   11
997   11   15
998   17   14
999   3   8
[1000 rows x 2 columns]
# Block selection
In [11]: data[:7][['rnd_1', 'rnd_2']]
Out[11]: 
  rnd_1 rnd_2
0   8   17
1   1   16
2   7   6
3   2   16
4   4   17
5   12   19
6   2   7
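
The three uses of [] shown above can be condensed into a small sketch. The frame `df` and its column names are made up for illustration; only the indexing behaviour matters.

```python
import pandas as pd

# Hypothetical toy frame, small enough to check by eye
df = pd.DataFrame({"a": [10, 20, 30, 40], "b": [1, 2, 3, 4], "c": [5, 6, 7, 8]})

rows = df[1:3]               # a slice selects rows at positions 1 and 2
cols = df[["a", "c"]]        # a list of names selects columns
one_col = df["b"]            # a single name returns a Series
block = df[:2][["a", "b"]]   # rows first, then columns: two separate steps

print(rows)
print(cols)
print(block)
```

Note that the block selection is really two indexing operations chained together, which is fine for reading data but is the pattern to avoid when assigning values.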

Multiple columns, however, cannot be selected with a 1:5-style range inside the list the way rows can.

In [12]: data[['rnd_1':'rnd_3']]
 File "<ipython-input-13-6291b6a83eb0>", line 1
  data[['rnd_1':'rnd_3']]
         ^
SyntaxError: invalid syntax
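
A range of columns can still be selected, just not with that syntax. One option, sketched here on the first three rows of the data above, is to put the slice in the column position of loc (label-based, both ends included) or iloc (position-based, end excluded).

```python
import pandas as pd

# First three rows of the article's data, typed in by hand
df = pd.DataFrame({
    "fecha": pd.date_range("2012-04-10", periods=3),
    "rnd_1": [8, 1, 7],
    "rnd_2": [17, 16, 6],
    "rnd_3": [12, 3, 1],
})

# Label slice over columns: includes both 'rnd_1' and 'rnd_3'
by_label = df.loc[:, "rnd_1":"rnd_3"]

# Position slice over columns: positions 1 and 2, position 3 is excluded
by_position = df.iloc[:, 1:3]

print(by_label)
print(by_position)
```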

loc

loc selects rows and columns by index label.

In [13]: data.loc[1:5]
Out[13]: 
    fecha rnd_1 rnd_2 rnd_3
1 2012-04-11   1   16   3
2 2012-04-12   7   6   1
3 2012-04-13   2   16   7
4 2012-04-14   4   17   7
5 2012-04-15   12   19   8

Note that, unlike the first method, loc includes row 5: a loc slice is label-based and includes both endpoints, whereas [] stops at row 4.

In [14]: data.loc[2:4, ['rnd_2', 'fecha']]
Out[14]: 
  rnd_2   fecha
2   6 2012-04-12
3   16 2012-04-13
4   17 2012-04-14
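
The difference in endpoint handling is easy to verify on a toy frame. This is a minimal sketch; `df` and `df2` are hypothetical frames, not the article's data.

```python
import pandas as pd

df = pd.DataFrame({"value": range(10)})  # default integer index 0..9

by_brackets = df[1:5]    # positional slice: rows 1, 2, 3, 4
by_loc = df.loc[1:5]     # label slice: rows 1, 2, 3, 4, 5
by_iloc = df.iloc[1:5]   # positional slice: rows 1, 2, 3, 4

print(len(by_brackets), len(by_loc), len(by_iloc))

# With non-integer labels, a loc slice is still inclusive on both ends
df2 = pd.DataFrame({"value": [1, 2, 3, 4]}, index=list("abcd"))
print(df2.loc["b":"d"])
```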

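loc can also select the data between two specific dates. In the example below both dates are present in the index. (With a sorted DatetimeIndex, as here, pandas also accepts slice bounds that do not appear in the index; on an unsorted index a missing bound raises a KeyError.)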

In [15]: data_fecha = data.set_index('fecha')
  ...: data_fecha.head()
Out[15]: 
      rnd_1 rnd_2 rnd_3
fecha             
2012-04-10   8   17   12
2012-04-11   1   16   3
2012-04-12   7   6   1
2012-04-13   2   16   7
2012-04-14   4   17   7
In [16]: # Create two specific dates
  ...: fecha_1 = dt.datetime(2013, 4, 14)
  ...: fecha_2 = dt.datetime(2013, 4, 18)
  ...: 
  ...: # Slice the data between them
  ...: data_fecha.loc[fecha_1: fecha_2]
Out[16]: 
      rnd_1 rnd_2 rnd_3
fecha             
2013-04-14   17   10   5
2013-04-15   14   4   9
2013-04-16   1   2   18
2013-04-17   9   15   1
2013-04-18   16   7   17
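
As a variation on the date slice above, a DatetimeIndex also accepts date strings as slice bounds. The sketch below uses a hypothetical 15-day frame with a `value` column so that the selected rows are easy to check.

```python
import pandas as pd

idx = pd.date_range("2013-04-10", periods=15, freq="D")
ts = pd.DataFrame({"value": range(15)}, index=idx)

# Both endpoints are included, just as with datetime objects
between = ts.loc["2013-04-14":"2013-04-18"]

# The same selection with explicit Timestamp objects
same = ts.loc[pd.Timestamp("2013-04-14"):pd.Timestamp("2013-04-18")]

print(between)
```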

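Update: unless you have a specific reason not to, loc is strongly recommended over []. When assigning new values to a DataFrame, loc avoids the chained indexing problem, whereas chaining [] calls is likely to make pandas emit a SettingWithCopyWarning.

For details, see the official documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy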
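
To make the chained indexing issue concrete, here is a minimal sketch on four rows of the article's data. The chained form is left commented out because, depending on the pandas version, it may warn and may leave `df` unchanged.

```python
import pandas as pd

df = pd.DataFrame({"rnd_1": [8, 1, 7, 2], "rnd_2": [17, 16, 6, 16]})

# Chained indexing: the first [] may return a copy, so the assignment
# can end up modifying a temporary object instead of df.
# df[df["rnd_1"] > 5]["rnd_2"] = 0   # avoid this pattern

# Single loc call: row mask and column label in one indexing operation
df.loc[df["rnd_1"] > 5, "rnd_2"] = 0

print(df)
```

Because the row condition and the column label are passed to loc together, pandas knows the assignment targets `df` itself.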

iloc

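Where loc selects by the value of the index labels, iloc selects by position. iloc ignores what the labels are and looks only at where a row or column sits, so its brackets take integer positions (a single integer, a list of integers, or an integer slice) rather than labels.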

# Row selection
In [17]: data_fecha.iloc[10:15]
Out[17]: 
      rnd_1 rnd_2 rnd_3
fecha             
2012-04-20   14   6   14
2012-04-21   19   14   16
2012-04-22   2   6   12
2012-04-23   15   8   18
2012-04-24   13   8   18
# Column selection
In [18]: data_fecha.iloc[:,[1,2]].head()
Out[18]: 
      rnd_2 rnd_3
fecha          
2012-04-10   17   12
2012-04-11   16   3
2012-04-12   6   1
2012-04-13   16   7
2012-04-14   17   7
# Selecting specific rows and columns
In [19]: data_fecha.iloc[[1,12,34],[0,2]]
Out[19]: 
      rnd_1 rnd_3
fecha          
2012-04-11   1   3
2012-04-22   2   12
2012-05-14   17   10
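
The label-versus-position distinction is clearest when the index labels are integers that do not match the row positions. A small sketch with a hypothetical frame:

```python
import pandas as pd

# Integer labels that deliberately differ from the positions 0, 1, 2
df = pd.DataFrame({"x": [1, 2, 3], "y": [4, 5, 6]}, index=[10, 20, 30])

by_label = df.loc[20, "x"]      # label 20 is the second row
by_position = df.iloc[1, 0]     # position 1 is also the second row
last_two = df.iloc[-2:, :]      # negative positions count from the end

# Label 1 does not exist, so loc raises KeyError where iloc[1] would succeed
try:
    df.loc[1]
    label_missing = False
except KeyError:
    label_missing = True

print(by_label, by_position, label_missing)
```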

at

at is used much like loc but accesses data faster. It can only access a single element, not several at once.

In [20]: timeit data_fecha.at[fecha_1,'rnd_1']
The slowest run took 3783.11 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 11.3 µs per loop
In [21]: timeit data_fecha.loc[fecha_1,'rnd_1']
The slowest run took 121.24 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 192 µs per loop
In [22]: data_fecha.at[fecha_1,'rnd_1']
Out[22]: 17
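
Besides reading, at can also write a single cell. The sketch below rebuilds three rows of the `rnd_1` column from the date example above; the replacement value 99 is arbitrary.

```python
import pandas as pd

idx = pd.date_range("2013-04-14", periods=3, freq="D")
df = pd.DataFrame({"rnd_1": [17, 14, 1]}, index=idx)

key = pd.Timestamp("2013-04-15")

value = df.at[key, "rnd_1"]   # read one cell by row label and column label
df.at[key, "rnd_1"] = 99      # write one cell in place

print(value)
print(df)
```

at takes exactly one row label and one column label; for slices or lists of labels, loc is the accessor to use.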

iat

iat is to iloc what at is to loc: a faster, position-based accessor that, like at, can only access a single element.

In [23]: data_fecha.iat[1,0]
Out[23]: 1
In [24]: timeit data_fecha.iat[1,0]
The slowest run took 6.23 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 8.77 µs per loop
In [25]: timeit data_fecha.iloc[1,0]
10000 loops, best of 3: 158 µs per loop
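
Outside IPython, the same comparison can be made with the standard library's timeit module. This is a rough sketch on a hypothetical frame; the absolute numbers depend on the machine and the pandas version, so none are shown here.

```python
import timeit

import pandas as pd

df = pd.DataFrame({"a": range(1000), "b": range(1000)})

# Both accessors return the same scalar
print(df.iat[1, 0], df.iloc[1, 0])

# Time 10,000 single-cell lookups with each accessor
t_iat = timeit.timeit(lambda: df.iat[1, 0], number=10_000)
t_iloc = timeit.timeit(lambda: df.iloc[1, 0], number=10_000)

print(f"iat:  {t_iat:.4f} s")
print(f"iloc: {t_iloc:.4f} s")
```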

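ix

For single lookups, the methods above expect the label to be in the index or the position to be within range. ix also lets you slice with bounds that are not in the DataFrame's index. Note that ix was deprecated in pandas 0.20.0 and removed in pandas 1.0, so the example below only runs on older versions.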

In [28]: date_1 = dt.datetime(2013, 1, 10, 8, 30)
  ...: date_2 = dt.datetime(2013, 1, 13, 4, 20)
  ...: 
  ...: # Slice the data between them
  ...: data_fecha.ix[date_1: date_2]
Out[28]: 
      rnd_1 rnd_2 rnd_3
fecha             
2013-01-11   19   17   19
2013-01-12   10   9   17
2013-01-13   15   3   10

As the example shows, January 10, 2013 is not selected: its index entry falls at 00:00, which is earlier than the 08:30 lower bound.
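
Since ix is no longer available in pandas 1.0 and later, the same slice can be written with loc, which accepts bounds missing from a sorted DatetimeIndex. A minimal sketch, using a hypothetical six-day frame around the dates in question:

```python
import datetime as dt

import pandas as pd

idx = pd.date_range("2013-01-09", periods=6, freq="D")  # 2013-01-09 .. 2013-01-14
df = pd.DataFrame({"value": range(6)}, index=idx)

date_1 = dt.datetime(2013, 1, 10, 8, 30)
date_2 = dt.datetime(2013, 1, 13, 4, 20)

# Neither bound is in the index; rows between them are returned
subset = df.loc[date_1:date_2]

print(subset)
```

The rows for January 11, 12 and 13 come back: January 10 at 00:00 is before the lower bound, and January 13 at 00:00 is before the upper bound.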


- Author -

ForeseeMark
