
A Detailed Guide to DataFrame.groupby() in pandas

Posted in Python on June 16, 2022

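groupby in pandas works much like GROUP BY in SQL. That is no surprise: a relational database also stores its data as one big table of rows and columns, which is very close to the shape of a DataFrame.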
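To make the comparison with SQL concrete, here is a minimal sketch that runs the same per-city average both ways. The table name `weather`, the four sample rows, and the use of an in-memory SQLite database are assumptions made for illustration; they are not part of the original article.

```python
import sqlite3

import pandas as pd

# Hypothetical four-row sample, not the article's CSV file.
df = pd.DataFrame({
    "city": ["BJ", "BJ", "SH", "SH"],
    "temperature": [8, 12, -4, 20],
})

# SQL side: load the rows into an in-memory SQLite table and aggregate per city.
rows = [(str(c), float(t)) for c, t in zip(df["city"], df["temperature"])]
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE weather (city TEXT, temperature REAL)")
conn.executemany("INSERT INTO weather VALUES (?, ?)", rows)
sql_result = dict(conn.execute(
    "SELECT city, AVG(temperature) FROM weather GROUP BY city ORDER BY city"
).fetchall())
conn.close()

# pandas side: the same grouping and aggregation on the DataFrame.
pandas_result = df.groupby("city")["temperature"].mean().to_dict()

print(sql_result)     # {'BJ': 10.0, 'SH': 8.0}
print(pandas_result)  # {'BJ': 10.0, 'SH': 8.0}
```

In both cases the grouping key picks the rows that belong together and the aggregate function reduces each group to one value.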

import pandas as pd


df = pd.read_csv('./city_weather.csv')
print(df)
'''
          date city  temperature  wind
0   03/01/2016   BJ            8     5
1   17/01/2016   BJ           12     2
2   31/01/2016   BJ           19     2
3   14/02/2016   BJ           -3     3
4   28/02/2016   BJ           19     2
5   13/03/2016   BJ            5     3
6   27/03/2016   SH           -4     4
7   10/04/2016   SH           19     3
8   24/04/2016   SH           20     3
9   08/05/2016   SH           17     3
10  22/05/2016   SH            4     2
11  05/06/2016   SH          -10     4
12  19/06/2016   SH            0     5
13  03/07/2016   SH           -9     5
14  17/07/2016   GZ           10     2
15  31/07/2016   GZ           -1     5
16  14/08/2016   GZ            1     5
17  28/08/2016   GZ           25     4
18  11/09/2016   SZ           20     1
19  25/09/2016   SZ          -10     4
'''
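The file city_weather.csv is not supplied with the article. A rough way to follow along is to rebuild the frame from the printed table, assuming those 20 rows are the complete file and that the dates are plain DD/MM/YYYY strings spaced 14 days apart, as the printout suggests.

```python
import pandas as pd

# Assumption: the 20 rows printed above are the whole CSV file.
# The dates fall every 14 days starting on 3 January 2016.
dates = pd.date_range("2016-01-03", periods=20, freq="14D").strftime("%d/%m/%Y")

df = pd.DataFrame({
    "date": dates,
    "city": ["BJ"] * 6 + ["SH"] * 8 + ["GZ"] * 4 + ["SZ"] * 2,
    "temperature": [8, 12, 19, -3, 19, 5, -4, 19, 20, 17,
                    4, -10, 0, -9, 10, -1, 1, 25, 20, -10],
    "wind": [5, 2, 2, 3, 2, 3, 4, 3, 3, 3,
             2, 4, 5, 5, 2, 5, 5, 4, 1, 4],
})
print(df.head())
```

Calling `df.to_csv('./city_weather.csv', index=False)` afterwards would write a file that the `pd.read_csv` line above can load.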

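g = df.groupby(df['city'])  # group the rows by the values in the city column
print(g)
# g is a DataFrameGroupBy object; printing it shows something like this
# (the module path and memory address vary by pandas version and run):
# <pandas.core.groupby.groupby.DataFrameGroupBy object at 0x7f10450e12e8>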

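print(g.groups)  # maps each group label to the index labels of its rows

# Output from an older pandas release; pandas 2.x prints Index([...], dtype='int64') instead of Int64Index:
# {'BJ': Int64Index([0, 1, 2, 3, 4, 5], dtype='int64'),
# 'GZ': Int64Index([14, 15, 16, 17], dtype='int64'),
# 'SZ': Int64Index([18, 19], dtype='int64'),
# 'SH': Int64Index([6, 7, 8, 9, 10, 11, 12, 13], dtype='int64')}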
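The grouping key above is passed as a Series, `df['city']`. Passing the column label is the more common spelling and should give the same groups. The sketch below checks this on a small hypothetical frame, not the article's data.

```python
import pandas as pd

# Hypothetical five-row sample for illustration.
df = pd.DataFrame({
    "city": ["BJ", "BJ", "SH", "GZ", "SH"],
    "temperature": [8, 12, -4, 10, 19],
})

by_series = df.groupby(df["city"])  # key given as a Series, as in the article
by_name = df.groupby("city")        # key given as a column label

# Compare the label -> row-index mapping produced by each spelling.
groups_series = {k: by_series.groups[k].tolist() for k in sorted(by_series.groups)}
groups_name = {k: by_name.groups[k].tolist() for k in sorted(by_name.groups)}

print(groups_name)                   # {'BJ': [0, 1], 'GZ': [3], 'SH': [2, 4]}
print(groups_series == groups_name)  # True
```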

print(g.size())  # size() counts the number of rows in each group
'''
city
BJ    6
GZ    4
SH    8
SZ    2
dtype: int64
'''
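`size()` counts rows, so it includes rows with missing values. The related `count()` counts only non-missing entries in a column. The article's data has no gaps, so the sketch below uses a hypothetical frame with two missing temperature readings to show the difference.

```python
import pandas as pd

# Hypothetical sample with missing readings (None becomes NaN).
df = pd.DataFrame({
    "city": ["BJ", "BJ", "SH", "SH", "SH"],
    "temperature": [8.0, None, -4.0, 19.0, None],
})
g = df.groupby("city")

sizes = g.size()                   # rows per group, missing values included
counts = g["temperature"].count()  # non-missing temperature readings per group

print(sizes.to_dict())   # {'BJ': 2, 'SH': 3}
print(counts.to_dict())  # {'BJ': 1, 'SH': 2}
```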

print(g.get_group('BJ'))  # fetch a single group as a DataFrame
'''
         date city  temperature  wind
0  03/01/2016   BJ            8     5
1  17/01/2016   BJ           12     2
2  31/01/2016   BJ           19     2
3  14/02/2016   BJ           -3     3
4  28/02/2016   BJ           19     2
5  13/03/2016   BJ            5     3
'''

df_bj = g.get_group('BJ')
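print(df_bj.mean(numeric_only=True))  # mean of this one group; numeric_only=True skips the string columns (pandas 2.x raises a TypeError without it)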
'''
temperature    10.000000
wind            2.833333
dtype: float64
'''

# Call mean() on the g object directly
print(g.mean(numeric_only=True))  # mean of every numeric column, computed per group
'''
      temperature      wind
city                       
BJ         10.000  2.833333
GZ          8.750  4.000000
SH          4.625  3.625000
SZ          5.000  2.500000
'''
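When more than one statistic is needed, `agg` with named aggregation can compute them together. This is a sketch on a hypothetical four-row sample; the output column names `mean_temp`, `max_temp` and `min_wind` are made up for the example.

```python
import pandas as pd

# Hypothetical four-row sample for illustration.
df = pd.DataFrame({
    "date": ["03/01/2016", "17/01/2016", "27/03/2016", "10/04/2016"],
    "city": ["BJ", "BJ", "SH", "SH"],
    "temperature": [8, 12, -4, 19],
    "wind": [5, 2, 4, 3],
})
g = df.groupby("city")

# Selecting the numeric columns explicitly keeps the string column `date` out of the mean.
means = g[["temperature", "wind"]].mean()

# Named aggregation: each keyword is an output column, each value is (input column, function).
summary = g.agg(
    mean_temp=("temperature", "mean"),
    max_temp=("temperature", "max"),
    min_wind=("wind", "min"),
)

print(means)
print(summary)
```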

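print(g.max())  # per-group maximum of each column; date is a string column, so its "max" is lexicographic, not chronological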
'''
            date  temperature  wind
city                               
BJ    31/01/2016           19     5
GZ    31/07/2016           25     5
SH    27/03/2016           20     5
SZ    25/09/2016           20     4
'''

print(g.min())  # per-group minimum of each column; date is compared as a string here too
'''
            date  temperature  wind
city                               
BJ    03/01/2016           -3     2
GZ    14/08/2016           -1     2
SH    03/07/2016          -10     2
SZ    11/09/2016          -10     1
'''
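In the `max()` and `min()` output the date column holds strings, so BJ's "latest" date comes out as 31/01/2016 even though the data runs to 13/03/2016. One possible fix, sketched here on the six BJ dates and assuming the DD/MM/YYYY format seen in the printout, is to parse the column with `pd.to_datetime` before grouping.

```python
import pandas as pd

# The six BJ dates from the table above, stored as strings.
df = pd.DataFrame({
    "date": ["03/01/2016", "17/01/2016", "31/01/2016",
             "14/02/2016", "28/02/2016", "13/03/2016"],
    "city": ["BJ"] * 6,
})

# String comparison: "31/..." sorts after "13/...", whatever the month.
string_max = df.groupby("city")["date"].max()

# Parse to real datetimes, then the maximum is chronological.
df["date"] = pd.to_datetime(df["date"], format="%d/%m/%Y")
true_max = df.groupby("city")["date"].max()

print(string_max["BJ"])  # 31/01/2016
print(true_max["BJ"])    # 2016-03-13 00:00:00
```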

# A GroupBy object can also be iterated with a for loop; each step yields (label, DataFrame)
for name, group in g:
    print(name)
    print(group)
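Iterating is handy when each group needs custom handling. As one illustration, the sketch below picks the hottest day in each city; the four-row sample and the `hottest` dict are hypothetical.

```python
import pandas as pd

# Hypothetical four-row sample for illustration.
df = pd.DataFrame({
    "date": ["03/01/2016", "31/01/2016", "27/03/2016", "24/04/2016"],
    "city": ["BJ", "BJ", "SH", "SH"],
    "temperature": [8, 19, -4, 20],
})

hottest = {}
for name, group in df.groupby("city"):
    # `name` is the group label; `group` is the sub-DataFrame for that label.
    row = group.loc[group["temperature"].idxmax()]
    hottest[name] = (row["date"], int(row["temperature"]))

print(hottest)
# {'BJ': ('31/01/2016', 19), 'SH': ('24/04/2016', 20)}
```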


# g can be converted to a list or a dict
print(list(g))  # a list of tuples: the first element is the group label, the second is that group's DataFrame
'''
[('BJ',          date city  temperature  wind
0  03/01/2016   BJ            8     5
1  17/01/2016   BJ           12     2
2  31/01/2016   BJ           19     2
3  14/02/2016   BJ           -3     3
4  28/02/2016   BJ           19     2
5  13/03/2016   BJ            5     3), 
('GZ',           date city  temperature  wind
14  17/07/2016   GZ           10     2
15  31/07/2016   GZ           -1     5
16  14/08/2016   GZ            1     5
17  28/08/2016   GZ           25     4), 
('SH',           date city  temperature  wind
6   27/03/2016   SH           -4     4
7   10/04/2016   SH           19     3
8   24/04/2016   SH           20     3
9   08/05/2016   SH           17     3
10  22/05/2016   SH            4     2
11  05/06/2016   SH          -10     4
12  19/06/2016   SH            0     5
13  03/07/2016   SH           -9     5), 
('SZ',           date city  temperature  wind
18  11/09/2016   SZ           20     1
19  25/09/2016   SZ          -10     4)]
'''
print(dict(list(g)))  # a dict of key-value pairs: each key is a group label, each value is that group's DataFrame
'''
{'SH':           date city  temperature  wind
6   27/03/2016   SH           -4     4
7   10/04/2016   SH           19     3
8   24/04/2016   SH           20     3
9   08/05/2016   SH           17     3
10  22/05/2016   SH            4     2
11  05/06/2016   SH          -10     4
12  19/06/2016   SH            0     5
13  03/07/2016   SH           -9     5, 
'SZ':           date city  temperature  wind
18  11/09/2016   SZ           20     1
19  25/09/2016   SZ          -10     4, 
'GZ':           date city  temperature  wind
14  17/07/2016   GZ           10     2
15  31/07/2016   GZ           -1     5
16  14/08/2016   GZ            1     5
17  28/08/2016   GZ           25     4, 
'BJ':          date city  temperature  wind
0  03/01/2016   BJ            8     5
1  17/01/2016   BJ           12     2
2  31/01/2016   BJ           19     2
3  14/02/2016   BJ           -3     3
4  28/02/2016   BJ           19     2
5  13/03/2016   BJ            5     3}
'''
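A dict comprehension over the GroupBy object builds the same label-to-DataFrame mapping as `dict(list(g))`. This is a sketch on a hypothetical four-row frame; the name `frames` is made up for the example.

```python
import pandas as pd

# Hypothetical four-row sample for illustration.
df = pd.DataFrame({
    "city": ["BJ", "SH", "BJ", "SZ"],
    "temperature": [8, -4, 12, 20],
})
g = df.groupby("city")

# Each iteration step yields (label, sub-DataFrame), so this mirrors dict(list(g)).
frames = {name: group for name, group in g}

print(sorted(frames))                          # ['BJ', 'SH', 'SZ']
print(frames["BJ"]["temperature"].tolist())    # [8, 12]
print(frames["BJ"].equals(g.get_group("BJ")))  # True
```

Each value keeps the original row index, so `frames["BJ"]` here holds rows 0 and 2 of `df`.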


- Author -

我是小蚂蚁
