Python统计纯文本文件中英文单词出现个数的方法总结【测试可用】


Posted in Python onJuly 25, 2018

本文实例讲述了Python统计纯文本文件中英文单词出现个数的方法。分享给大家供大家参考,具体如下:

第一版: 效率低

# -*- coding:utf-8 -*-
#!python3
path = 'test.txt'
with open(path,encoding='utf-8',newline='') as f:
  word = []
  words_dict= {}
  for letter in f.read():
    if letter.isalnum():
      word.append(letter)
    elif letter.isspace(): #空白字符 空格 \t \n
      if word:
        word = ''.join(word).lower() #转小写
        if word not in words_dict:
          words_dict[word] = 1
        else:
          words_dict[word] += 1
        word = []
#处理最后一个单词
if word:
  word = ''.join(word).lower() # 转小写
  if word not in words_dict:
    words_dict[word] = 1
  else:
    words_dict[word] += 1
  word = []
for k,v in words_dict.items():
  print(k,v)

运行结果:

we 4
are 1
busy 1
all 1
day 1
like 1
swarms 1
of 6
flies 1
without 1
souls 1
noisy 1
restless 1
unable 1
to 1
hear 1
the 7
voices 1
soul 1
as 1
time 1
goes 1
by 1
childhood 1
away 2
grew 1
up 1
years 1
a 1
lot 1
memories 1
once 1
have 2
also 1
eroded 1
bottom 1
childish 1
innocence 1
regardless 1
shackles 1
mind 1
indulge 1
in 1
world 1
buckish 1
focus 1
on 1
beneficial 1
principle 1
lost 1
themselves 1

第二版:

缺点:遇到大文件要一次读入内存,性能不好

# -*- coding:utf-8 -*-
#!python3
import re
path = 'test.txt'
with open(path,'r',encoding='utf-8') as f:
  data = f.read()
  word_reg = re.compile(r'\w+')
  #word_reg = re.compile(r'\w+\b')
  word_list = word_reg.findall(data)
  word_list = [word.lower() for word in word_list] #转小写
  word_set = set(word_list) #避免重复查询
  # words_dict = {}
  # for word in word_set:
  #   words_dict[word] = word_list.count(word)
  # 简洁写法
  words_dict = {word: word_list.count(word) for word in word_set}
  for k,v in words_dict.items():
    print(k,v)

运行结果:

on 1
also 1
souls 1
focus 1
soul 1
time 1
noisy 1
grew 1
lot 1
childish 1
like 1
voices 1
indulge 1
swarms 1
buckish 1
restless 1
we 4
hear 1
childhood 1
as 1
world 1
themselves 1
are 1
bottom 1
memories 1
the 7
of 6
flies 1
without 1
have 2
day 1
busy 1
to 1
eroded 1
regardless 1
unable 1
innocence 1
up 1
a 1
in 1
mind 1
goes 1
by 1
lost 1
principle 1
once 1
away 2
years 1
beneficial 1
all 1
shackles 1

第三版:

# -*- coding:utf-8 -*-
#!python3
import re
path = 'test.txt'
with open(path, 'r', encoding='utf-8') as f:
  word_list = []
  word_reg = re.compile(r'\w+')
  for line in f:
    #line_words = word_reg.findall(line)
    #比上面的正则更加简单
    line_words = line.split()
    word_list.extend(line_words)
  word_set = set(word_list) # 避免重复查询
  words_dict = {word: word_list.count(word) for word in word_set}
  for k, v in words_dict.items():
    print(k, v)

运行结果:

childhood 1
innocence, 1
are 1
of 6
also 1
lost 1
We 1
regardless 1
noisy, 1
by, 1
on 1
themselves. 1
grew 1
lot 1
bottom 1
buckish, 1
time 1
childish 1
voices 1
once 1
restless, 1
shackles 1
world 1
eroded 1
As 1
all 1
day, 1
swarms 1
we 3
soul. 1
memories, 1
in 1
without 1
like 1
beneficial 1
up, 1
unable 1
away 1
flies 1
goes 1
a 1
have 2
away, 1
mind, 1
focus 1
principle, 1
hear 1
to 1
the 7
years 1
busy 1
souls, 1
indulge 1

第四版:使用Counter统计

# -*- coding:utf-8 -*-
#!python3
import collections
import re
path = 'test.txt'
with open(path, 'r', encoding='utf-8') as f:
  word_list = []
  word_reg = re.compile(r'\w+')
  for line in f:
    line_words = line.split()
    word_list.extend(line_words)
  words_dict = dict(collections.Counter(word_list)) #使用Counter统计
  for k, v in words_dict.items():
    print(k, v)

运行结果:

We 1
are 1
busy 1
all 1
day, 1
like 1
swarms 1
of 6
flies 1
without 1
souls, 1
noisy, 1
restless, 1
unable 1
to 1
hear 1
the 7
voices 1
soul. 1
As 1
time 1
goes 1
by, 1
childhood 1
away, 1
we 3
grew 1
up, 1
years 1
away 1
a 1
lot 1
memories, 1
once 1
have 2
also 1
eroded 1
bottom 1
childish 1
innocence, 1
regardless 1
shackles 1
mind, 1
indulge 1
in 1
world 1
buckish, 1
focus 1
on 1
beneficial 1
principle, 1
lost 1
themselves. 1

注:这里使用的测试文本test.txt如下:

We are busy all day, like swarms of flies without souls, noisy, restless, unable to hear the voices of the soul. As time goes by, childhood away, we grew up, years away a lot of memories, once have also eroded the bottom of the childish innocence, we regardless of the shackles of mind, indulge in the world buckish, focus on the beneficial principle, we have lost themselves.

Python 相关文章推荐
python实现斐波那契递归函数的方法
Sep 08 Python
详细解读Python中的__init__()方法
May 02 Python
基于Python __dict__与dir()的区别详解
Oct 30 Python
利用Python批量提取Win10锁屏壁纸实战教程
Mar 27 Python
利用python对Excel中的特定数据提取并写入新表的方法
Jun 14 Python
Python中的函数式编程:不可变的数据结构
Oct 08 Python
Python 读取串口数据,动态绘图的示例
Jul 02 Python
Python 实现网课实时监控自动签到、打卡功能
Mar 12 Python
Pycharm打开已有项目配置python环境的方法
Jul 03 Python
使用Python实现微信拍一拍功能的思路代码
Jul 09 Python
python解包概念及实例
Feb 17 Python
python解析json数据
Apr 29 Python
基于DataFrame改变列类型的方法
Jul 25 #Python
对pandas中Series的map函数详解
Jul 25 #Python
基于pandas将类别属性转化为数值属性的方法
Jul 25 #Python
Django实现支付宝付款和微信支付的示例代码
Jul 25 #Python
Python走楼梯问题解决方法示例
Jul 25 #Python
python 批量修改/替换数据的实例
Jul 25 #Python
django 实现电子支付功能的示例代码
Jul 25 #Python
You might like
WINXP下apache+php4+mysql
2006/11/25 PHP
快速开发一个PHP扩展图文教程
2008/12/12 PHP
php中static静态变量的使用方法详解
2010/06/04 PHP
php中数字0和空值的区别分析
2014/06/05 PHP
递归实现php数组转xml的代码分享
2015/05/14 PHP
php判断是否为ajax请求的方法
2016/11/29 PHP
php封装的pdo数据库操作工具类与用法示例
2019/05/08 PHP
PHP使用gearman进行异步的邮件或短信发送操作详解
2020/02/27 PHP
JavaScript面向对象编程
2008/03/02 Javascript
js loading加载效果实现代码
2009/11/24 Javascript
javascript 强制刷新页面的实现代码
2009/12/13 Javascript
鼠标右击事件代码(asp.net后台)
2011/01/27 Javascript
JavaScript实现自己的DOM选择器原理及代码
2013/03/04 Javascript
jQuery中验证表单提交方式及序列化表单内容的实现
2014/01/06 Javascript
jQuery实现点击该行即可删除HTML表格行
2014/10/17 Javascript
JavaScript中用let语句声明作用域的用法讲解
2016/05/20 Javascript
基于Bootstrap实现的下拉菜单手机端不能选择菜单项的原因附解决办法
2016/07/22 Javascript
js 提交form表单和设置form表单请求路径的实现方法
2016/10/25 Javascript
Javascript 跨域知识详细介绍
2016/10/30 Javascript
phantomjs导出html到pdf的方法总结
2017/10/19 Javascript
vue中Npm run build 根据环境传递参数方法来打包不同域名
2018/03/29 Javascript
jQuery操作选中select下拉框的值代码实例
2020/02/07 jQuery
[02:09]抵达西雅图!中国军团加油!
2014/07/07 DOTA
python数据结构之二叉树的统计与转换实例
2014/04/29 Python
在Mac OS系统上安装Python的Pillow库的教程
2015/11/20 Python
PyQt5组件读取参数的实例
2019/06/25 Python
python的列表List求均值和中位数实例
2020/03/03 Python
pygame实现弹球游戏
2020/04/14 Python
Python操控mysql批量插入数据的实现方法
2020/10/27 Python
利用纯html5绘制出来的一款非常漂亮的时钟
2015/01/04 HTML / CSS
英国时尚优质的女装:Hope Fashion
2018/08/14 全球购物
斯洛伐克香水和化妆品购物网站:Parfemy-Elnino.sk
2020/01/28 全球购物
医学生就业推荐表自我鉴定
2014/03/26 职场文书
2015年社会实践个人总结
2015/03/06 职场文书
会计求职自荐信
2015/03/26 职场文书
六一儿童节主持开场白
2015/05/28 职场文书