Examples of forward maximum matching and backward maximum matching word segmentation in Python


Posted in Python on November 14, 2018

Forward maximum matching

# -*- coding:utf-8 -*-
# Python 2 script: it uses the print statement and the built-in unicode type.
 
CODEC='utf-8'
 
def u(s, encoding):
  'Convert a byte string in the given encoding to unicode; unicode input is returned unchanged.'
  if isinstance(s, unicode):
    return s
  else:
    return unicode(s, encoding)
 
def fwd_mm_seg(wordDict, maxLen, text):
  'Forward maximum matching: scan left to right, taking the longest dictionary word each time.'
  wordList = []
  segStr = text
  segStrLen = len(segStr)
  # Debug output: dump the dictionary entries.
  for word in wordDict:
    print 'word: ', word
  print "\n"
  while segStrLen > 0:
    # The first candidate is the longest prefix allowed by maxLen.
    if segStrLen > maxLen:
      wordLen = maxLen
    else:
      wordLen = segStrLen
    subStr = segStr[0:wordLen]
    print "subStr: ", subStr
    # Drop the last character until the candidate is in the dictionary
    # or only a single character is left.
    while wordLen > 1:
      if subStr in wordDict:
        print "subStr1: %r" % subStr
        break
      else:
        print "subStr2: %r" % subStr
        wordLen = wordLen - 1
        subStr = subStr[0:wordLen]
    # Emit the match (or the single fallback character) and consume it.
    wordList.append(subStr)
    segStr = segStr[wordLen:]
    segStrLen = segStrLen - wordLen
  for wordstr in wordList:
    print "wordstr: ", wordstr
  return wordList
    
      
def main():
  # words.dic is read line by line, one dictionary word per line.
  wordDict = {}
  with open('words.dic') as fp_dict:
    for eachWord in fp_dict:
      wordDict[u(eachWord.strip(), CODEC)] = 1
  segStr = u'你好世界hello world'
  print segStr
  wordList = fwd_mm_seg(wordDict, 10, segStr)
  print "==".join(wordList)
  
 
if __name__ == '__main__':
  main()
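
The listing above targets Python 2 and reads its dictionary from words.dic. As a minimal sketch of the same idea in Python 3, the version below uses a small in-memory set in place of the file. The function name forward_max_match and the sample vocabulary are assumptions made for illustration and are not part of the original script; max_len is derived from the longest dictionary entry rather than fixed at 10.

```python
def forward_max_match(text, vocab, max_len):
    """Segment text left to right, taking the longest dictionary word at each position."""
    words = []
    pos = 0
    while pos < len(text):
        # Try the longest candidate first, shrinking until a dictionary hit.
        size = min(max_len, len(text) - pos)
        while size > 1 and text[pos:pos + size] not in vocab:
            size -= 1
        # size == 1 means no multi-character word matched; emit the single character.
        words.append(text[pos:pos + size])
        pos += size
    return words


# Hypothetical in-memory dictionary standing in for words.dic.
vocab = {"你好", "世界", "hello", "world"}
max_len = max(len(w) for w in vocab)
result = forward_max_match("你好世界hello world", vocab, max_len)
print("==".join(result))
```

With this vocabulary the sample string is split into 你好, 世界, hello, a single space, and world. The space is emitted on its own because no dictionary entry covers it.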

Backward maximum matching

# -*- coding:utf-8 -*-
# Python 2 script: it uses the print statement and the built-in unicode type.
 
 
def u(s, encoding):
  'Convert a byte string in the given encoding to unicode; unicode input is returned unchanged.'
  if isinstance(s, unicode):
    return s
  else:
    return unicode(s, encoding)
 
CODEC='utf-8'
 
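def bwd_mm_seg(wordDict, maxLen, text):
  'Backward maximum matching: scan right to left, taking the longest dictionary word each time.'
  wordList = []
  segStr = text
  segStrLen = len(segStr)
  # Debug output: dump the dictionary entries.
  for word in wordDict:
    print 'word: ', word
  print "\n"
  while segStrLen > 0:
    # The first candidate is the longest suffix allowed by maxLen.
    if segStrLen > maxLen:
      wordLen = maxLen
    else:
      wordLen = segStrLen
    subStr = segStr[-wordLen:None]
    print "subStr: ", subStr
    # Drop the first character until the candidate is in the dictionary
    # or only a single character is left.
    while wordLen > 1:
      if subStr in wordDict:
        print "subStr1: %r" % subStr
        break
      else:
        print "subStr2: %r" % subStr
        wordLen = wordLen - 1
        subStr = subStr[-wordLen:None]
    # Emit the match (or the single fallback character) and consume it from the end.
    wordList.append(subStr)
    segStr = segStr[0: -wordLen]
    segStrLen = segStrLen - wordLen
  # Words were collected from the end of the string, so restore reading order.
  wordList.reverse()
  for wordstr in wordList:
    print "wordstr: ", wordstr
  return wordList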
    
      
def main():
  # words.dic is read line by line, one dictionary word per line.
  wordDict = {}
  with open('words.dic') as fp_dict:
    for eachWord in fp_dict:
      wordDict[u(eachWord.strip(), CODEC)] = 1
  segStr = ur'你好世界hello world'
  print segStr
  wordList = bwd_mm_seg(wordDict, 10, segStr)
  print "==".join(wordList)
 
if __name__ == '__main__':
  main()
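
The scan direction only matters when the input is ambiguous. The Python 3 sketch below mirrors the backward scan on a string where it does. The function name backward_max_match and the sample vocabulary are assumptions chosen for illustration; they do not come from the original script or from words.dic.

```python
def backward_max_match(text, vocab, max_len):
    """Segment text right to left, taking the longest dictionary word ending at each position."""
    words = []
    end = len(text)
    while end > 0:
        # Try the longest candidate suffix first, shrinking from the left.
        size = min(max_len, end)
        while size > 1 and text[end - size:end] not in vocab:
            size -= 1
        # size == 1 means no multi-character word matched; keep the single character.
        words.append(text[end - size:end])
        end -= size
    # Words were collected from the end of the text, so restore reading order.
    words.reverse()
    return words


# Hypothetical dictionary chosen so that the scan direction matters.
vocab = {"研究", "研究生", "生命", "起源"}
max_len = max(len(w) for w in vocab)
result = backward_max_match("研究生命的起源", vocab, max_len)
print("==".join(result))
```

With this dictionary the right-to-left scan yields 研究, 生命, 的, 起源. A left-to-right scan of the same string would greedily take 研究生 first and leave 命 as a stray single character, which is the kind of case where the two methods disagree.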

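That concludes this example of forward and backward maximum matching word segmentation in Python. We hope it serves as a useful reference, and we hope you will continue to support 三水点靠木.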
