Examples of forward maximum matching and reverse maximum matching word segmentation in Python


Posted in Python on November 14, 2018

Forward maximum matching

# -*- coding:utf-8 -*-
 
CODEC='utf-8'
 
def u(s, encoding):
  'Convert a byte string in the given encoding to unicode.'
  if isinstance(s, unicode):
    return s
  else:
    return unicode(s, encoding)
 
def fwd_mm_seg(wordDict, maxLen, text):
  'forward maximum matching segmentation'
  wordList = []
  segStr = text
  segStrLen = len(segStr)
  # Debug output: list every dictionary entry.
  for word in wordDict:
    print 'word: ', word
  print "\n"
  while segStrLen > 0:
    # Start with the longest candidate allowed at the head of the remaining text.
    if segStrLen > maxLen:
      wordLen = maxLen
    else:
      wordLen = segStrLen
    subStr = segStr[0:wordLen]
    print "subStr: ", subStr
    # Drop one character from the right until the candidate is in the
    # dictionary; a single character is always accepted as a word.
    while wordLen > 1:
      if subStr in wordDict:
        print "subStr1: %r" % subStr
        break
      else:
        print "subStr2: %r" % subStr
        wordLen = wordLen - 1
        subStr = subStr[0:wordLen]
    # Emit the word and continue with the rest of the text.
    wordList.append(subStr)
    segStr = segStr[wordLen:]
    segStrLen = segStrLen - wordLen
  for wordstr in wordList:
    print "wordstr: ", wordstr
  return wordList
    
      
def main():
  # Load the dictionary: one UTF-8 encoded word per line.
  wordDict = {}
  with open('words.dic') as fp_dict:
    for eachWord in fp_dict:
      wordDict[u(eachWord.strip(), CODEC)] = 1
  segStr = u'你好世界hello world'
  print segStr
  wordList = fwd_mm_seg(wordDict, 10, segStr)
  print "==".join(wordList)
  
 
if __name__ == '__main__':
  main()
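
The script above targets Python 2 (print statements, `unicode`). For readers on Python 3, the same forward maximum matching loop could be sketched roughly as below. This is a minimal illustration, not the author's code: the function name `forward_max_match`, the inline `demo_dict` set, and its entries are assumptions that stand in for `words.dic`.

```python
def forward_max_match(word_dict, max_len, text):
    """Segment text by forward maximum matching (illustrative sketch)."""
    words = []
    pos = 0
    while pos < len(text):
        # Try the longest candidate first, never reading past the end of the text.
        length = min(max_len, len(text) - pos)
        # Shrink from the right until the candidate is a dictionary word;
        # a single character is accepted unconditionally.
        while length > 1 and text[pos:pos + length] not in word_dict:
            length -= 1
        words.append(text[pos:pos + length])
        pos += length
    return words


# Hypothetical dictionary standing in for words.dic.
demo_dict = {"你好", "世界", "hello", "world"}
print("==".join(forward_max_match(demo_dict, 10, "你好世界hello world")))
```

With this assumed dictionary, the space between the two English words matches nothing and comes out as a single-character token of its own.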

Reverse maximum matching

# -*- coding:utf-8 -*-
 
 
def u(s, encoding):
  'Convert a byte string in the given encoding to unicode.'
  if isinstance(s, unicode):
    return s
  else:
    return unicode(s, encoding)
 
CODEC='utf-8'
 
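def bwd_mm_seg(wordDict, maxLen, text):
  'backward maximum matching segmentation'
  wordList = []
  segStr = text
  segStrLen = len(segStr)
  # Debug output: list every dictionary entry.
  for word in wordDict:
    print 'word: ', word
  print "\n"
  while segStrLen > 0:
    # Start with the longest candidate allowed at the tail of the remaining text.
    if segStrLen > maxLen:
      wordLen = maxLen
    else:
      wordLen = segStrLen
    subStr = segStr[-wordLen:]
    print "subStr: ", subStr
    # Drop one character from the left until the candidate is in the
    # dictionary; a single character is always accepted as a word.
    while wordLen > 1:
      if subStr in wordDict:
        print "subStr1: %r" % subStr
        break
      else:
        print "subStr2: %r" % subStr
        wordLen = wordLen - 1
        subStr = subStr[-wordLen:]
    # Emit the word and continue with the text in front of it.
    wordList.append(subStr)
    segStr = segStr[0:-wordLen]
    segStrLen = segStrLen - wordLen
  # Words were collected from the end, so restore reading order.
  wordList.reverse()
  for wordstr in wordList:
    print "wordstr: ", wordstr
  return wordList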
    
      
def main():
  # Load the dictionary: one UTF-8 encoded word per line.
  wordDict = {}
  with open('words.dic') as fp_dict:
    for eachWord in fp_dict:
      wordDict[u(eachWord.strip(), CODEC)] = 1
  segStr = ur'你好世界hello world'
  print segStr
  wordList = bwd_mm_seg(wordDict, 10, segStr)
  print "==".join(wordList)
 
if __name__ == '__main__':
  main()
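
As with the forward version, the script above is Python 2. A rough Python 3 sketch of the reverse scan follows, using an ambiguous input to show why the scan direction matters. The function name `backward_max_match`, the `demo_dict` entries, and the sample sentence are assumptions chosen for illustration; they do not come from `words.dic`.

```python
def backward_max_match(word_dict, max_len, text):
    """Segment text by reverse maximum matching (illustrative sketch)."""
    words = []
    end = len(text)
    while end > 0:
        # Try the longest candidate that ends at the current position.
        length = min(max_len, end)
        # Shrink from the left until the candidate is a dictionary word;
        # a single character is accepted unconditionally.
        while length > 1 and text[end - length:end] not in word_dict:
            length -= 1
        words.append(text[end - length:end])
        end -= length
    # Words were collected from the end, so restore reading order.
    words.reverse()
    return words


# Hypothetical dictionary containing overlapping entries.
demo_dict = {"研究", "研究生", "生命", "起源"}
print("/".join(backward_max_match(demo_dict, 10, "研究生命起源")))
```

With this assumed dictionary the reverse scan yields 研究/生命/起源, whereas a forward scan would greedily take 研究生 first and leave 命 stranded as a single character.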

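That is everything for this example of forward and reverse maximum matching word segmentation in Python. I hope it serves as a useful reference, and I hope you will continue to support 三水点靠木.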
