Using Python regular expressions to remove extra spaces from Chinese text while keeping the spaces between English words

Posted in Python on February 11, 2020

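When a PDF is converted to text, stray spaces often appear and make the data harder to read. These extra spaces need to be removed, while the normal spaces between English words must be kept. Example input and output:

input: 我今天赚了 10 个亿，老百姓very happy。

output: 我今天赚了10个亿，老百姓very happy。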

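Code

import re

def clean_space(text):
  """
  Remove extra spaces from mixed Chinese/English text.
  """
  # Candidates: a Chinese character or punctuation mark followed by spaces,
  # digits followed or preceded by spaces, and runs of letters and spaces.
  # Note: the lookbehind (?<![a-zA-Z]) sits right after " +", so the character
  # before it is always a space and the assertion always succeeds.
  match_regex = re.compile(r'[\u4e00-\u9fa5。\.,，:：《》、\(\)（）]{1} +(?<![a-zA-Z])|\d+ +| +\d+|[a-z A-Z]+')
  should_replace_list = match_regex.findall(text)
  # Process the longest matches first.
  order_replace_list = sorted(should_replace_list, key=len, reverse=True)
  for i in order_replace_list:
    if i == ' ':
      continue
    # Strip only the ends of the match; spaces inside it (between English words) stay.
    new_i = i.strip()
    # str.replace rewrites every occurrence of the matched substring in the text.
    text = text.replace(i, new_i)
  return text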
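The function above collects matches and then rewrites them with str.replace. As an alternative, here is a minimal sketch that does the same job in one re.sub call with lookarounds. It is not the author's method: the name clean_space_sub and the character set in _CJK (the CJK range plus the full-width punctuation from the pattern above) are assumptions made for illustration. A run of spaces is dropped only when the character on either side of it is in that set, so spaces between English words are never touched.

```python
import re

# Assumed "Chinese context" characters: the CJK range and the full-width
# punctuation used in the article's pattern. Extend this set as needed.
_CJK = r'\u4e00-\u9fa5。，：《》、（）'

# Match a run of spaces that directly follows or directly precedes a _CJK character.
_SPACE_NEAR_CJK = re.compile(rf'(?<=[{_CJK}]) +| +(?=[{_CJK}])')

def clean_space_sub(text):
    """Drop spaces next to Chinese characters; keep spaces between English words."""
    return _SPACE_NEAR_CJK.sub('', text)

print(clean_space_sub("我今天赚了 10 个亿，老百姓very happy。"))
# 我今天赚了10个亿，老百姓very happy。
```

Because the substitution is positional, a space is removed only where it actually touches a Chinese character, rather than everywhere the matched substring happens to occur in the text.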

Removing extra spaces between English words in Python

Method 1: re.sub(" +", " ", s)

import re

s = "     info has been found (+/- 100 pages, and 4.5 mb of .pdf files) now i have to wait untill our team leader has processed it and learns html.     "
s = re.sub(" +", " ", s)
s

Method 2: ' '.join(s.split())

s = "     info has been found (+/- 100 pages, and 4.5 mb of .pdf files) now i have to wait untill our team leader has processed it and learns html.     "

s = ' '.join(s.split())
s
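The two approaches are not quite equivalent, and a short comparison may help. The sketch below uses a made-up sample string (an assumption, not from the article) with leading and trailing spaces and a tab. re.sub(" +", " ", s) only collapses runs of the space character, so a single leading and trailing space remains and the tab is untouched. ' '.join(s.split()) splits on any whitespace, so it also trims both ends and turns the tab into a single space.

```python
import re

# Hypothetical sample: leading/trailing spaces, a run of spaces, and a tab.
s = "  info   has been\tfound  "

# Collapse runs of the space character only.
collapsed = re.sub(" +", " ", s)

# Split on any whitespace, then rejoin with single spaces.
normalized = " ".join(s.split())

print(repr(collapsed))   # ' info has been\tfound '
print(repr(normalized))  # 'info has been found'
```

If only the ends should be trimmed while tabs and newlines are kept, calling .strip() on the re.sub result is one option.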


- Author -

六神就是我
