Python 匹配文本并在其上一行追加文本


Posted in Python onMay 11, 2022

匹配文本并在其上一行追加文本

问题描述

Python匹配文本并在其上一行追加文本

test.txt

a
b
c
d
e

1.读进列表后覆盖原文件 

def match_then_insert(filename, match, content):
    """匹配后在该行追加
    :param filename: 要操作的文件
    :param match: 匹配内容
    :param content: 追加内容
    """
    lines = open(filename).read().splitlines()
    index = lines.index(match)
    lines.insert(index, content)
    open(filename, mode='w').write('\n'.join(lines))
match_then_insert('test.txt', match='c', content='123')

效果

a
b
123
c
d
e

2.FileInput类

from fileinput import FileInput
def match_then_insert(filename, match, content):
    """匹配后在该行追加
    :param filename: 要操作的文件
    :param match: 匹配内容
    :param content: 追加内容
    """
    for line in FileInput(filename, inplace=True):  # 原地过滤
        if match in line:
            line = content + '\n' + line
        print(line, end='')  # 输出重定向到原文件
match_then_insert('test.txt', match='c', content='123')

3.seek

def match_then_insert(filename, match, content):
    """匹配后在该行追加
    :param filename: 要操作的文件
    :param match: 匹配内容
    :param content: 追加内容
    """
    with open(filename, mode='rb+') as f:
        while True:
            try:
                line = f.readline()  # 逐行读取
            except IndexError:  # 超出范围则退出
                break
            line_str = line.decode().splitlines()[0]
            if line_str == match:
                f.seek(-len(line), 1)  # 光标移动到上一行
                rest = f.read()  # 读取余下内容
                f.seek(-len(rest), 1)  # 光标移动回原位置
                f.truncate()  # 删除余下内容
                content = content + '\n'
                f.write(content.encode())  # 插入指定内容
                f.write(rest)  # 还原余下内容
                break
match_then_insert('test.txt', match='c', content='123')

对比

方案 耗时/s
读进列表后覆盖原文件 54.42
FileInput类 121.59
seek 3.53
from timeit import timeit
from fileinput import FileInput
def init_txt():
    open('test.txt', mode='w').write('\n'.join(['a', 'b', 'c', 'd', 'e']))
def f1(filename='test.txt', match='c', content='123'):
    lines = open(filename).read().splitlines()
    index = lines.index(match)
    lines.insert(index, content)
    open(filename, mode='w').write('\n'.join(lines))
def f2(filename='test.txt', match='c', content='123'):
    for line in FileInput(filename, inplace=True):
        if match in line:
            line = content + '\n' + line
        print(line, end='')
def f3(filename='test.txt', match='c', content='123'):
    with open(filename, mode='rb+') as f:
        while True:
            try:
                line = f.readline()
            except IndexError:
                break
            line_str = line.decode().splitlines()[0]
            if line_str == match:
                f.seek(-len(line), 1)
                rest = f.read()
                f.seek(-len(rest), 1)
                f.truncate()
                content = content + '\n'
                f.write(content.encode())
                f.write(rest)
                break
init_txt()
print(timeit(f1, number=1000))
init_txt()
print(timeit(f2, number=1000))
init_txt()
print(timeit(f3, number=1000))

遇到的坑

报错可试试在文件头部添加

# -*- coding: utf-8 -*-

或指定 encoding='utf-8'

用正则表达式匹配文本(Python经典编程案例)

ceshi.txt文本如下:第一行为空行

爬虫任务报警
01:45:21
scrapyd==》爬虫任务异常死亡报警
hostname: scrapyd-chinabond-1
error_count: Process died: exitstatus=None ,project: chinabond_fast_spider,spider: ah_sina_com_cn,job: 28395818dbcb11e998a3f632d94e247c,pid: 88971,log: data/logs/chinabond_fast_spider/ah_sina_com_cn/28395818dbcb11e998a3f632d94e247c.log,items: None
error_data:
爬虫任务报警
01:45:21
scrapyd==》爬虫任务异常死亡报警
hostname: scrapyd-chinabond-6
error_count: Process died: exitstatus=None ,project: chinabond_fast_spider,spider: shupeidian_bjx_com_cn,job: 04738a5cdbcb11e9803172286b76aa73,pid: 34246,log: data/logs/chinabond_fast_spider/shupeidian_bjx_com_cn/04738a5cdbcb11e9803172286b76aa73.log,items: None
error_data:
爬虫任务报警
01:45:21
scrapyd==》爬虫任务异常死亡报警
hostname: scrapyd-chinabond-6
error_count: Process died: exitstatus=None ,project: chinabond_fast_spider,spider: news_sdchina_com,job: 28e8db4edbcb11e9803172286b76aa73,pid: 34324,log: data/logs/chinabond_fast_spider/news_sdchina_com/28e8db4edbcb11e9803172286b76aa73.log,items: None
error_data:
爬虫任务报警
01:47:20
scrapyd==》爬虫任务异常死亡报警
hostname: scrapyd-chinabond-0
error_count: Process died: exitstatus=None ,project: chinabond_fast_spider,spider: hq_smm_cn,job: 4bdc3af6dbcb11e9a45522b8c8b2a9e4,pid: 111593,log: data/logs/chinabond_fast_spider/hq_smm_cn/4bdc3af6dbcb11e9a45522b8c8b2a9e4.log,items: None
error_data:
爬虫任务报警
01:47:21
scrapyd==》爬虫任务异常死亡报警
hostname: scrapyd-chinabond-6
error_count: Process died: exitstatus=None ,project: chinabond_fast_spider,spider: sichuan_scol_com_cn,job: 71321c4edbcb11e9803172286b76aa73,pid: 34461,log: data/logs/chinabond_fast_spider/sichuan_scol_com_cn/71321c4edbcb11e9803172286b76aa73.log,items: None
error_data:
爬虫任务报警
01:47:21
scrapyd==》爬虫任务异常死亡报警
hostname: scrapyd-chinabond-2
error_count: Process died: exitstatus=None ,project: chinabond_fast_spider,spider: www_mof_gov_cn,job: 7418dacedbcb11e9b15e02034af50b6e,pid: 65326,log: data/logs/chinabond_fast_spider/www_mof_gov_cn/7418dacedbcb11e9b15e02034af50b6e.log,items: None
error_data:
爬虫任务报警
01:47:21
scrapyd==》爬虫任务异常死亡报警
hostname: scrapyd-chinabond-5
error_count: Process died: exitstatus=None ,project: chinabond_fast_spider,spider: www_funxun_com,job: 4dcda7a0dbcb11e980a8862f09ca6d70,pid: 27785,log: data/logs/chinabond_fast_spider/www_funxun_com/4dcda7a0dbcb11e980a8862f09ca6d70.log,items: None
error_data:
爬虫任务报警
01:49:21
scrapyd==》爬虫任务异常死亡报警
hostname: scrapyd-chinabond-4
error_count: Process died: exitstatus=None ,project: chinabond_fast_spider,spider: shuidian_bjx_com_cn,job: 95090682dbcb11e9a0fade28e59e3773,pid: 106424,log: data/logs/chinabond_fast_spider/shuidian_bjx_com_cn/95090682dbcb11e9a0fade28e59e3773.log,items: None
error_data:
爬虫任务报警
01:51:20
scrapyd==》爬虫任务异常死亡报警
hostname: scrapyd-chinabond-0
error_count: Process died: exitstatus=None ,project: chinabond_fast_spider,spider: tech_sina_com_cn,job: de4bdf72dbcb11e9a45522b8c8b2a9e4,pid: 111685,log: data/logs/chinabond_fast_spider/tech_sina_com_cn/de4bdf72dbcb11e9a45522b8c8b2a9e4.log,items: None
error_data:
爬虫任务报警
01:51:21
scrapyd==》爬虫任务异常死亡报警
hostname: scrapyd-chinabond-6
error_count: Process died: exitstatus=None ,project: chinabond_fast_spider,spider: ee_ofweek_com,job: ff6bd5b8dbcb11e9803172286b76aa73,pid: 34626,log: data/logs/chinabond_fast_spider/ee_ofweek_com/ff6bd5b8dbcb11e9803172286b76aa73.log,items: None
error_data:
爬虫任务报警
01:51:21
scrapyd==》爬虫任务异常死亡报警
hostname: scrapyd-chinabond-6
error_count: Process died: exitstatus=None ,project: chinabond_fast_spider,spider: house_hexun_com,job: ff6dfdacdbcb11e9803172286b76aa73,pid: 34633,log: data/logs/chinabond_fast_spider/house_hexun_com/ff6dfdacdbcb11e9803172286b76aa73.log,items: None
error_data:
爬虫任务报警
01:51:21
scrapyd==》爬虫任务异常死亡报警
hostname: scrapyd-chinabond-2
error_count: Process died: exitstatus=None ,project: chinabond_fast_spider,spider: www_sjfzxm_com,job: 018e7d78dbcc11e9b15e02034af50b6e,pid: 65492,log: data/logs/chinabond_fast_spider/www_sjfzxm_com/018e7d78dbcc11e9b15e02034af50b6e.log,items: None
error_data:
爬虫任务报警
01:53:21
scrapyd==》爬虫任务异常死亡报警
hostname: scrapyd-chinabond-4
error_count: Process died: exitstatus=None ,project: chinabond_fast_spider,spider: news_xianzhaiwang_cn,job: 48d835e8dbcc11e9a0fade28e59e3773,pid: 106476,log: data/logs/chinabond_fast_spider/news_xianzhaiwang_cn/48d835e8dbcc11e9a0fade28e59e3773.log,items: None
error_data:

代码如下:

import os
import re
import json
from collections import namedtuple
alert = namedtuple('Spider_Alert', 'alert_time, alert_hostname, alert_project, alert_spider')
path = r'D:\data\ceshi.txt'
g_path = r'D:\data\\'
file_name = r'result.txt'
file_path = g_path + file_name
alerts_list = list()
with open(path, encoding="utf-8") as file:
    lines = file.readlines()  # 读取每一行
    count = 0
    time = None
    hostname = None
    project = None
    for line in lines:
        if re.search(r'^\d{2}:\d{2}:\d{2}\s*$', line):
            time = re.search(r'^(\d{2}:\d{2}:\d{2})\s*$', line).group(1)
        if re.search(r'^hostname:\s*(.+)', line):
            hostname = re.search(r'^hostname:\s*(.+)', line).group(1)
        if re.search(r'project:\s*([^,]+),', line):
            project = re.search(r'project:\s*([^,]+),', line).group(1)
        if re.search(r'spider:\s*([^,]+),', line):
            spider = re.search(r'spider:\s*([^,]+),', line).group(1)
        if re.search(r'^error_data', line):
            spider_alert = None
            spider_alert = alert(alert_time=time, alert_hostname=hostname, alert_project=project, alert_spider=spider)
            alerts_list.append(spider_alert)
for element in alerts_list:
    print(element[0], element[1], element[3])
    with open(file_path, 'a', encoding="utf-8") as file:
        file.write(element[0] + "\t" + element[1] + "\t" + element[3])
        file.write(' \n')

执行结果如下图:

Python 匹配文本并在其上一行追加文本


Tags in this post...

Python 相关文章推荐
python读取json文件并将数据插入到mongodb的方法
Mar 23 Python
使用python和Django完成博客数据库的迁移方法
Jan 05 Python
理论讲解python多进程并发编程
Feb 09 Python
在python3中pyqt5和mayavi不兼容问题的解决方法
Jan 08 Python
python 图片去噪的方法示例
Jul 09 Python
详解python实现数据归一化处理的方式:(0,1)标准化
Jul 17 Python
python中字典按键或键值排序的实现代码
Aug 27 Python
python代码xml转txt实例
Mar 10 Python
浅谈python 中的 type(), dtype(), astype()的区别
Apr 09 Python
基于python实现数组格式参数加密计算
Apr 21 Python
opencv 形态学变换(开运算,闭运算,梯度运算)
Jul 07 Python
bat批处理之字符串操作的实现
Mar 16 Python
Python 一键获取电脑浏览器的账号密码
May 11 #Python
图神经网络GNN算法
May 11 #Python
python神经网络ResNet50模型
May 06 #Python
python和anaconda的区别
May 06 #Python
python神经网络Xception模型
May 06 #Python
Python使用永中文档转换服务
May 06 #Python
Python tensorflow卷积神经Inception V3网络结构
May 06 #Python
You might like
非常精妙的PHP递归调用与静态变量使用
2012/12/16 PHP
php对象和数组相互转换的方法
2015/05/12 PHP
php实现按天数、星期、月份查询的搜索框
2016/05/02 PHP
Zend Framework数据库操作技巧总结
2017/02/18 PHP
PHP开发实现微信退款功能示例
2017/11/25 PHP
ext 代码生成器
2009/08/07 Javascript
JS 自定义函数缺省值的设置方法
2010/05/05 Javascript
JavaScript判断一个URL链接是否有效的实现方法
2011/10/08 Javascript
javascript获取浏览器类型和版本的方法(js获取浏览器版本)
2014/03/13 Javascript
nodejs实现遍历文件夹并统计文件大小
2015/05/28 NodeJs
Javascript中的Prototype到底是什么
2016/02/16 Javascript
javascript实现移动端上的触屏拖拽功能
2016/03/04 Javascript
举例说明JavaScript中的实例对象与原型对象
2016/03/11 Javascript
简洁实用的BootStrap jQuery手风琴插件
2016/08/31 Javascript
深入理解Javascript中的valueOf与toString
2017/01/04 Javascript
基于Vue自定义指令实现按钮级权限控制思路详解
2018/05/23 Javascript
Vue Elenent实现表格相同数据列合并
2020/11/30 Vue.js
element-plus一个vue3.xUI框架(element-ui的3.x 版初体验)
2020/12/02 Vue.js
vue使用transition组件动画效果的实例代码
2021/01/28 Vue.js
[51:11]2014 DOTA2国际邀请赛中国区预选赛5.21 LGD-CDEC VS DT
2014/05/22 DOTA
Python中的descriptor描述器简明使用指南
2016/06/02 Python
Django forms组件的使用教程
2018/10/08 Python
手把手教你pycharm专业版安装破解教程(linux版)
2019/09/26 Python
python GUI库图形界面开发之PyQt5浏览器控件QWebEngineView详细使用方法
2020/02/26 Python
tensorflow图像裁剪进行数据增强操作
2020/06/30 Python
HTML5之SVG 2D入门9—蒙板及mask元素介绍与应用
2013/01/30 HTML / CSS
HTML5 3D书本翻页动画的实现示例
2019/08/28 HTML / CSS
Python中pass语句的作用是什么
2016/06/01 面试题
求职信模版
2013/11/30 职场文书
秘书专业自荐信范文
2013/12/26 职场文书
简历自我评价怎么写好呢?
2014/01/04 职场文书
银行领导证婚词
2014/01/11 职场文书
投标邀请书范文
2014/01/31 职场文书
婚纱店策划方案
2014/05/22 职场文书
2014年班组工作总结
2014/11/20 职场文书
2016年党建工作简报
2015/11/26 职场文书