Python 匹配文本并在其上一行追加文本


Posted in Python onMay 11, 2022

匹配文本并在其上一行追加文本

问题描述

Python匹配文本并在其上一行追加文本

test.txt

a
b
c
d
e

1.读进列表后覆盖原文件 

def match_then_insert(filename, match, content):
    """匹配后在该行追加
    :param filename: 要操作的文件
    :param match: 匹配内容
    :param content: 追加内容
    """
    lines = open(filename).read().splitlines()
    index = lines.index(match)
    lines.insert(index, content)
    open(filename, mode='w').write('\n'.join(lines))
match_then_insert('test.txt', match='c', content='123')

效果

a
b
123
c
d
e

2.FileInput类

from fileinput import FileInput
def match_then_insert(filename, match, content):
    """匹配后在该行追加
    :param filename: 要操作的文件
    :param match: 匹配内容
    :param content: 追加内容
    """
    for line in FileInput(filename, inplace=True):  # 原地过滤
        if match in line:
            line = content + '\n' + line
        print(line, end='')  # 输出重定向到原文件
match_then_insert('test.txt', match='c', content='123')

3.seek

def match_then_insert(filename, match, content):
    """匹配后在该行追加
    :param filename: 要操作的文件
    :param match: 匹配内容
    :param content: 追加内容
    """
    with open(filename, mode='rb+') as f:
        while True:
            try:
                line = f.readline()  # 逐行读取
            except IndexError:  # 超出范围则退出
                break
            line_str = line.decode().splitlines()[0]
            if line_str == match:
                f.seek(-len(line), 1)  # 光标移动到上一行
                rest = f.read()  # 读取余下内容
                f.seek(-len(rest), 1)  # 光标移动回原位置
                f.truncate()  # 删除余下内容
                content = content + '\n'
                f.write(content.encode())  # 插入指定内容
                f.write(rest)  # 还原余下内容
                break
match_then_insert('test.txt', match='c', content='123')

对比

方案 耗时/s
读进列表后覆盖原文件 54.42
FileInput类 121.59
seek 3.53
from timeit import timeit
from fileinput import FileInput
def init_txt():
    open('test.txt', mode='w').write('\n'.join(['a', 'b', 'c', 'd', 'e']))
def f1(filename='test.txt', match='c', content='123'):
    lines = open(filename).read().splitlines()
    index = lines.index(match)
    lines.insert(index, content)
    open(filename, mode='w').write('\n'.join(lines))
def f2(filename='test.txt', match='c', content='123'):
    for line in FileInput(filename, inplace=True):
        if match in line:
            line = content + '\n' + line
        print(line, end='')
def f3(filename='test.txt', match='c', content='123'):
    with open(filename, mode='rb+') as f:
        while True:
            try:
                line = f.readline()
            except IndexError:
                break
            line_str = line.decode().splitlines()[0]
            if line_str == match:
                f.seek(-len(line), 1)
                rest = f.read()
                f.seek(-len(rest), 1)
                f.truncate()
                content = content + '\n'
                f.write(content.encode())
                f.write(rest)
                break
init_txt()
print(timeit(f1, number=1000))
init_txt()
print(timeit(f2, number=1000))
init_txt()
print(timeit(f3, number=1000))

遇到的坑

报错可试试在文件头部添加

# -*- coding: utf-8 -*-

或指定 encoding='utf-8'

用正则表达式匹配文本(Python经典编程案例)

ceshi.txt文本如下:第一行为空行

爬虫任务报警
01:45:21
scrapyd==》爬虫任务异常死亡报警
hostname: scrapyd-chinabond-1
error_count: Process died: exitstatus=None ,project: chinabond_fast_spider,spider: ah_sina_com_cn,job: 28395818dbcb11e998a3f632d94e247c,pid: 88971,log: data/logs/chinabond_fast_spider/ah_sina_com_cn/28395818dbcb11e998a3f632d94e247c.log,items: None
error_data:
爬虫任务报警
01:45:21
scrapyd==》爬虫任务异常死亡报警
hostname: scrapyd-chinabond-6
error_count: Process died: exitstatus=None ,project: chinabond_fast_spider,spider: shupeidian_bjx_com_cn,job: 04738a5cdbcb11e9803172286b76aa73,pid: 34246,log: data/logs/chinabond_fast_spider/shupeidian_bjx_com_cn/04738a5cdbcb11e9803172286b76aa73.log,items: None
error_data:
爬虫任务报警
01:45:21
scrapyd==》爬虫任务异常死亡报警
hostname: scrapyd-chinabond-6
error_count: Process died: exitstatus=None ,project: chinabond_fast_spider,spider: news_sdchina_com,job: 28e8db4edbcb11e9803172286b76aa73,pid: 34324,log: data/logs/chinabond_fast_spider/news_sdchina_com/28e8db4edbcb11e9803172286b76aa73.log,items: None
error_data:
爬虫任务报警
01:47:20
scrapyd==》爬虫任务异常死亡报警
hostname: scrapyd-chinabond-0
error_count: Process died: exitstatus=None ,project: chinabond_fast_spider,spider: hq_smm_cn,job: 4bdc3af6dbcb11e9a45522b8c8b2a9e4,pid: 111593,log: data/logs/chinabond_fast_spider/hq_smm_cn/4bdc3af6dbcb11e9a45522b8c8b2a9e4.log,items: None
error_data:
爬虫任务报警
01:47:21
scrapyd==》爬虫任务异常死亡报警
hostname: scrapyd-chinabond-6
error_count: Process died: exitstatus=None ,project: chinabond_fast_spider,spider: sichuan_scol_com_cn,job: 71321c4edbcb11e9803172286b76aa73,pid: 34461,log: data/logs/chinabond_fast_spider/sichuan_scol_com_cn/71321c4edbcb11e9803172286b76aa73.log,items: None
error_data:
爬虫任务报警
01:47:21
scrapyd==》爬虫任务异常死亡报警
hostname: scrapyd-chinabond-2
error_count: Process died: exitstatus=None ,project: chinabond_fast_spider,spider: www_mof_gov_cn,job: 7418dacedbcb11e9b15e02034af50b6e,pid: 65326,log: data/logs/chinabond_fast_spider/www_mof_gov_cn/7418dacedbcb11e9b15e02034af50b6e.log,items: None
error_data:
爬虫任务报警
01:47:21
scrapyd==》爬虫任务异常死亡报警
hostname: scrapyd-chinabond-5
error_count: Process died: exitstatus=None ,project: chinabond_fast_spider,spider: www_funxun_com,job: 4dcda7a0dbcb11e980a8862f09ca6d70,pid: 27785,log: data/logs/chinabond_fast_spider/www_funxun_com/4dcda7a0dbcb11e980a8862f09ca6d70.log,items: None
error_data:
爬虫任务报警
01:49:21
scrapyd==》爬虫任务异常死亡报警
hostname: scrapyd-chinabond-4
error_count: Process died: exitstatus=None ,project: chinabond_fast_spider,spider: shuidian_bjx_com_cn,job: 95090682dbcb11e9a0fade28e59e3773,pid: 106424,log: data/logs/chinabond_fast_spider/shuidian_bjx_com_cn/95090682dbcb11e9a0fade28e59e3773.log,items: None
error_data:
爬虫任务报警
01:51:20
scrapyd==》爬虫任务异常死亡报警
hostname: scrapyd-chinabond-0
error_count: Process died: exitstatus=None ,project: chinabond_fast_spider,spider: tech_sina_com_cn,job: de4bdf72dbcb11e9a45522b8c8b2a9e4,pid: 111685,log: data/logs/chinabond_fast_spider/tech_sina_com_cn/de4bdf72dbcb11e9a45522b8c8b2a9e4.log,items: None
error_data:
爬虫任务报警
01:51:21
scrapyd==》爬虫任务异常死亡报警
hostname: scrapyd-chinabond-6
error_count: Process died: exitstatus=None ,project: chinabond_fast_spider,spider: ee_ofweek_com,job: ff6bd5b8dbcb11e9803172286b76aa73,pid: 34626,log: data/logs/chinabond_fast_spider/ee_ofweek_com/ff6bd5b8dbcb11e9803172286b76aa73.log,items: None
error_data:
爬虫任务报警
01:51:21
scrapyd==》爬虫任务异常死亡报警
hostname: scrapyd-chinabond-6
error_count: Process died: exitstatus=None ,project: chinabond_fast_spider,spider: house_hexun_com,job: ff6dfdacdbcb11e9803172286b76aa73,pid: 34633,log: data/logs/chinabond_fast_spider/house_hexun_com/ff6dfdacdbcb11e9803172286b76aa73.log,items: None
error_data:
爬虫任务报警
01:51:21
scrapyd==》爬虫任务异常死亡报警
hostname: scrapyd-chinabond-2
error_count: Process died: exitstatus=None ,project: chinabond_fast_spider,spider: www_sjfzxm_com,job: 018e7d78dbcc11e9b15e02034af50b6e,pid: 65492,log: data/logs/chinabond_fast_spider/www_sjfzxm_com/018e7d78dbcc11e9b15e02034af50b6e.log,items: None
error_data:
爬虫任务报警
01:53:21
scrapyd==》爬虫任务异常死亡报警
hostname: scrapyd-chinabond-4
error_count: Process died: exitstatus=None ,project: chinabond_fast_spider,spider: news_xianzhaiwang_cn,job: 48d835e8dbcc11e9a0fade28e59e3773,pid: 106476,log: data/logs/chinabond_fast_spider/news_xianzhaiwang_cn/48d835e8dbcc11e9a0fade28e59e3773.log,items: None
error_data:

代码如下:

import os
import re
import json
from collections import namedtuple
alert = namedtuple('Spider_Alert', 'alert_time, alert_hostname, alert_project, alert_spider')
path = r'D:\data\ceshi.txt'
g_path = r'D:\data\\'
file_name = r'result.txt'
file_path = g_path + file_name
alerts_list = list()
with open(path, encoding="utf-8") as file:
    lines = file.readlines()  # 读取每一行
    count = 0
    time = None
    hostname = None
    project = None
    for line in lines:
        if re.search(r'^\d{2}:\d{2}:\d{2}\s*$', line):
            time = re.search(r'^(\d{2}:\d{2}:\d{2})\s*$', line).group(1)
        if re.search(r'^hostname:\s*(.+)', line):
            hostname = re.search(r'^hostname:\s*(.+)', line).group(1)
        if re.search(r'project:\s*([^,]+),', line):
            project = re.search(r'project:\s*([^,]+),', line).group(1)
        if re.search(r'spider:\s*([^,]+),', line):
            spider = re.search(r'spider:\s*([^,]+),', line).group(1)
        if re.search(r'^error_data', line):
            spider_alert = None
            spider_alert = alert(alert_time=time, alert_hostname=hostname, alert_project=project, alert_spider=spider)
            alerts_list.append(spider_alert)
for element in alerts_list:
    print(element[0], element[1], element[3])
    with open(file_path, 'a', encoding="utf-8") as file:
        file.write(element[0] + "\t" + element[1] + "\t" + element[3])
        file.write(' \n')

执行结果如下图:

Python 匹配文本并在其上一行追加文本


Tags in this post...

Python 相关文章推荐
Python单元测试框架unittest简明使用实例
Apr 13 Python
用virtualenv建立多个Python独立虚拟开发环境
Jul 06 Python
Python cookbook(数据结构与算法)让字典保持有序的方法
Feb 18 Python
python使用__slots__让你的代码更加节省内存
Sep 05 Python
Python实现Linux监控的方法
May 16 Python
Flask框架模板继承实现方法分析
Jul 31 Python
python配置文件写入过程详解
Oct 19 Python
Python和Anaconda和Pycharm安装教程图文详解
Feb 04 Python
python批量修改xml属性的实现方式
Mar 05 Python
解决django 向mysql中写入中文字符出错的问题
May 18 Python
python基于socket函数实现端口扫描
May 28 Python
Python获取excel内容及相关操作代码实例
Aug 10 Python
Python 一键获取电脑浏览器的账号密码
May 11 #Python
图神经网络GNN算法
May 11 #Python
python神经网络ResNet50模型
May 06 #Python
python和anaconda的区别
May 06 #Python
python神经网络Xception模型
May 06 #Python
Python使用永中文档转换服务
May 06 #Python
Python tensorflow卷积神经Inception V3网络结构
May 06 #Python
You might like
第三节 定义一个类 [3]
2006/10/09 PHP
COM in PHP (winows only)
2006/10/09 PHP
在服务端进行目录建立、删除,文件上传、删除的过程的php代码
2008/09/10 PHP
PHP实现仿百度文库,豆丁在线文档效果(word,excel,ppt转flash)
2016/03/10 PHP
不用ajax实现点击文字即可编辑的方法
2007/12/16 Javascript
js 表单验证方法(实用)
2009/04/28 Javascript
JS 密码强度验证(兼容IE,火狐,谷歌)
2010/03/15 Javascript
修改好的jquery滚动字幕效果实现代码
2011/06/22 Javascript
ExtJS自定义主题(theme)样式详解
2013/11/18 Javascript
JavaScript实现的链表数据结构实例
2015/04/02 Javascript
jQuery拖动元素并对元素进行重新排序
2015/12/30 Javascript
vue.js绑定class和style样式(6)
2016/12/09 Javascript
bootstrap laydate日期组件使用详解
2017/01/04 Javascript
js 数字、字符串、布尔值的转换方法(必看)
2017/04/07 Javascript
jQuery制作全屏宽度固定高度轮播图(实例讲解)
2017/07/08 jQuery
浅谈sass在vue注意的地方
2017/08/10 Javascript
layui之select的option叠加问题的解决方法
2018/03/08 Javascript
详解vue项目中如何引入全局sass/less变量、function、mixin
2018/06/02 Javascript
React Native 混合开发多入口加载方式详解
2019/09/23 Javascript
微信小程序背景音乐开发详解
2019/12/12 Javascript
react 原生实现头像滚动播放的示例
2020/04/21 Javascript
机器学习实战之knn算法pandas
2019/06/22 Python
python读取Excel表格文件的方法
2019/09/02 Python
python爬取Ajax动态加载网页过程解析
2019/09/05 Python
浅谈Python访问MySQL的正确姿势
2020/01/07 Python
推荐10个HTML5响应式框架
2016/02/25 HTML / CSS
工地质量标语
2014/06/12 职场文书
护林防火标语
2014/06/27 职场文书
秋季运动会广播稿(30篇)
2014/09/13 职场文书
个人三严三实对照检查材料
2014/09/25 职场文书
2015年教师党员承诺书
2015/04/27 职场文书
2015年初中教务处工作总结
2015/07/21 职场文书
高中政治教师教学反思
2016/02/23 职场文书
纪念建国70周年演讲稿
2019/07/19 职场文书
小学秋季运动会加油口号及加油稿
2019/08/19 职场文书
导游词之京东大峡谷旅游区
2019/10/29 职场文书