Python: Match a Line of Text and Insert Text on the Line Above It


Posted in Python on May 11, 2022

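Problem description

In Python, find the line that matches a given text and insert new text on the line above it.

Sample file test.txt: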

a
b
c
d
e

1. Read into a list, then overwrite the original file

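def match_then_insert(filename, match, content):
    """Insert content on the line above the first line equal to match.
    :param filename: file to modify
    :param match: text of the line to match (whole line)
    :param content: text to insert
    """
    with open(filename) as f:
        lines = f.read().splitlines()
    index = lines.index(match)  # raises ValueError if no line equals match
    lines.insert(index, content)  # insert before the matched line
    with open(filename, mode='w') as f:
        f.write('\n'.join(lines))


match_then_insert('test.txt', match='c', content='123')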

Result

a
b
123
c
d
e
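
Approach 1 raises ValueError when no line equals match, and it drops a trailing newline if the file has one. The following is a minimal sketch of a variant that handles both cases. The helper name insert_above, its return value, and the UTF-8 default are assumptions made for illustration, not part of the original post.

```python
def insert_above(filename, match, content, encoding='utf-8'):
    """Insert content above the first line equal to match.

    Hypothetical helper: returns True if a line was inserted, False otherwise.
    """
    with open(filename, encoding=encoding) as f:
        lines = f.readlines()  # each element keeps its trailing '\n'
    for i, line in enumerate(lines):
        if line.rstrip('\n') == match:
            lines.insert(i, content + '\n')
            break
    else:
        return False  # no line matched; leave the file untouched
    with open(filename, mode='w', encoding=encoding) as f:
        f.writelines(lines)
    return True
```

Because readlines() keeps the line endings, writing the list back with writelines() reproduces the rest of the file unchanged.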

2. The FileInput class

from fileinput import FileInput


def match_then_insert(filename, match, content):
    """Insert content on the line above every line that contains match.
    :param filename: file to modify
    :param match: text to look for (substring test)
    :param content: text to insert
    """
    with FileInput(filename, inplace=True) as f:  # filter the file in place
        for line in f:
            if match in line:
                line = content + '\n' + line
            print(line, end='')  # stdout is redirected into the original file


match_then_insert('test.txt', match='c', content='123')
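
Note that `match in line` is a substring test, so match='c' would also fire on a line such as 'abc'. The sketch below compares whole lines instead and keeps a backup copy through FileInput's backup parameter. The helper name insert_above_inplace and the '.bak' suffix are assumptions chosen for illustration.

```python
from fileinput import FileInput


def insert_above_inplace(filename, match, content, backup='.bak'):
    """Insert content above every line equal to match, editing in place.

    Hypothetical helper; the original file is kept as filename + backup.
    """
    with FileInput(filename, inplace=True, backup=backup) as f:
        for line in f:
            if line.rstrip('\n') == match:  # whole-line comparison
                print(content)  # stdout is redirected into the file
            print(line, end='')
```

After a call such as insert_above_inplace('test.txt', 'c', '123'), the untouched original should be available as test.txt.bak.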

3. seek

def match_then_insert(filename, match, content):
    """Insert content on the line above the first line equal to match.
    :param filename: file to modify
    :param match: text of the line to match (whole line)
    :param content: text to insert
    """
    with open(filename, mode='rb+') as f:
        while True:
            line = f.readline()  # read one line at a time (as bytes)
            if not line:  # readline() returns b'' at end of file: no match
                break
            line_str = line.decode().rstrip('\r\n')
            if line_str == match:
                f.seek(-len(line), 1)  # move back to the start of the matched line
                rest = f.read()  # read the remainder of the file
                f.seek(-len(rest), 1)  # return to the start of the matched line
                f.truncate()  # drop the remainder
                f.write((content + '\n').encode())  # write the new line
                f.write(rest)  # restore the remainder
                break


match_then_insert('test.txt', match='c', content='123')
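
The seek approach still loads everything after the match into memory with f.read(). For files where that is a concern, a different technique (not one of the three approaches in this post) is to stream the file into a temporary file and then rename it over the original. The sketch below assumes UTF-8 text; the helper name insert_above_streaming is hypothetical.

```python
import os
import tempfile


def insert_above_streaming(filename, match, content, encoding='utf-8'):
    """Copy filename line by line into a temporary file, inserting content
    above the first line equal to match, then swap the copy into place.

    Hypothetical helper: returns True if a line was inserted.
    Note: the new file does not inherit the original's permissions.
    """
    directory = os.path.dirname(os.path.abspath(filename))
    inserted = False
    # newline='' keeps the original line endings untouched on both sides
    with open(filename, encoding=encoding, newline='') as src, \
            tempfile.NamedTemporaryFile('w', encoding=encoding, newline='',
                                        dir=directory, delete=False) as dst:
        for line in src:
            if not inserted and line.rstrip('\r\n') == match:
                dst.write(content + '\n')
                inserted = True
            dst.write(line)
    os.replace(dst.name, filename)  # rename within the same directory
    return inserted
```

Only one line is held in memory at a time, at the cost of writing the whole file once more.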

Comparison

Approach                                     Time (s)
Read into a list, then overwrite the file    54.42
FileInput class                              121.59
seek                                         3.53

Each approach was timed over 1000 runs with timeit:

from timeit import timeit
from fileinput import FileInput


def init_txt():
    """Reset test.txt to its five starting lines."""
    with open('test.txt', mode='w') as f:
        f.write('\n'.join(['a', 'b', 'c', 'd', 'e']))


def f1(filename='test.txt', match='c', content='123'):
    # Approach 1: read into a list, then overwrite the file
    with open(filename) as f:
        lines = f.read().splitlines()
    index = lines.index(match)
    lines.insert(index, content)
    with open(filename, mode='w') as f:
        f.write('\n'.join(lines))


def f2(filename='test.txt', match='c', content='123'):
    # Approach 2: FileInput in-place filtering
    with FileInput(filename, inplace=True) as f:
        for line in f:
            if match in line:
                line = content + '\n' + line
            print(line, end='')


def f3(filename='test.txt', match='c', content='123'):
    # Approach 3: seek back and rewrite only the tail of the file
    with open(filename, mode='rb+') as f:
        while True:
            line = f.readline()
            if not line:  # end of file: no match
                break
            if line.decode().rstrip('\r\n') == match:
                f.seek(-len(line), 1)
                rest = f.read()
                f.seek(-len(rest), 1)
                f.truncate()
                f.write((content + '\n').encode())
                f.write(rest)
                break


init_txt()
print(timeit(f1, number=1000))
init_txt()
print(timeit(f2, number=1000))
init_txt()
print(timeit(f3, number=1000))
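
One caveat about this benchmark: test.txt is reset only once per approach, so each of the 1000 runs inserts another line and later runs operate on a longer file. A minimal sketch of a timing loop that resets the file before every run, without timing the reset, could look like this. The helper name time_with_reset is an assumption, and it is not the method used to produce the figures above.

```python
from time import perf_counter


def time_with_reset(func, reset, number=1000):
    """Return the total seconds spent in func(), calling reset() before
    each run so that every run sees the same input. reset() is not timed.
    """
    total = 0.0
    for _ in range(number):
        reset()
        start = perf_counter()
        func()
        total += perf_counter() - start
    return total
```

With the functions from the benchmark script, usage would be along the lines of print(time_with_reset(f1, init_txt)).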

Pitfalls

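If you get an encoding error, try adding this line at the top of the script:

# -*- coding: utf-8 -*-

or pass encoding='utf-8' when opening the file. The coding line only declares the encoding of the source file itself (UTF-8 is already the default in Python 3); errors raised while reading or writing the data file are addressed by the encoding argument.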
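
As a small illustration of the second option, the sketch below writes and reads non-ASCII text with an explicit encoding rather than relying on the platform default. The temporary demo file is an assumption made so the example is self-contained.

```python
import os
import tempfile

text = '爬虫任务报警\n01:45:21\n'
path = os.path.join(tempfile.mkdtemp(), 'demo.txt')  # hypothetical demo file

# Pass the encoding explicitly on both the write and the read side.
with open(path, mode='w', encoding='utf-8') as f:
    f.write(text)
with open(path, encoding='utf-8') as f:
    lines = f.read().splitlines()

print(lines)  # ['爬虫任务报警', '01:45:21']
```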

Matching text with regular expressions (a classic Python programming example)

The contents of ceshi.txt are as follows (the first line is blank):

爬虫任务报警
01:45:21
scrapyd==》爬虫任务异常死亡报警
hostname: scrapyd-chinabond-1
error_count: Process died: exitstatus=None ,project: chinabond_fast_spider,spider: ah_sina_com_cn,job: 28395818dbcb11e998a3f632d94e247c,pid: 88971,log: data/logs/chinabond_fast_spider/ah_sina_com_cn/28395818dbcb11e998a3f632d94e247c.log,items: None
error_data:
爬虫任务报警
01:45:21
scrapyd==》爬虫任务异常死亡报警
hostname: scrapyd-chinabond-6
error_count: Process died: exitstatus=None ,project: chinabond_fast_spider,spider: shupeidian_bjx_com_cn,job: 04738a5cdbcb11e9803172286b76aa73,pid: 34246,log: data/logs/chinabond_fast_spider/shupeidian_bjx_com_cn/04738a5cdbcb11e9803172286b76aa73.log,items: None
error_data:
爬虫任务报警
01:45:21
scrapyd==》爬虫任务异常死亡报警
hostname: scrapyd-chinabond-6
error_count: Process died: exitstatus=None ,project: chinabond_fast_spider,spider: news_sdchina_com,job: 28e8db4edbcb11e9803172286b76aa73,pid: 34324,log: data/logs/chinabond_fast_spider/news_sdchina_com/28e8db4edbcb11e9803172286b76aa73.log,items: None
error_data:
爬虫任务报警
01:47:20
scrapyd==》爬虫任务异常死亡报警
hostname: scrapyd-chinabond-0
error_count: Process died: exitstatus=None ,project: chinabond_fast_spider,spider: hq_smm_cn,job: 4bdc3af6dbcb11e9a45522b8c8b2a9e4,pid: 111593,log: data/logs/chinabond_fast_spider/hq_smm_cn/4bdc3af6dbcb11e9a45522b8c8b2a9e4.log,items: None
error_data:
爬虫任务报警
01:47:21
scrapyd==》爬虫任务异常死亡报警
hostname: scrapyd-chinabond-6
error_count: Process died: exitstatus=None ,project: chinabond_fast_spider,spider: sichuan_scol_com_cn,job: 71321c4edbcb11e9803172286b76aa73,pid: 34461,log: data/logs/chinabond_fast_spider/sichuan_scol_com_cn/71321c4edbcb11e9803172286b76aa73.log,items: None
error_data:
爬虫任务报警
01:47:21
scrapyd==》爬虫任务异常死亡报警
hostname: scrapyd-chinabond-2
error_count: Process died: exitstatus=None ,project: chinabond_fast_spider,spider: www_mof_gov_cn,job: 7418dacedbcb11e9b15e02034af50b6e,pid: 65326,log: data/logs/chinabond_fast_spider/www_mof_gov_cn/7418dacedbcb11e9b15e02034af50b6e.log,items: None
error_data:
爬虫任务报警
01:47:21
scrapyd==》爬虫任务异常死亡报警
hostname: scrapyd-chinabond-5
error_count: Process died: exitstatus=None ,project: chinabond_fast_spider,spider: www_funxun_com,job: 4dcda7a0dbcb11e980a8862f09ca6d70,pid: 27785,log: data/logs/chinabond_fast_spider/www_funxun_com/4dcda7a0dbcb11e980a8862f09ca6d70.log,items: None
error_data:
爬虫任务报警
01:49:21
scrapyd==》爬虫任务异常死亡报警
hostname: scrapyd-chinabond-4
error_count: Process died: exitstatus=None ,project: chinabond_fast_spider,spider: shuidian_bjx_com_cn,job: 95090682dbcb11e9a0fade28e59e3773,pid: 106424,log: data/logs/chinabond_fast_spider/shuidian_bjx_com_cn/95090682dbcb11e9a0fade28e59e3773.log,items: None
error_data:
爬虫任务报警
01:51:20
scrapyd==》爬虫任务异常死亡报警
hostname: scrapyd-chinabond-0
error_count: Process died: exitstatus=None ,project: chinabond_fast_spider,spider: tech_sina_com_cn,job: de4bdf72dbcb11e9a45522b8c8b2a9e4,pid: 111685,log: data/logs/chinabond_fast_spider/tech_sina_com_cn/de4bdf72dbcb11e9a45522b8c8b2a9e4.log,items: None
error_data:
爬虫任务报警
01:51:21
scrapyd==》爬虫任务异常死亡报警
hostname: scrapyd-chinabond-6
error_count: Process died: exitstatus=None ,project: chinabond_fast_spider,spider: ee_ofweek_com,job: ff6bd5b8dbcb11e9803172286b76aa73,pid: 34626,log: data/logs/chinabond_fast_spider/ee_ofweek_com/ff6bd5b8dbcb11e9803172286b76aa73.log,items: None
error_data:
爬虫任务报警
01:51:21
scrapyd==》爬虫任务异常死亡报警
hostname: scrapyd-chinabond-6
error_count: Process died: exitstatus=None ,project: chinabond_fast_spider,spider: house_hexun_com,job: ff6dfdacdbcb11e9803172286b76aa73,pid: 34633,log: data/logs/chinabond_fast_spider/house_hexun_com/ff6dfdacdbcb11e9803172286b76aa73.log,items: None
error_data:
爬虫任务报警
01:51:21
scrapyd==》爬虫任务异常死亡报警
hostname: scrapyd-chinabond-2
error_count: Process died: exitstatus=None ,project: chinabond_fast_spider,spider: www_sjfzxm_com,job: 018e7d78dbcc11e9b15e02034af50b6e,pid: 65492,log: data/logs/chinabond_fast_spider/www_sjfzxm_com/018e7d78dbcc11e9b15e02034af50b6e.log,items: None
error_data:
爬虫任务报警
01:53:21
scrapyd==》爬虫任务异常死亡报警
hostname: scrapyd-chinabond-4
error_count: Process died: exitstatus=None ,project: chinabond_fast_spider,spider: news_xianzhaiwang_cn,job: 48d835e8dbcc11e9a0fade28e59e3773,pid: 106476,log: data/logs/chinabond_fast_spider/news_xianzhaiwang_cn/48d835e8dbcc11e9a0fade28e59e3773.log,items: None
error_data:

The code is as follows:

import os
import re
from collections import namedtuple

alert = namedtuple('Spider_Alert', 'alert_time, alert_hostname, alert_project, alert_spider')
path = r'D:\data\ceshi.txt'
g_path = r'D:\data'
file_name = 'result.txt'
file_path = os.path.join(g_path, file_name)
alerts_list = []
# Most recently seen value of each field (None until first seen)
time = hostname = project = spider = None
with open(path, encoding="utf-8") as file:
    for line in file:  # read line by line
        m = re.search(r'^(\d{2}:\d{2}:\d{2})\s*$', line)
        if m:
            time = m.group(1)
        m = re.search(r'^hostname:\s*(.+)', line)
        if m:
            hostname = m.group(1)
        m = re.search(r'project:\s*([^,]+),', line)
        if m:
            project = m.group(1)
        m = re.search(r'spider:\s*([^,]+),', line)
        if m:
            spider = m.group(1)
        if re.search(r'^error_data', line):  # error_data closes one alert record
            alerts_list.append(alert(alert_time=time, alert_hostname=hostname,
                                     alert_project=project, alert_spider=spider))
# Print each alert and append it to the result file as tab-separated values
with open(file_path, 'a', encoding="utf-8") as file:
    for element in alerts_list:
        print(element[0], element[1], element[3])
        file.write(element[0] + "\t" + element[1] + "\t" + element[3] + "\n")

The execution result is shown in the figure below:

[Figure: screenshot of the script's output]
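
The same fields can also be extracted with a single compiled pattern that uses named groups and is applied to the whole text at once. The sketch below runs on an in-memory sample in the same layout as ceshi.txt; the sample values (demo_project, demo_spider and so on) are made up for illustration, and the pattern assumes every record contains all four fields in this order.

```python
import re

# One sample record in the same layout as ceshi.txt (values are made up).
sample = (
    "爬虫任务报警\n"
    "01:45:21\n"
    "scrapyd==》爬虫任务异常死亡报警\n"
    "hostname: scrapyd-chinabond-1\n"
    "error_count: Process died: exitstatus=None ,project: demo_project,"
    "spider: demo_spider,job: 0000,pid: 1,log: demo.log,items: None\n"
    "error_data:\n"
)

RECORD = re.compile(
    r'^(?P<time>\d{2}:\d{2}:\d{2})\s*$'        # a line holding only HH:MM:SS
    r'.*?^hostname:\s*(?P<hostname>.+?)\s*$'   # the following hostname line
    r'.*?project:\s*(?P<project>[^,]+),'       # project field in error_count
    r'\s*spider:\s*(?P<spider>[^,]+),',        # spider field right after it
    re.MULTILINE | re.DOTALL,
)

alerts = [m.groupdict() for m in RECORD.finditer(sample)]
print(alerts)
```

Each element of alerts is a dict keyed by time, hostname, project and spider, which avoids indexing the result by position.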

