Python: Match Text and Insert Text on the Line Above It


Posted in Python on May 11, 2022


Problem description

Given a text file, find the line that matches a given string and insert new text on the line directly above it.

test.txt

a
b
c
d
e

1. Read the file into a list, then overwrite it

def match_then_insert(filename, match, content):
    """Insert content on the line above the matched line.
    :param filename: file to modify
    :param match: line to match (exact match)
    :param content: text to insert
    """
    with open(filename) as f:
        lines = f.read().splitlines()
    index = lines.index(match)      # position of the matched line
    lines.insert(index, content)    # insert the new text before it
    with open(filename, mode='w') as f:
        f.write('\n'.join(lines))


match_then_insert('test.txt', match='c', content='123')

Result

a
b
123
c
d
e
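
One caveat with this approach: list.index raises ValueError when match is not found, so the call fails before anything is written. A minimal sketch of a guard against that (the wrapper name safe_insert is ours, not from the original post):

def safe_insert(filename, match, content):
    with open(filename) as f:
        lines = f.read().splitlines()
    if match not in lines:  # no exact match -> leave the file untouched
        return False
    lines.insert(lines.index(match), content)
    with open(filename, mode='w') as f:
        f.write('\n'.join(lines))
    return True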

2. The FileInput class

from fileinput import FileInput


def match_then_insert(filename, match, content):
    """Insert content on the line above the matched line.
    :param filename: file to modify
    :param match: substring to match
    :param content: text to insert
    """
    for line in FileInput(filename, inplace=True):  # filter the file in place
        if match in line:
            line = content + '\n' + line
        print(line, end='')  # stdout is redirected to the original file


match_then_insert('test.txt', match='c', content='123')
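
If you want to keep a copy of the original file while editing in place, FileInput also accepts a backup suffix; a minimal sketch (the wrapper name match_then_insert_with_backup is ours):

from fileinput import FileInput

def match_then_insert_with_backup(filename, match, content):
    # backup='.bak' makes FileInput keep the original as filename + '.bak'
    for line in FileInput(filename, inplace=True, backup='.bak'):
        if match in line:
            line = content + '\n' + line
        print(line, end='')

match_then_insert_with_backup('test.txt', match='c', content='123')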

3. seek

def match_then_insert(filename, match, content):
    """Insert content on the line above the matched line.
    :param filename: file to modify
    :param match: line to match (exact match)
    :param content: text to insert
    """
    with open(filename, mode='rb+') as f:
        while True:
            line = f.readline()  # read one line
            if not line:  # reached EOF without a match
                break
            line_str = line.decode().splitlines()[0]
            if line_str == match:
                f.seek(-len(line), 1)  # move back to the start of the matched line
                rest = f.read()  # read everything from there to EOF
                f.seek(-len(rest), 1)  # move back to the start of the matched line again
                f.truncate()  # cut the file off at that position
                content = content + '\n'
                f.write(content.encode())  # write the inserted text
                f.write(rest)  # write the original tail back
                break


match_then_insert('test.txt', match='c', content='123')
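
To make the byte arithmetic concrete, here is the same edit traced on the bytes of test.txt, assuming '\n' line endings; this sketch only mimics what the seek/truncate/write calls do, it does not touch the file:

data = b'a\nb\nc\nd\ne'   # test.txt before the edit
pos = data.index(b'c\n')  # byte offset where the matched line starts (4)
rest = data[pos:]         # b'c\nd\ne' -- what f.read() returns after seeking back
patched = data[:pos] + b'123\n' + rest
print(patched.decode())   # a, b, 123, c, d, e -- one per line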

Comparison

Approach                                Time / s
Read into a list, overwrite the file    54.42
FileInput class                         121.59
seek                                    3.53

The seek approach wins largely because it only rewrites the file from the matched line onward; the other two rewrite the entire file, and FileInput adds per-line redirection overhead on top of that. The timings come from the following benchmark (1,000 runs per approach):
from timeit import timeit
from fileinput import FileInput


def init_txt():
    open('test.txt', mode='w').write('\n'.join(['a', 'b', 'c', 'd', 'e']))


def f1(filename='test.txt', match='c', content='123'):
    lines = open(filename).read().splitlines()
    index = lines.index(match)
    lines.insert(index, content)
    open(filename, mode='w').write('\n'.join(lines))


def f2(filename='test.txt', match='c', content='123'):
    for line in FileInput(filename, inplace=True):
        if match in line:
            line = content + '\n' + line
        print(line, end='')


def f3(filename='test.txt', match='c', content='123'):
    with open(filename, mode='rb+') as f:
        while True:
            line = f.readline()
            if not line:  # reached EOF without a match
                break
            line_str = line.decode().splitlines()[0]
            if line_str == match:
                f.seek(-len(line), 1)
                rest = f.read()
                f.seek(-len(rest), 1)
                f.truncate()
                content = content + '\n'
                f.write(content.encode())
                f.write(rest)
                break


init_txt()
print(timeit(f1, number=1000))
init_txt()
print(timeit(f2, number=1000))
init_txt()
print(timeit(f3, number=1000))

Pitfalls

If the script raises an encoding error, try adding this declaration at the top of the file:

# -*- coding: utf-8 -*-

or pass encoding='utf-8' when opening the file.
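
For example, a minimal sketch of the second fix, reading and rewriting test.txt with an explicit encoding:

with open('test.txt', encoding='utf-8') as f:  # avoid UnicodeDecodeError under a non-UTF-8 locale
    lines = f.read().splitlines()
with open('test.txt', mode='w', encoding='utf-8') as f:
    f.write('\n'.join(lines))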

Matching text with regular expressions (a classic Python example)

The file ceshi.txt looks like this (its first line is blank):

爬虫任务报警
01:45:21
scrapyd==》爬虫任务异常死亡报警
hostname: scrapyd-chinabond-1
error_count: Process died: exitstatus=None ,project: chinabond_fast_spider,spider: ah_sina_com_cn,job: 28395818dbcb11e998a3f632d94e247c,pid: 88971,log: data/logs/chinabond_fast_spider/ah_sina_com_cn/28395818dbcb11e998a3f632d94e247c.log,items: None
error_data:
爬虫任务报警
01:45:21
scrapyd==》爬虫任务异常死亡报警
hostname: scrapyd-chinabond-6
error_count: Process died: exitstatus=None ,project: chinabond_fast_spider,spider: shupeidian_bjx_com_cn,job: 04738a5cdbcb11e9803172286b76aa73,pid: 34246,log: data/logs/chinabond_fast_spider/shupeidian_bjx_com_cn/04738a5cdbcb11e9803172286b76aa73.log,items: None
error_data:
爬虫任务报警
01:45:21
scrapyd==》爬虫任务异常死亡报警
hostname: scrapyd-chinabond-6
error_count: Process died: exitstatus=None ,project: chinabond_fast_spider,spider: news_sdchina_com,job: 28e8db4edbcb11e9803172286b76aa73,pid: 34324,log: data/logs/chinabond_fast_spider/news_sdchina_com/28e8db4edbcb11e9803172286b76aa73.log,items: None
error_data:
爬虫任务报警
01:47:20
scrapyd==》爬虫任务异常死亡报警
hostname: scrapyd-chinabond-0
error_count: Process died: exitstatus=None ,project: chinabond_fast_spider,spider: hq_smm_cn,job: 4bdc3af6dbcb11e9a45522b8c8b2a9e4,pid: 111593,log: data/logs/chinabond_fast_spider/hq_smm_cn/4bdc3af6dbcb11e9a45522b8c8b2a9e4.log,items: None
error_data:
爬虫任务报警
01:47:21
scrapyd==》爬虫任务异常死亡报警
hostname: scrapyd-chinabond-6
error_count: Process died: exitstatus=None ,project: chinabond_fast_spider,spider: sichuan_scol_com_cn,job: 71321c4edbcb11e9803172286b76aa73,pid: 34461,log: data/logs/chinabond_fast_spider/sichuan_scol_com_cn/71321c4edbcb11e9803172286b76aa73.log,items: None
error_data:
爬虫任务报警
01:47:21
scrapyd==》爬虫任务异常死亡报警
hostname: scrapyd-chinabond-2
error_count: Process died: exitstatus=None ,project: chinabond_fast_spider,spider: www_mof_gov_cn,job: 7418dacedbcb11e9b15e02034af50b6e,pid: 65326,log: data/logs/chinabond_fast_spider/www_mof_gov_cn/7418dacedbcb11e9b15e02034af50b6e.log,items: None
error_data:
爬虫任务报警
01:47:21
scrapyd==》爬虫任务异常死亡报警
hostname: scrapyd-chinabond-5
error_count: Process died: exitstatus=None ,project: chinabond_fast_spider,spider: www_funxun_com,job: 4dcda7a0dbcb11e980a8862f09ca6d70,pid: 27785,log: data/logs/chinabond_fast_spider/www_funxun_com/4dcda7a0dbcb11e980a8862f09ca6d70.log,items: None
error_data:
爬虫任务报警
01:49:21
scrapyd==》爬虫任务异常死亡报警
hostname: scrapyd-chinabond-4
error_count: Process died: exitstatus=None ,project: chinabond_fast_spider,spider: shuidian_bjx_com_cn,job: 95090682dbcb11e9a0fade28e59e3773,pid: 106424,log: data/logs/chinabond_fast_spider/shuidian_bjx_com_cn/95090682dbcb11e9a0fade28e59e3773.log,items: None
error_data:
爬虫任务报警
01:51:20
scrapyd==》爬虫任务异常死亡报警
hostname: scrapyd-chinabond-0
error_count: Process died: exitstatus=None ,project: chinabond_fast_spider,spider: tech_sina_com_cn,job: de4bdf72dbcb11e9a45522b8c8b2a9e4,pid: 111685,log: data/logs/chinabond_fast_spider/tech_sina_com_cn/de4bdf72dbcb11e9a45522b8c8b2a9e4.log,items: None
error_data:
爬虫任务报警
01:51:21
scrapyd==》爬虫任务异常死亡报警
hostname: scrapyd-chinabond-6
error_count: Process died: exitstatus=None ,project: chinabond_fast_spider,spider: ee_ofweek_com,job: ff6bd5b8dbcb11e9803172286b76aa73,pid: 34626,log: data/logs/chinabond_fast_spider/ee_ofweek_com/ff6bd5b8dbcb11e9803172286b76aa73.log,items: None
error_data:
爬虫任务报警
01:51:21
scrapyd==》爬虫任务异常死亡报警
hostname: scrapyd-chinabond-6
error_count: Process died: exitstatus=None ,project: chinabond_fast_spider,spider: house_hexun_com,job: ff6dfdacdbcb11e9803172286b76aa73,pid: 34633,log: data/logs/chinabond_fast_spider/house_hexun_com/ff6dfdacdbcb11e9803172286b76aa73.log,items: None
error_data:
爬虫任务报警
01:51:21
scrapyd==》爬虫任务异常死亡报警
hostname: scrapyd-chinabond-2
error_count: Process died: exitstatus=None ,project: chinabond_fast_spider,spider: www_sjfzxm_com,job: 018e7d78dbcc11e9b15e02034af50b6e,pid: 65492,log: data/logs/chinabond_fast_spider/www_sjfzxm_com/018e7d78dbcc11e9b15e02034af50b6e.log,items: None
error_data:
爬虫任务报警
01:53:21
scrapyd==》爬虫任务异常死亡报警
hostname: scrapyd-chinabond-4
error_count: Process died: exitstatus=None ,project: chinabond_fast_spider,spider: news_xianzhaiwang_cn,job: 48d835e8dbcc11e9a0fade28e59e3773,pid: 106476,log: data/logs/chinabond_fast_spider/news_xianzhaiwang_cn/48d835e8dbcc11e9a0fade28e59e3773.log,items: None
error_data:

The parsing code:

import os
import re
from collections import namedtuple

alert = namedtuple('Spider_Alert', 'alert_time, alert_hostname, alert_project, alert_spider')
path = r'D:\data\ceshi.txt'
g_path = r'D:\data'
file_name = 'result.txt'
file_path = os.path.join(g_path, file_name)
alerts_list = list()
with open(path, encoding="utf-8") as file:
    lines = file.readlines()  # read every line
    time = None
    hostname = None
    project = None
    spider = None
    for line in lines:
        if re.search(r'^\d{2}:\d{2}:\d{2}\s*$', line):
            time = re.search(r'^(\d{2}:\d{2}:\d{2})\s*$', line).group(1)
        if re.search(r'^hostname:\s*(.+)', line):
            hostname = re.search(r'^hostname:\s*(.+)', line).group(1)
        if re.search(r'project:\s*([^,]+),', line):
            project = re.search(r'project:\s*([^,]+),', line).group(1)
        if re.search(r'spider:\s*([^,]+),', line):
            spider = re.search(r'spider:\s*([^,]+),', line).group(1)
        if re.search(r'^error_data', line):  # an alert record ends at its error_data line
            spider_alert = alert(alert_time=time, alert_hostname=hostname,
                                 alert_project=project, alert_spider=spider)
            alerts_list.append(spider_alert)

with open(file_path, 'a', encoding="utf-8") as file:
    for element in alerts_list:
        print(element[0], element[1], element[3])
        file.write(element[0] + "\t" + element[1] + "\t" + element[3])
        file.write(' \n')

Running the script prints the alert time, hostname, and spider name for each record, and appends the same three fields, tab-separated, to result.txt.
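
As a quick sanity check of the extraction patterns, a minimal sketch that runs two of them against an error_count line copied from the sample above:

import re

sample = ('error_count: Process died: exitstatus=None ,project: chinabond_fast_spider,'
          'spider: ah_sina_com_cn,job: 28395818dbcb11e998a3f632d94e247c,pid: 88971,'
          'log: data/logs/chinabond_fast_spider/ah_sina_com_cn/28395818dbcb11e998a3f632d94e247c.log,items: None')

print(re.search(r'project:\s*([^,]+),', sample).group(1))  # chinabond_fast_spider
print(re.search(r'spider:\s*([^,]+),', sample).group(1))   # ah_sina_com_cn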

