Using md5sum in Python to find identical files in a directory (code share)


Posted in Python on February 02, 2015
"""This module contains code from

Think Python by Allen B. Downey
http://thinkpython.com
Copyright 2012 Allen B. Downey

License: GNU GPLv3 http://www.gnu.org/licenses/gpl.html
"""
import os

try:
    from shlex import quote  # Python 3.3+
except ImportError:
    from pipes import quote  # Python 2


def walk(dirname):
    """Finds the names of all files in dirname and its subdirectories.

    dirname: string name of directory
    """
    names = []
    for name in os.listdir(dirname):
        path = os.path.join(dirname, name)
        if os.path.isfile(path):
            names.append(path)
        elif os.path.isdir(path):
            # Recurse only into directories; skip broken links, sockets, etc.
            names.extend(walk(path))
    return names


def compute_checksum(filename):
    """Computes the MD5 checksum of the contents of a file.

    filename: string
    """
    # Quote the name so spaces and shell metacharacters are passed safely.
    cmd = 'md5sum ' + quote(filename)
    return pipe(cmd)


def check_diff(name1, name2):
    """Computes the difference between the contents of two files.

    name1, name2: string filenames
    """
    cmd = 'diff %s %s' % (quote(name1), quote(name2))
    return pipe(cmd)


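def pipe(cmd):
    """Runs a command in a subprocess.

    cmd: string Unix command
    Returns (res, stat), the output of the subprocess and the exit status.
    stat is None if the command exited with status 0.
    """
    fp = os.popen(cmd)
    res = fp.read()
    # close() returns None on success and a nonzero status otherwise.
    # There is no assert here: diff exits with status 1 when the files
    # differ, so a nonzero status is an expected result, not an error.
    stat = fp.close()
    return res, stat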


def compute_checksums(dirname, suffix):
    """Computes checksums for all files with the given suffix.

    dirname: string name of directory to search
    suffix: string suffix to match
    Returns: map from checksum to list of files with that checksum
    """
    names = walk(dirname)
    d = {}
    for name in names:
        if name.endswith(suffix):
            res, stat = compute_checksum(name)
            if not res:
                # md5sum produced no output (e.g. unreadable file); skip it.
                continue
            # md5sum prints "<checksum>  <filename>"; take only the first
            # field so that filenames containing spaces do not break parsing.
            checksum = res.split()[0]
            d.setdefault(checksum, []).append(name)
    return d


def check_pairs(names):
    """Checks whether any in a list of files differs from the others.

    names: list of string filenames
    Returns: True if all files are identical, False otherwise
    """
    for name1 in names:
        for name2 in names:
            # Compare each unordered pair only once.
            if name1 < name2:
                res, stat = check_diff(name1, name2)
                if res:
                    return False
    return True


def print_duplicates(d):
    """Checks for duplicate files.

    Reports any files with the same checksum and checks whether they
    are, in fact, identical.

    d: map from checksum to list of files with that checksum
    """
    for key, names in d.items():
        if len(names) > 1:
            print('The following files have the same checksum:')
            for name in names:
                print(name)
            if check_pairs(names):
                print('And they are identical.')

if __name__ == '__main__':
    d = compute_checksums(dirname='.', suffix='.py')
    print_duplicates(d)
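
The listing above shells out to the Unix `md5sum` and `diff` commands, so it only runs where those tools are installed. As a rough sketch of the same idea using only the standard library, the version below swaps in `hashlib.md5` for `md5sum`, `os.walk` for the recursive `walk`, and `filecmp.cmp` for `diff`. This is a substitute technique, not the code from Think Python, and the names `md5_of_file`, `find_duplicates`, `all_identical` and the `chunk_size` parameter are made up for illustration.

```python
import filecmp
import hashlib
import os
from collections import defaultdict


def md5_of_file(path, chunk_size=65536):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, 'rb') as f:
        # iter() with a sentinel keeps reading until read() returns b''.
        for chunk in iter(lambda: f.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(dirname, suffix):
    """Map each shared checksum to the sorted list of files that have it."""
    by_checksum = defaultdict(list)
    for root, _dirs, files in os.walk(dirname):
        for name in files:
            if name.endswith(suffix):
                path = os.path.join(root, name)
                by_checksum[md5_of_file(path)].append(path)
    # Keep only checksums that more than one file shares.
    return {k: sorted(v) for k, v in by_checksum.items() if len(v) > 1}


def all_identical(paths):
    """Compare every file byte-for-byte against the first one."""
    first = paths[0]
    return all(filecmp.cmp(first, other, shallow=False)
               for other in paths[1:])
```

Calling `find_duplicates('.', '.py')` plays the role of `compute_checksums(dirname='.', suffix='.py')`, except that it returns only the groups with more than one file. Passing each group to `all_identical` corresponds to `check_pairs`; comparing against the first file is enough because byte-for-byte equality is transitive.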