Sharing code: using md5sum in Python to find identical files in a directory


Posted in Python on February 02, 2015
"""This module contains code from

Think Python by Allen B. Downey
http://thinkpython.com
Copyright 2012 Allen B. Downey

License: GNU GPLv3 http://www.gnu.org/licenses/gpl.html
"""
import os
import subprocess


def walk(dirname):
    """Finds the names of all files in dirname and its subdirectories.

    dirname: string name of directory
    """
    names = []
    for name in os.listdir(dirname):
        path = os.path.join(dirname, name)
        if os.path.isfile(path):
            names.append(path)
        elif os.path.isdir(path):
            # Recurse into subdirectories only; other entries (broken
            # symlinks, sockets, ...) would make os.listdir raise.
            names.extend(walk(path))
    return names


def compute_checksum(filename):
    """Computes the MD5 checksum of the contents of a file.

    filename: string
    Returns (res, stat): the md5sum output line and its exit status.
    """
    # Requires the md5sum command (GNU coreutils).
    return pipe(['md5sum', filename])


def check_diff(name1, name2):
    """Computes the difference between the contents of two files.

    name1, name2: string filenames
    Returns (res, stat): diff's output and its exit status
    (0 = identical, 1 = different, greater than 1 = error).
    """
    return pipe(['diff', name1, name2])


def pipe(cmd):
    """Runs a command in a subprocess.

    cmd: list of strings, the program name followed by its arguments
    Returns (res, stat), the output of the subprocess and the exit status.
    """
    # Passing a list instead of a shell string keeps filenames that
    # contain spaces or shell metacharacters intact.
    proc = subprocess.run(cmd, stdout=subprocess.PIPE, text=True, errors='replace')
    # No assertion on the status: diff exits with 1 whenever the files
    # differ, which is a normal result rather than a failure.
    return proc.stdout, proc.returncode


def compute_checksums(dirname, suffix):
    """Computes checksums for all files with the given suffix.

    dirname: string name of directory to search
    suffix: string suffix to match
    Returns: map from checksum to list of files with that checksum
    """
    names = walk(dirname)
    d = {}
    for name in names:
        if name.endswith(suffix):
            res, stat = compute_checksum(name)
            if stat != 0:
                continue  # md5sum failed, e.g. the file is unreadable
            # md5sum prints "<checksum>  <filename>"; keep only the first
            # field so that filenames containing spaces do not break the split.
            checksum = res.split()[0]
            d.setdefault(checksum, []).append(name)
    return d


def check_pairs(names):
    """Checks whether all files in a list are identical to one another.

    names: list of string filenames
    Returns True if no pair of files differs, False otherwise.
    """
    for name1 in names:
        for name2 in names:
            # name1 < name2 visits each unordered pair exactly once
            if name1 < name2:
                res, stat = check_diff(name1, name2)
                if stat != 0:
                    return False
    return True


def print_duplicates(d):
    """Checks for duplicate files.

    Reports any files with the same checksum and checks whether they
    are, in fact, identical.
    d: map from checksum to list of files with that checksum
    """
    for key, names in d.items():
        if len(names) > 1:
            print('The following files have the same checksum:')
            for name in names:
                print(name)
            if check_pairs(names):
                print('And they are identical.')


if __name__ == '__main__':
    d = compute_checksums(dirname='.', suffix='.py')
    print_duplicates(d)
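The script above shells out to the external `md5sum` and `diff` commands, so it only runs where those tools are installed (typically Linux). The following is a portable alternative that uses only the standard library. It is not part of the original Think Python code. `hashlib.md5` computes the digest in-process, `os.walk` replaces the recursive `walk`, and `filecmp.cmp(..., shallow=False)` stands in for `diff`. The function names `md5_of_file`, `find_duplicates`, `all_identical` and `report_duplicates` are illustrative choices, not an existing API.

```python
import filecmp
import hashlib
import os
from collections import defaultdict


def md5_of_file(path, chunk_size=65536):
    """Returns the hex MD5 digest of a file, reading it in chunks."""
    h = hashlib.md5()
    with open(path, 'rb') as f:
        # iter() with a sentinel keeps calling f.read until it returns b''
        for chunk in iter(lambda: f.read(chunk_size), b''):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(dirname, suffix):
    """Maps each MD5 digest shared by 2+ files under dirname to their paths.

    Only files whose names end with suffix are hashed.
    """
    groups = defaultdict(list)
    for root, _dirs, files in os.walk(dirname):
        for name in files:
            path = os.path.join(root, name)
            # isfile() skips broken symlinks and other non-regular entries
            if name.endswith(suffix) and os.path.isfile(path):
                groups[md5_of_file(path)].append(path)
    return {digest: paths for digest, paths in groups.items() if len(paths) > 1}


def all_identical(paths):
    """Byte-compares every file in a non-empty list against the first one."""
    first = paths[0]
    return all(filecmp.cmp(first, other, shallow=False) for other in paths[1:])


def report_duplicates(dirname, suffix):
    """Prints each group of same-checksum files, like print_duplicates."""
    for paths in find_duplicates(dirname, suffix).values():
        print('The following files have the same checksum:')
        for path in paths:
            print(path)
        if all_identical(paths):
            print('And they are identical.')
```

For example, `report_duplicates('.', '.py')` mirrors the original `__main__` block. Comparing each file only against the first is enough because byte-for-byte equality is transitive. A group of n files therefore needs n - 1 comparisons instead of the up to n(n-1)/2 pairs that `check_pairs` runs `diff` on. The byte comparison is still worth keeping: matching MD5 digests make identical content very likely but do not prove it, since MD5 collisions can be constructed deliberately.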