Finding duplicate files in a directory with md5sum in Python: sample code


Posted in Python on February 02, 2015
"""This module contains code from

Think Python by Allen B. Downey
http://thinkpython.com
Copyright 2012 Allen B. Downey

License: GNU GPLv3 http://www.gnu.org/licenses/gpl.html
"""
import os

try:
    from shlex import quote  # Python 3
except ImportError:
    from pipes import quote  # Python 2


def walk(dirname):
    """Finds the names of all files in dirname and its subdirectories.

    dirname: string name of directory
    """
    names = []
    for name in os.listdir(dirname):
        path = os.path.join(dirname, name)
        if os.path.isfile(path):
            names.append(path)
        elif os.path.isdir(path):
            # Recurse only into directories; skip broken links and the like.
            names.extend(walk(path))
    return names


def compute_checksum(filename):
    """Computes the MD5 checksum of the contents of a file.

    filename: string
    """
    # Quote the filename so spaces and shell metacharacters are safe.
    cmd = 'md5sum ' + quote(filename)
    return pipe(cmd)


def check_diff(name1, name2):
    """Computes the difference between the contents of two files.

    name1, name2: string filenames
    """
    cmd = 'diff %s %s' % (quote(name1), quote(name2))
    return pipe(cmd)


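def pipe(cmd):
    """Runs a command in a subprocess.

    cmd: string Unix command
    Returns (res, stat), the output of the subprocess and the exit status.
    stat is None if the command exited successfully.
    """
    fp = os.popen(cmd)
    res = fp.read()
    # close() returns None on success and a nonzero status otherwise.
    # diff exits with 1 when the files differ, so stat is not asserted here.
    stat = fp.close()
    return res, stat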


def compute_checksums(dirname, suffix):
    """Computes checksums for all files with the given suffix.

    dirname: string name of directory to search
    suffix: string suffix to match
    Returns: map from checksum to list of files with that checksum
    """
    names = walk(dirname)
    d = {}
    for name in names:
        if name.endswith(suffix):
            res, stat = compute_checksum(name)
            if stat is not None:
                # md5sum failed (for example, the file is unreadable).
                continue
            # md5sum prints "<checksum>  <filename>"; keep only the first
            # field so filenames containing spaces do not break the parse.
            checksum = res.split()[0]
            d.setdefault(checksum, []).append(name)
    return d


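def check_pairs(names):
    """Checks whether any in a list of files differs from the others.

    names: list of string filenames
    Returns: True if all files are identical, False otherwise
    """
    for name1 in names:
        for name2 in names:
            # Compare each unordered pair only once.
            if name1 < name2:
                res, stat = check_diff(name1, name2)
                # diff prints output only when the files differ.
                if res:
                    return False
    return True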


def print_duplicates(d):
    """Checks for duplicate files.

    Reports any files with the same checksum and checks whether they
    are, in fact, identical.

    d: map from checksum to list of files with that checksum
    """
    for key, names in d.items():
        if len(names) > 1:
            print('The following files have the same checksum:')
            for name in names:
                print(name)
            if check_pairs(names):
                print('And they are identical.')


if __name__ == '__main__':
    d = compute_checksums(dirname='.', suffix='.py')
    print_duplicates(d)
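
The listing above shells out to the Unix commands md5sum and diff, so it only runs where both are installed. As a rough alternative sketch (not part of the Think Python code), the same idea can be expressed with the standard library alone: hashlib.md5 computes the checksum, os.walk replaces the hand-written recursion, and filecmp.cmp with shallow=False confirms that files are byte-for-byte identical. The names file_md5, find_duplicates, all_identical, and report, and the chunk_size parameter, are hypothetical and chosen only for illustration.

```python
import filecmp
import hashlib
import os


def file_md5(path, chunk_size=65536):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, 'rb') as fp:
        # Read fixed-size chunks so large files are not loaded whole.
        for chunk in iter(lambda: fp.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(dirname, suffix):
    """Map each checksum shared by 2+ files to the sorted list of paths."""
    groups = {}
    for root, _dirs, files in os.walk(dirname):
        for name in files:
            if name.endswith(suffix):
                path = os.path.join(root, name)
                groups.setdefault(file_md5(path), []).append(path)
    # Keep only checksums that occur more than once.
    return {k: sorted(v) for k, v in groups.items() if len(v) > 1}


def all_identical(names):
    """Return True if every file has the same bytes as the first one."""
    first = names[0]
    return all(filecmp.cmp(first, other, shallow=False)
               for other in names[1:])


def report(dirname, suffix):
    """Print groups of files with equal checksums, as the listing above does."""
    for names in find_duplicates(dirname, suffix).values():
        print('The following files have the same checksum:')
        for name in names:
            print(name)
        if all_identical(names):
            print('And they are identical.')
```

Calling report('.', '.py') mirrors the `__main__` block of the original listing. Comparing every file against the first one is enough here, because byte equality is transitive; the original's pairwise loop does more comparisons to reach the same answer.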