How to find files with identical content in Python

Posted in Python on June 28, 2015

This article gives an example of finding files with identical content in Python, shared here for reference. The details are as follows:

The Python code below searches for files with identical content; several directories can be given at once.
Usage: python doublesdetector.py c:\;d:\;e:\ > doubles.txt
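The way the directory argument is handled can be sketched as follows. The script joins every command-line argument with a space and then splits the result on semicolons, which is why a path containing a space (such as c:\program files) works without quotes. The helper name `split_directories` is hypothetical and only illustrates that step.

```python
def split_directories(argv_tail):
    """Rebuild the directory list the way the script's entry point does.

    argv_tail stands for sys.argv[1:]; this helper name is hypothetical.
    """
    joined = " ".join(argv_tail)  # undo the shell's splitting on spaces
    return joined.split(";")      # one entry per semicolon-separated directory


# A path with a space arrives from the shell as two separate arguments.
print(split_directories(["c:\\program", "files;d:\\"]))
```

On POSIX shells an unquoted semicolon ends the command, so there the whole directory list would need to be quoted.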

# Hello, this script is written in Python - http://www.python.org
# doublesdetector.py 1.0p
import hashlib
import os
import sys

# Raw string so the Windows paths in the help text are not read as escapes.
message = r"""
doublesdetector.py 1.0p
This script will search for files that are identical
(whatever their name/date/time).
 Syntax : python %s <directories>
   where <directories> is a directory or a list of directories
   separated by a semicolon (;)
Examples : python %s c:\windows
      python %s c:\;d:\;e:\ > doubles.txt
      python %s c:\program files > doubles.txt
This script is public domain. Feel free to reuse and tweak it.
The author of this script Sebastien SAUVAGE <sebsauvage at sebsauvage dot net>
http://sebsauvage.net/python/
""" % ((sys.argv[0], )*4)

def fileSHA(filepath):
  r""" Compute SHA (Secure Hash Algorithm) of a file.
    Input : filepath : full path and name of file (eg. 'c:\windows\emm386.exe')
    Output : string : contains the hexadecimal representation of the SHA of the file.
             returns '0' if file could not be read (file not found, no read rights...)
  """
  try:
    digest = hashlib.sha1()
    # Read in 64 KB chunks so large files are never loaded into memory whole.
    with open(filepath, 'rb') as file:
      data = file.read(65536)
      while len(data) != 0:
        digest.update(data)
        data = file.read(65536)
  except OSError:
    return '0'
  else:
    return digest.hexdigest()

def detectDoubles(directories):
  fileslist = {}
  # Group all files by size (in the fileslist dictionary)
  for directory in directories.split(';'):
    directory = os.path.abspath(directory)
    sys.stderr.write('Scanning directory '+directory+'...')
    for (dirpath, dirnames, filenames) in os.walk(directory):
      callback(fileslist, dirpath, filenames)
    sys.stderr.write('\n')
  sys.stderr.write('Comparing files...')
  # Remove keys (filesize) in the dictionary which have only 1 file.
  # Iterate over a copy of the items because the dictionary is modified.
  for (filesize, listoffiles) in list(fileslist.items()):
    if len(listoffiles) == 1:
      del fileslist[filesize]
  # Now compute SHA of files that have the same size,
  # and group files by SHA (in the filessha dictionary)
  filessha = {}
  while len(fileslist) > 0:
    (filesize, listoffiles) = fileslist.popitem()
    for filepath in listoffiles:
      sys.stderr.write('.')
      sha = fileSHA(filepath)
      if sha in filessha:
        filessha[sha].append(filepath)
      else:
        filessha[sha] = [filepath]
  # Drop the group of files that could not be read
  if '0' in filessha:
    del filessha['0']
  # Remove keys (sha) in the dictionary which have only 1 file
  for (sha, listoffiles) in list(filessha.items()):
    if len(listoffiles) == 1:
      del filessha[sha]
  sys.stderr.write('\n')
  return filessha

def callback(fileslist, directory, files):
  # Record every regular file of one directory under its size
  sys.stderr.write('.')
  for fileName in files:
    filepath = os.path.join(directory, fileName)
    if os.path.isfile(filepath):
      filesize = os.stat(filepath).st_size
      if filesize in fileslist:
        fileslist[filesize].append(filepath)
      else:
        fileslist[filesize] = [filepath]

if len(sys.argv) > 1:
  doubles = detectDoubles(" ".join(sys.argv[1:]))
  print('The following files are identical:')
  print('\n'.join(["----\n%s" % '\n'.join(doubles[filesha]) for filesha in doubles]))
  print('----')
else:
  print(message)
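The two-stage approach the script takes (group files by size first, then hash only the files that share a size) can be condensed into a short sketch. This is an illustrative rewrite rather than the author's code: the names `sha1_of_file` and `find_duplicates` are hypothetical, it handles a single root directory, and it relies on `hashlib`, `os.walk` and `collections.defaultdict` from the standard library.

```python
import hashlib
import os
from collections import defaultdict


def sha1_of_file(path, chunk_size=65536):
    """Hash a file in fixed-size chunks so large files are never loaded whole."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root):
    """Return sorted lists of paths under root whose contents hash identically."""
    # Stage 1: group every regular file by its size.
    by_size = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path):
                by_size[os.path.getsize(path)].append(path)

    # Stage 2: hash only files that share a size with at least one other file.
    by_hash = defaultdict(list)
    for paths in by_size.values():
        if len(paths) < 2:
            continue  # a file with a unique size cannot have a duplicate
        for path in paths:
            try:
                by_hash[sha1_of_file(path)].append(path)
            except OSError:
                pass  # unreadable files are skipped

    return [sorted(group) for group in by_hash.values() if len(group) > 1]
```

Calling `find_duplicates(".")` would return one list per group of identical files in the current directory tree. Comparing sizes first is cheap and means most files are never read at all; matching hashes are treated as identical content, and a byte-for-byte check could be added if that assumption is too weak for a given use.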

We hope this article is helpful for your Python programming.

- Author -

秋风秋雨

