- 1. Walk all files under a directory
- 1.1 This relies mainly on os.walk
- 2. Compute a file's MD5 digest
- 3. Combine the two functions to print the MD5 of every file in a folder
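1. Walk all files under a directory
import os

def getFileName(directory):
    """Return the paths of all files under directory, including subdirectories."""
    file_list = []
    for dir_name, sub_dir, file_name_list in os.walk(directory):
        # print(dir_name, sub_dir, file_name_list)
        for file in file_name_list:
            # '/' is hard-coded here, so on Windows the result mixes '/' and '\'.
            file_path_abs = f'{dir_name}/{file}'
            file_list.append(file_path_abs)
    return file_list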
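The function above joins the directory and the file name with a literal '/', which is why the output in section 3 shows paths with both '/' and '\' on Windows. A minimal sketch of a variant that lets the standard library pick the separator instead; the name list_files is hypothetical and not part of the original script:

```python
import os

def list_files(directory):
    """Return the path of every file under directory, recursively."""
    paths = []
    for dir_path, _sub_dirs, file_names in os.walk(directory):
        for name in file_names:
            # os.path.join inserts the separator of the current platform.
            paths.append(os.path.join(dir_path, name))
    return paths
```

The returned paths are absolute only if directory itself is absolute, since os.walk builds every dirpath from the top it was given.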
1.1 This relies mainly on os.walk
Let's look at how os.walk is used:
In [6]: os.walk??
Signature: os.walk(top, topdown=True, onerror=None, followlinks=False)
Source:
def walk(top, topdown=True, onerror=None, followlinks=False):
    """Directory tree generator.

    For each directory in the directory tree rooted at top (including top
    itself, but excluding '.' and '..'), yields a 3-tuple

        dirpath, dirnames, filenames

    dirpath is a string, the path to the directory.  dirnames is a list of
    the names of the subdirectories in dirpath (excluding '.' and '..').
    filenames is a list of the names of the non-directory files in dirpath.
    Note that the names in the lists are just names, with no path components.
    To get a full path (which begins with top) to a file or directory in
    dirpath, do os.path.join(dirpath, name).

    If optional arg 'topdown' is true or not specified, the triple for a
    directory is generated before the triples for any of its subdirectories
    (directories are generated top down).  If topdown is false, the triple
    for a directory is generated after the triples for all of its
    subdirectories (directories are generated bottom up).

    When topdown is true, the caller can modify the dirnames list in-place
    (e.g., via del or slice assignment), and walk will only recurse into the
    subdirectories whose names remain in dirnames; this can be used to prune the
    search, or to impose a specific order of visiting.  Modifying dirnames when
    topdown is false has no effect on the behavior of os.walk(), since the
    directories in dirnames have already been generated by the time dirnames
    itself is generated. No matter the value of topdown, the list of
    subdirectories is retrieved before the tuples for the directory and its
    subdirectories are generated.

    By default errors from the os.scandir() call are ignored.  If
    optional arg 'onerror' is specified, it should be a function; it
    will be called with one argument, an OSError instance.  It can
    report the error to continue with the walk, or raise the exception
    to abort the walk.  Note that the filename is available as the
    filename attribute of the exception object.

    By default, os.walk does not follow symbolic links to subdirectories on
    systems that support them.  In order to get this functionality, set the
    optional argument 'followlinks' to true.

    Caution:  if you pass a relative pathname for top, don't change the
    current working directory between resumptions of walk.  walk never
    changes the current directory, and assumes that the client doesn't
    either.

    Example:

    import os
    from os.path import join, getsize
    for root, dirs, files in os.walk('python/Lib/email'):
        print(root, "consumes", end="")
        print(sum(getsize(join(root, name)) for name in files), end="")
        print("bytes in", len(files), "non-directory files")
        if 'CVS' in dirs:
            dirs.remove('CVS')  # don't visit CVS directories

    """
    sys.audit("os.walk", top, topdown, onerror, followlinks)
    return _walk(fspath(top), topdown, onerror, followlinks)
File:      c:\users\thinkpad\appdata\local\programs\python\python39\lib\os.py
Type:      function
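Signature: the function signature, that is, the function name together with its parameter list (worth remembering, since it may come up in interviews). In languages that support overloading, the signature is what tells two functions apart; Python does not overload on signature, so here it simply shows how the function is called.
Source: the function's source code.
This function is a directory tree generator.
For each directory in the tree rooted at top (including top itself, but excluding '.' and '..'), it yields a 3-tuple (dirpath, dirnames, filenames):
- dirpath -> string: the path of the directory currently being visited.
- dirnames -> list: the names of the subdirectories of dirpath (excluding '.', which is dirpath itself, and '..', its parent).
- filenames -> list: the names of the non-directory files in dirpath. An empty list means dirpath directly contains no files (only subdirectories, or nothing at all); a non-empty list means there are files directly in dirpath, and their paths can now be built.
Taken together, each tuple says that the directory dirpath contains the directories in dirnames and the files in filenames.
So, to visit the files in every folder: whenever filenames is non-empty, the path of each file is os.path.join(dirpath, filenames[x]).
Type: function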
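To make the triple concrete, the sketch below builds a throwaway tree in a temporary directory and prints what os.walk yields for it, with dirpath shown relative to the root. The helper name describe_tree and the names sub, a.txt and b.txt are made up for the demo.

```python
import os
import tempfile

def describe_tree(top):
    """Return [(relative dirpath, sorted dirnames, sorted filenames), ...] for top."""
    rows = []
    for dir_path, dir_names, file_names in os.walk(top):
        rel = os.path.relpath(dir_path, top)  # '.' for the top directory itself
        rows.append((rel, sorted(dir_names), sorted(file_names)))
    return rows

with tempfile.TemporaryDirectory() as root:
    # Layout: root/a.txt and root/sub/b.txt
    os.mkdir(os.path.join(root, "sub"))
    for rel in ("a.txt", os.path.join("sub", "b.txt")):
        open(os.path.join(root, rel), "w").close()
    for row in describe_tree(root):
        print(row)
# ('.', ['sub'], ['a.txt'])
# ('sub', [], ['b.txt'])
```

The top directory reports one subdirectory and one file; sub reports no subdirectories and one file, so its dirnames list is empty while its filenames list is not.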
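2. Compute a file's MD5 digest
import hashlib

def fileMD5(filePathAbs):
    """Return the hex MD5 digest of the file at filePathAbs."""
    md5_tool = hashlib.md5()
    with open(filePathAbs, mode='rb') as fobj:
        # Read in 4096-byte chunks so large files are never loaded whole.
        while True:
            data = fobj.read(4096)
            if data:
                md5_tool.update(data)
            else:
                break
    return md5_tool.hexdigest()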
This uses the md5 function from the hashlib module to compute the file's MD5 digest. First, let's look at the description of md5:
In [8]: hashlib.md5??
Signature: hashlib.md5(string=b'', *, usedforsecurity=True)
Docstring: Returns a md5 hash object; optionally initialized with a string
Type: builtin_function_or_method
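Type: a built-in function, meaning it is implemented in C, which generally makes it fast.
Called with the default string=b'', it starts from empty input and returns a hash object. Let's look at how it is used: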
In [15]: t = hashlib.md5()
In [16]: t
Out[16]: <md5 _hashlib.HASH object @ 0x00000205960238F0>
In [17]: dir(t)
Out[17]:
['__class__',
'__delattr__',
'__dir__',
'__doc__',
'__eq__',
'__format__',
'__ge__',
'__getattribute__',
'__gt__',
'__hash__',
'__init__',
'__init_subclass__',
'__le__',
'__lt__',
'__module__',
'__ne__',
'__new__',
'__reduce__',
'__reduce_ex__',
'__repr__',
'__setattr__',
'__sizeof__',
'__str__',
'__subclasshook__',
'block_size',
'copy',
'digest',
'digest_size',
'hexdigest',
'name',
'update']
In [18]: t.hexdigest()
Out[18]: 'd41d8cd98f00b204e9800998ecf8427e'
In [19]:
Now let's look at the update method:
In [19]: t.update??
Signature: t.update(obj, /)
Docstring: Update this hash object's state with the provided string.
Type: builtin_function_or_method
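This method feeds the given bytes into the hash object and updates its state. Calling update repeatedly is equivalent to a single call on the concatenated data, which is why fileMD5 can read the file in 4096-byte chunks.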
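A quick way to convince yourself that chunked hashing gives the same digest as hashing everything at once. The sample bytes are arbitrary and chosen only for this demo.

```python
import hashlib

data = b"hello world" * 1000  # arbitrary sample bytes for the demo

# Hash everything in one call.
one_shot = hashlib.md5(data).hexdigest()

# Hash the same bytes in 4096-byte chunks, the way fileMD5 reads a file.
chunked = hashlib.md5()
for start in range(0, len(data), 4096):
    chunked.update(data[start:start + 4096])

print(one_shot == chunked.hexdigest())  # True
print(hashlib.md5(b"").hexdigest())     # d41d8cd98f00b204e9800998ecf8427e
```

The second line is the digest of empty input, the same value as Out[18] above, where hexdigest was called before any update.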
3. Combine the two functions to print the MD5 of every file in a folder
import hashlib
import os

def getFileName(directory):
    """Return the paths of all files under directory, including subdirectories."""
    file_list = []
    for dir_name, sub_dir, file_name_list in os.walk(directory):
        # print(dir_name, sub_dir, file_name_list)
        for file in file_name_list:
            # '/' is hard-coded here, so on Windows the result mixes '/' and '\'.
            file_path_abs = f'{dir_name}/{file}'
            file_list.append(file_path_abs)
    return file_list

def fileMD5(filePathAbs):
    """Return the hex MD5 digest of the file at filePathAbs."""
    md5_tool = hashlib.md5()
    with open(filePathAbs, mode='rb') as fobj:
        # Read in 4096-byte chunks so large files are never loaded whole.
        while True:
            data = fobj.read(4096)
            if data:
                md5_tool.update(data)
            else:
                break
    return md5_tool.hexdigest()

if __name__ == '__main__':
    file_list = getFileName(r"E:/Project/Support/Day01/北京")
    for file in file_list:
        md5 = fileMD5(file)
        print(file, md5)
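Directory structure (as reflected in the output below): 北京 holds a.txt, b.txt and their copies, plus two subdirectories, 昌平 (d.txt, f.txt and their copies) and 海淀 (c.txt and e.txt).
Output: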
In [23]: run fileManager.py
E:/Project/Support/Day01/北京/a - 副本 (2).txt da5d6d8941b3381fb7565c57db7a9ead
E:/Project/Support/Day01/北京/a - 副本 (3).txt da5d6d8941b3381fb7565c57db7a9ead
E:/Project/Support/Day01/北京/a - 副本 (4).txt da5d6d8941b3381fb7565c57db7a9ead
E:/Project/Support/Day01/北京/a - 副本 (5).txt da5d6d8941b3381fb7565c57db7a9ead
E:/Project/Support/Day01/北京/a - 副本 (6).txt da5d6d8941b3381fb7565c57db7a9ead
E:/Project/Support/Day01/北京/a - 副本.txt da5d6d8941b3381fb7565c57db7a9ead
E:/Project/Support/Day01/北京/a.txt da5d6d8941b3381fb7565c57db7a9ead
E:/Project/Support/Day01/北京/b - 副本 (2) - 副本.txt 84d9cfc2f395ce883a41d7ffc1bbcf4e
E:/Project/Support/Day01/北京/b - 副本 (2).txt 84d9cfc2f395ce883a41d7ffc1bbcf4e
E:/Project/Support/Day01/北京/b - 副本 (3) - 副本.txt 84d9cfc2f395ce883a41d7ffc1bbcf4e
E:/Project/Support/Day01/北京/b - 副本 (3).txt 84d9cfc2f395ce883a41d7ffc1bbcf4e
E:/Project/Support/Day01/北京/b - 副本 - 副本 (2).txt 84d9cfc2f395ce883a41d7ffc1bbcf4e
E:/Project/Support/Day01/北京/b - 副本 - 副本 (3).txt 84d9cfc2f395ce883a41d7ffc1bbcf4e
E:/Project/Support/Day01/北京/b - 副本 - 副本 (4).txt 84d9cfc2f395ce883a41d7ffc1bbcf4e
E:/Project/Support/Day01/北京/b - 副本 - 副本 (5).txt 84d9cfc2f395ce883a41d7ffc1bbcf4e
E:/Project/Support/Day01/北京/b - 副本 - 副本 (6).txt 84d9cfc2f395ce883a41d7ffc1bbcf4e
E:/Project/Support/Day01/北京/b - 副本 - 副本.txt 84d9cfc2f395ce883a41d7ffc1bbcf4e
E:/Project/Support/Day01/北京/b.txt 84d9cfc2f395ce883a41d7ffc1bbcf4e
E:/Project/Support/Day01/北京\昌平/d - 副本 (2).txt d41d8cd98f00b204e9800998ecf8427e
E:/Project/Support/Day01/北京\昌平/d - 副本 (3).txt d41d8cd98f00b204e9800998ecf8427e
E:/Project/Support/Day01/北京\昌平/d - 副本 (4).txt d41d8cd98f00b204e9800998ecf8427e
E:/Project/Support/Day01/北京\昌平/d - 副本 (5).txt d41d8cd98f00b204e9800998ecf8427e
E:/Project/Support/Day01/北京\昌平/d - 副本 (6).txt d41d8cd98f00b204e9800998ecf8427e
E:/Project/Support/Day01/北京\昌平/d - 副本 (7).txt d41d8cd98f00b204e9800998ecf8427e
E:/Project/Support/Day01/北京\昌平/d - 副本 (8).txt d41d8cd98f00b204e9800998ecf8427e
E:/Project/Support/Day01/北京\昌平/d - 副本 (9).txt d41d8cd98f00b204e9800998ecf8427e
E:/Project/Support/Day01/北京\昌平/d - 副本.txt d41d8cd98f00b204e9800998ecf8427e
E:/Project/Support/Day01/北京\昌平/d.txt d41d8cd98f00b204e9800998ecf8427e
E:/Project/Support/Day01/北京\昌平/f - 副本 (2).txt d41d8cd98f00b204e9800998ecf8427e
E:/Project/Support/Day01/北京\昌平/f - 副本 (3).txt d41d8cd98f00b204e9800998ecf8427e
E:/Project/Support/Day01/北京\昌平/f - 副本 (4).txt d41d8cd98f00b204e9800998ecf8427e
E:/Project/Support/Day01/北京\昌平/f - 副本 (5).txt d41d8cd98f00b204e9800998ecf8427e
E:/Project/Support/Day01/北京\昌平/f - 副本 (6).txt d41d8cd98f00b204e9800998ecf8427e
E:/Project/Support/Day01/北京\昌平/f - 副本 (7).txt d41d8cd98f00b204e9800998ecf8427e
E:/Project/Support/Day01/北京\昌平/f - 副本 (8).txt d41d8cd98f00b204e9800998ecf8427e
E:/Project/Support/Day01/北京\昌平/f - 副本 (9).txt d41d8cd98f00b204e9800998ecf8427e
E:/Project/Support/Day01/北京\昌平/f - 副本.txt d41d8cd98f00b204e9800998ecf8427e
E:/Project/Support/Day01/北京\昌平/f.txt d41d8cd98f00b204e9800998ecf8427e
E:/Project/Support/Day01/北京\海淀/c.txt d41d8cd98f00b204e9800998ecf8427e
E:/Project/Support/Day01/北京\海淀/e.txt d41d8cd98f00b204e9800998ecf8427e
In [24]:
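Many lines in the output share a digest, which is what one would expect for copies of the same file. The value d41d8cd98f00b204e9800998ecf8427e is the MD5 of empty input, so the files under 昌平 and 海淀 are presumably empty. As a possible next step, paths can be grouped by digest to list duplicate candidates. This is a minimal sketch, not part of fileManager.py; the names file_md5 and group_by_md5 are hypothetical.

```python
import hashlib
import os
from collections import defaultdict

def file_md5(path, chunk_size=4096):
    """Return the hex MD5 digest of the file at path, read in chunks."""
    md5_tool = hashlib.md5()
    with open(path, "rb") as fobj:
        # iter(callable, sentinel) keeps calling read() until it returns b"".
        for chunk in iter(lambda: fobj.read(chunk_size), b""):
            md5_tool.update(chunk)
    return md5_tool.hexdigest()

def group_by_md5(directory):
    """Map digest -> list of paths, keeping only digests shared by 2+ files."""
    groups = defaultdict(list)
    for dir_path, _sub_dirs, file_names in os.walk(directory):
        for name in file_names:
            path = os.path.join(dir_path, name)
            groups[file_md5(path)].append(path)
    return {digest: paths for digest, paths in groups.items() if len(paths) > 1}
```

Equal MD5 digests make two files very likely identical, but they are not a strict guarantee; a byte-for-byte comparison such as filecmp.cmp(a, b, shallow=False) settles the question for any pair that matters.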