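1. Problem Background
We need to compare a text file F against several other text files under a given path. The code below was written for this, but it only prints the comparison result for one file. It needs to be changed so that every file is compared and every result is printed.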
import difflib
import fnmatch
import os

filelist = []
# Reference file F
with open("D:/Desktop/data/sample/ff69c.txt") as f:
    flines = f.readlines()
path = "D:/Desktop/data/sample/sample2"
# Collect every .txt file under path, including subdirectories
for root, dirnames, filenames in os.walk(path):
    for filename in fnmatch.filter(filenames, '*.txt'):
        filelist.append(os.path.join(root, filename))
for m in filelist:
    with open(m, 'r') as g:
        glines = g.readlines()
    d = difflib.Differ()
    diff_list = list(d.compare(flines, glines))  # replaced on every iteration
# The counting below runs once, after the loop, so it only sees the last file
n_adds, n_subs, n_eqs, n_weird = 0, 0, 0, 0
for diff_item in diff_list:
    if diff_item[0] == '+':
        n_adds += 1
    elif diff_item[0] == '-':
        n_subs += 1
    elif diff_item[0] == ' ':
        n_eqs += 1
    else:
        n_weird += 1  # '?' hint lines emitted by Differ
print('lines files #1: %d #2: %d' % (len(flines), len(glines)))
print('adds: %d subs: %d eqs: %d ?:%d ' % (n_adds, n_subs, n_eqs, n_weird))
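2. Solution
Method 1:
The problem is that diff_list is overwritten every time a file is read, so by the time the counting starts only the diff of the last file is left. We can change the code to append each file's differences to diff_list instead of overwriting it.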
import difflib
import fnmatch
import os

filelist = []
# Reference file F
with open("D:/Desktop/data/sample/ff69c.txt") as f:
    flines = f.readlines()
path = "D:/Desktop/data/sample/sample2"
# Collect every .txt file under path, including subdirectories
for root, dirnames, filenames in os.walk(path):
    for filename in fnmatch.filter(filenames, '*.txt'):
        filelist.append(os.path.join(root, filename))

diff_list = []  # Initialize an empty list to store all differences
for m in filelist:
    with open(m, 'r') as g:
        glines = g.readlines()
    d = difflib.Differ()
    diff_list.extend(d.compare(flines, glines))  # Append differences to diff_list

# Count over the accumulated diff lines of all comparisons
n_adds, n_subs, n_eqs, n_weird = 0, 0, 0, 0
for diff_item in diff_list:
    if diff_item[0] == '+':
        n_adds += 1
    elif diff_item[0] == '-':
        n_subs += 1
    elif diff_item[0] == ' ':
        n_eqs += 1
    else:
        n_weird += 1  # '?' hint lines emitted by Differ
# glines would only describe the last file here, so report the file count instead
print('lines file #1: %d, files compared: %d' % (len(flines), len(filelist)))
print('adds: %d subs: %d eqs: %d ?:%d ' % (n_adds, n_subs, n_eqs, n_weird))
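The code now compares every file. Note that the counts are totals accumulated across all comparisons, not one result per file. To report each file separately, the counting and printing have to run inside the "for m in filelist" loop.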
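If one result per file is wanted, the counting can be moved into a small helper that is called once per comparison. The following is a minimal sketch of that idea, not the original author's code: the names count_diff and report_all are hypothetical, and the files are assumed to be already read into lists of lines (for example with readlines()).

```python
import difflib


def count_diff(ref_lines, other_lines):
    """Count the '+', '-', ' ' and '?' lines of one Differ comparison."""
    counts = {'+': 0, '-': 0, ' ': 0, '?': 0}
    for item in difflib.Differ().compare(ref_lines, other_lines):
        counts[item[0]] += 1  # the first character of a Differ line is its tag
    return counts


def report_all(ref_lines, others):
    """Print one summary per file; `others` maps a file name to its lines."""
    results = {}
    for name, lines in others.items():
        counts = count_diff(ref_lines, lines)
        results[name] = counts
        print('%s: lines #1: %d #2: %d' % (name, len(ref_lines), len(lines)))
        print('adds: %d subs: %d eqs: %d ?: %d'
              % (counts['+'], counts['-'], counts[' '], counts['?']))
    return results
```

Because the counters are created inside count_diff, each file starts from zero and nothing carries over from the previous comparison.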
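Method 2:
Another approach is to compare the files with the filecmp.cmp function. filecmp.cmp takes two file paths as arguments and returns a boolean indicating whether the two files are equal. It does not say what the differences are.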
import filecmp
import fnmatch
import os

filelist = []
reference = "D:/Desktop/data/sample/ff69c.txt"  # file F
path = "D:/Desktop/data/sample/sample2"
# Collect every .txt file under path, including subdirectories
for root, dirnames, filenames in os.walk(path):
    for filename in fnmatch.filter(filenames, '*.txt'):
        filelist.append(os.path.join(root, filename))
# Compare F with each collected file; shallow=False compares the file contents
for other in filelist:
    if filecmp.cmp(reference, other, shallow=False):
        print(f"{reference} and {other} are equal.")
    else:
        print(f"{reference} and {other} are different.")
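This approach is usually faster than building a line-by-line diff, because it only answers whether two files are identical. With shallow=False it still reads the file contents, although files of different sizes are reported as different without a content comparison. The comparison is byte for byte, so a difference in line endings or encoding alone makes two files count as different, and the result never shows which lines changed.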
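The two methods can also be combined: filecmp.cmp serves as a cheap equality check, and a textual diff is built only for the files that are not identical. The sketch below is one possible way to do that under stated assumptions. The function name diff_against_reference is hypothetical, difflib.unified_diff is swapped in for Differ to keep the output short, and all files are assumed to be readable as text in the default encoding.

```python
import difflib
import filecmp


def diff_against_reference(ref_path, other_paths):
    """Return {path: unified diff lines}; the list is empty for identical files."""
    with open(ref_path) as f:
        ref_lines = f.readlines()
    report = {}
    for path in other_paths:
        if filecmp.cmp(ref_path, path, shallow=False):
            report[path] = []  # byte-for-byte identical, no diff needed
            continue
        with open(path) as g:
            other_lines = g.readlines()
        report[path] = list(difflib.unified_diff(
            ref_lines, other_lines, fromfile=ref_path, tofile=path))
    return report
```

A file that differs from the reference only in its line endings fails the filecmp.cmp check but can still produce an empty diff, because reading in text mode normalizes the line endings.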