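A folder contains many txt files that need to be converted to Word documents, with some formatting applied in bulk so they are easier to print later.
The text files look like this:
Enter the following prompt in ChatGPT: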
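You are a Python programming expert and need to complete a task that processes Word content. The steps are as follows:
Open the folder: D:\lexfridman-podtext;
Convert all txt files in it to Word files;
Delete all blank paragraphs in the Word files: if a blank paragraph is immediately followed by another blank paragraph, delete that blank paragraph;
Add the page number at the top left of the Word file's header;
Set the Word file's page margins to top: 1cm, bottom: 1cm, left: 1cm, right: 1cm;
Set the Word file's line spacing to single;
Set the Word file's paragraph spacing to 0 lines before and 0 pt after;
Set the font in the Word file to Cambria, size 10;
Note: print relevant information at every step;
When adding the page number, use the set_attr function to set it, not set;
When adding the page number, use the docx library's built-in methods instead of manipulating XML directly;
Read the files with 'utf-8' encoding;
Use Document.oxml.OxmlElement('w:fldSimple');
Import oxml from docx first;
When adding the page number, insert a page number field;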
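The blank-paragraph rule in the prompt can be read as "collapse every run of consecutive blank paragraphs into a single one". As a minimal sketch of that reading, here it is applied to plain lines of text, before any Word document is involved. The helper name collapse_blank_lines is hypothetical and is not part of the generated script.

```python
def collapse_blank_lines(lines):
    """Drop a blank line whenever the line after it is also blank.

    Each run of consecutive blank lines shrinks to one blank line;
    isolated blank lines are kept unchanged.
    """
    kept = []
    for index, line in enumerate(lines):
        is_blank = line.strip() == ""
        next_is_blank = (
            index + 1 < len(lines) and lines[index + 1].strip() == ""
        )
        if is_blank and next_is_blank:
            continue  # a later blank line in the same run will be kept
        kept.append(line)
    return kept


sample = ["Intro", "", "", "", "Body", "", "End"]
print(collapse_blank_lines(sample))
```

Filtering the lines this way before calling doc.add_paragraph would avoid having to remove paragraphs from the document afterwards, although the prompt as written asks for removal inside the Word file.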
ChatGPT quickly generated the source code:
import os
import chardet
from docx import Document
from docx.shared import Pt, Cm
from docx.enum.text import WD_PARAGRAPH_ALIGNMENT
from docx.oxml.ns import qn
from docx.oxml import OxmlElement

# Folder path
folder_path = "D:\\lexfridman-podtext"

# Iterate over all files in the folder
for filename in os.listdir(folder_path):
    # Only process txt files
    if filename.endswith(".txt"):
        print(f"Processing file: {filename}")
        txt_path = os.path.join(folder_path, filename)

        # Detect the file encoding
        with open(txt_path, "rb") as file:
            result = chardet.detect(file.read())
        # chardet can report None as the encoding; fall back to utf-8
        encoding = result["encoding"] or "utf-8"

        # Open the txt file and read its contents
        with open(txt_path, "r", encoding=encoding) as file:
            lines = file.readlines()

        # Create a new Word document
        doc = Document()

        # Add the contents of the txt file to the Word document
        for line in lines:
            doc.add_paragraph(line.strip())

        # Remove each blank paragraph that is followed by another blank paragraph
        print("Removing blank paragraphs...")
        paragraphs = doc.paragraphs
        i = 0
        while i < len(paragraphs) - 1:
            if len(paragraphs[i].text.strip()) == 0 and len(paragraphs[i + 1].text.strip()) == 0:
                p = paragraphs[i]._element
                p.getparent().remove(p)
                paragraphs = doc.paragraphs
                continue
            i += 1

        # Add the page number as a PAGE field, left-aligned in the header
        print("Adding page number...")
        header = doc.sections[0].header
        paragraph = header.paragraphs[0]
        paragraph.alignment = WD_PARAGRAPH_ALIGNMENT.LEFT
        run = paragraph.add_run()
        run.font.size = Pt(12)
        fld_code = 'PAGE \\* MERGEFORMAT'
        instrText = OxmlElement('w:instrText')
        instrText.set(qn('xml:space'), 'preserve')
        instrText.text = fld_code
        fldChar1 = OxmlElement('w:fldChar')
        fldChar1.set(qn('w:fldCharType'), 'begin')
        fldChar2 = OxmlElement('w:fldChar')
        fldChar2.set(qn('w:fldCharType'), 'end')
        # Put the field inside the run created above so its font size applies
        run._r.append(fldChar1)
        run._r.append(instrText)
        run._r.append(fldChar2)

        # Set the page margins
        print("Setting page margins...")
        section = doc.sections[0]
        section.top_margin = Cm(1)
        section.bottom_margin = Cm(1)
        section.left_margin = Cm(1)
        section.right_margin = Cm(1)

        # Set the line spacing and paragraph spacing
        print("Setting line spacing and paragraph spacing...")
        for paragraph in doc.paragraphs:
            paragraph_format = paragraph.paragraph_format
            paragraph_format.line_spacing = 1  # single line spacing
            paragraph_format.space_before = Pt(0)  # no space before the paragraph
            paragraph_format.space_after = Pt(0)  # no space after the paragraph

        # Set the font and font size
        print("Setting font and font size...")
        for paragraph in doc.paragraphs:
            for run in paragraph.runs:
                run.font.name = 'Cambria'
                run._element.rPr.rFonts.set(qn('w:eastAsia'), 'Cambria')
                run.font.size = Pt(10)

        # Save the Word document next to the txt file, swapping only the extension
        docx_name = os.path.splitext(filename)[0] + ".docx"
        doc.save(os.path.join(folder_path, docx_name))
        print(f"Processed and saved file: {docx_name}")

print("All files processed.")
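The prompt asked for 'utf-8' when reading files, while the generated script relies on chardet's guess. As a hedged alternative, the sketch below uses only the standard library: it tries utf-8 first and then a fallback encoding. The helper names read_text_lines and docx_path_for are hypothetical, and "gbk" as the second encoding is an assumption that should be replaced with whatever encoding the source files are likely to use.

```python
from pathlib import Path


def read_text_lines(path, encodings=("utf-8", "gbk")):
    """Return the lines of a text file, trying each encoding in order.

    The encoding list is an assumption; adjust it to the actual files.
    """
    raw = Path(path).read_bytes()
    for encoding in encodings:
        try:
            return raw.decode(encoding).splitlines()
        except UnicodeDecodeError:
            continue  # try the next candidate encoding
    # Last resort: decode as utf-8 and replace undecodable bytes
    return raw.decode("utf-8", errors="replace").splitlines()


def docx_path_for(txt_path):
    """Return the output path: same folder and stem, with a .docx suffix."""
    return Path(txt_path).with_suffix(".docx")


print(docx_path_for("episode.001.txt"))
```

Because with_suffix only replaces the final extension, a name that happens to contain ".txt" elsewhere is left intact.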
After running the program, the documents look like this: