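Learning Python from Scratch (8): A Brief Introduction to the time Module, the requests Module, Modules for Data Analysis and Office Automation, the jieba Module, File Operations, and the os-Related Modules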

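1. time module

time(): returns the current timestamp as a floating-point number of seconds since the epoch.

localtime(): returns a time.struct_time object holding the year, month, day, hour, minute, and second, plus the day of the week (0 means Monday) and the day of the year.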

import time

print(time.time())
print(time.localtime())

1725287068.253736
time.struct_time(tm_year=2024, tm_mon=9, tm_mday=2, tm_hour=22, tm_min=24, tm_sec=28, tm_wday=0, tm_yday=246, tm_isdst=0)

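localtime() also accepts an argument: the number of seconds elapsed since the epoch, 1970-01-01 00:00:00 UTC, which is 08:00:00 on January 1, 1970 in the UTC+8 local time used here.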

print(time.localtime(60))

time.struct_time(tm_year=1970, tm_mon=1, tm_mday=1, tm_hour=8, tm_min=1, tm_sec=0, tm_wday=3, tm_yday=1, tm_isdst=0)

ctime(): returns a simple, human-readable time string.

print(time.ctime())

Mon Sep  2 22:29:40 2024

strftime(): converts a struct_time object into a formatted string.

print(time.strftime('%Y-%m-%d %H:%M:%S', time.localtime()))

2024-09-02 22:33:14

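Note that the year uses an uppercase %Y, and the hour, minute, and second are uppercase as well: %H, %M, %S. The month and day are lowercase: %m, %d.

strptime(): parses a string into a struct_time object.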

print(time.strptime('2008-08-08 20:08:08', '%Y-%m-%d %H:%M:%S'))

time.struct_time(tm_year=2008, tm_mon=8, tm_mday=8, tm_hour=20, tm_min=8, tm_sec=8, tm_wday=4, tm_yday=221, tm_isdst=-1)
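A common follow-up is turning a formatted string into a timestamp. The sketch below chains strptime() with time.mktime(), which is not covered above but belongs to the same module and interprets a struct_time as local time. The variable names are illustrative only.

```python
import time

FMT = '%Y-%m-%d %H:%M:%S'
text = '2008-08-08 20:08:08'

# String -> struct_time -> timestamp (seconds since the epoch)
parsed = time.strptime(text, FMT)
timestamp = time.mktime(parsed)

# Timestamp -> struct_time -> string, which should reproduce the input
round_trip = time.strftime(FMT, time.localtime(timestamp))

print(timestamp)
print(round_trip)
```

The exact timestamp depends on the machine's time zone, so only the round-tripped string is stable across machines.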

sleep(): suspends the program for the given number of seconds.
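To see sleep() in action, one option is to measure how long a pause actually lasts. This is a minimal sketch: timed_sleep is a hypothetical helper, and time.perf_counter() is used for the measurement because it is designed for timing short intervals.

```python
import time

def timed_sleep(seconds):
    """Sleep for the given number of seconds and return the measured duration."""
    start = time.perf_counter()
    time.sleep(seconds)
    return time.perf_counter() - start

elapsed = timed_sleep(0.2)
print(f"slept for about {elapsed:.2f} seconds")
```

The measured value is usually slightly larger than the requested one, since the operating system decides when the program resumes.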

2. datetime module

datetime.now(): returns the current date and time with microsecond precision.

# Import the datetime class from the datetime module
from datetime import datetime

print(datetime.now())

2024-09-02 22:40:08.851771

The datetime constructor accepts the year, month, day, hour, minute, and second:

dt = datetime(2024, 9, 2, 22, 40, 0)
print(dt)
print(type(dt))

2024-09-02 22:40:00
<class 'datetime.datetime'>

Read the year, month, day, hour, minute, and second from a datetime object:

print(dt.year, dt.month, dt.day, dt.hour, dt.minute, dt.second)

2024 9 2 22 40 0

Compare two datetime objects:

dt1 = datetime(2024, 5, 1, 0, 0, 0)
dt2 = datetime(2024, 10, 1, 0, 0, 0)
print(dt1 < dt2)

True

Convert between datetime objects and strings (as in the time module, this is done with the strftime and strptime methods):

print(datetime.strftime(datetime.now(), '%Y-%m-%d %H:%M:%S'))
print(datetime.strptime('2024-09-02 22:40:00', '%Y-%m-%d %H:%M:%S'))
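The two calls above can also be written in the more common instance-method style. The following sketch uses a fixed datetime instead of now() so the result is reproducible; FMT is just an illustrative constant name.

```python
from datetime import datetime

FMT = '%Y-%m-%d %H:%M:%S'
dt = datetime(2024, 9, 2, 22, 40, 0)

# Instance-method form, equivalent to datetime.strftime(dt, FMT)
formatted = dt.strftime(FMT)

# strptime is a class method that builds a new datetime from a string
parsed = datetime.strptime(formatted, FMT)

print(formatted)     # 2024-09-02 22:40:00
print(parsed == dt)  # True
```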

Subtracting one datetime object from another yields a timedelta object:

dt1 = datetime(2024, 5, 1, 0, 0, 0)
dt2 = datetime(2024, 10, 1, 0, 0, 0)
print(type(dt2 - dt1))
print(dt2 - dt1)

<class 'datetime.timedelta'>
153 days, 0:00:00

You can also add a timedelta to a datetime object, or subtract one from it, to get another datetime object:

from datetime import timedelta

print(dt1 + timedelta(153))

2024-10-01 00:00:00

The first positional argument of the timedelta constructor is the number of days, and the second is the number of seconds:

print(dt1 + timedelta(153, 10))

2024-10-01 00:00:10
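Positional arguments are easy to mix up, so keyword arguments are often clearer. The sketch below reuses the two dates from above and also shows the days attribute and the total_seconds() method of timedelta.

```python
from datetime import datetime, timedelta

dt1 = datetime(2024, 5, 1, 0, 0, 0)
dt2 = datetime(2024, 10, 1, 0, 0, 0)

delta = dt2 - dt1
print(delta.days)             # 153
print(delta.total_seconds())  # 13219200.0

# Keyword arguments make the units explicit
shifted = dt1 + timedelta(days=1, hours=2, minutes=30)
print(shifted)                # 2024-05-02 02:30:00

# Subtraction works the same way
week_before = dt2 - timedelta(weeks=1)
print(week_before)            # 2024-09-24 00:00:00
```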

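3. requests module

requests is a Python library for making HTTP requests. Its get() method sends a request and returns a response object. The text attribute of the response gives the body as a string, and the content attribute gives it as binary data (images, audio, video, and so on).

If the printed text comes out garbled, set the response's encoding attribute to utf-8 before reading text.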
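Why the garbling happens can be illustrated without any network call: text is simply content decoded with some encoding, and a wrong encoding yields nonsense. The sketch below uses plain bytes as a stand-in for a response body. The sample string is made up, and ISO-8859-1 is chosen because, as far as I know, requests falls back to it when a text response declares no charset.

```python
# Bytes as they might arrive over the network (what response.content would hold)
content = "中文页面".encode("utf-8")

# Decoding with the wrong encoding yields one wrong character per byte
garbled = content.decode("iso-8859-1")

# Decoding with the right encoding recovers the original string
fixed = content.decode("utf-8")

print(len(content), len(garbled), len(fixed))  # 12 12 4
print(garbled == fixed)                        # False

# With requests, the equivalent fix would look like this
# (the URL is a hypothetical placeholder):
#   import requests
#   resp = requests.get("https://example.com")
#   resp.encoding = "utf-8"
#   print(resp.text)
```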

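4. openpyxl module

The openpyxl module is dedicated to Excel files and can both write and read them.

Write an Excel file:

import openpyxl

# Create a workbook
wb = openpyxl.Workbook()
# Create a worksheet, given its title and index
sheet = wb.create_sheet("sheet1", 0)
# Append a row of data
sheet.append([1, 2, 'a', True])
# Save the file under the given file name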
wb.save('text.xlsx')

Read an Excel file (note that each cell's value is obtained through its value attribute):

# Open the workbook
wb = openpyxl.load_workbook("text.xlsx")

# Select the sheet1 worksheet
sheet = wb['sheet1']

# Read the rows
l = []
for row in sheet.rows:
    c = []
    for cell in row:
        c.append(cell.value)
    l.append(c)

print(l)

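5. pdfplumber module

The pdfplumber module reads PDF files.

import pdfplumber

with pdfplumber.open("可靠的底部形态.pdf") as f:
    for p in f.pages:
        # Extract the text of the page
        print(p.extract_text())
        print(f"Finished extracting page {p.page_number}")

pdfplumber can be used to extract specific data from a PDF, to process its pages in reverse order, and so on.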

6. numpy and matplotlib modules

Read an image with matplotlib:

import matplotlib.pyplot as plt

im = plt.imread("日出海.jpeg")
print(im)
print(type(im))

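The image returned by plt.imread is a three-dimensional numpy array: the first two dimensions are the image's height and width, and the third holds the RGB values. Use numpy to convert the image to grayscale:

import numpy as np
import matplotlib.pyplot as plt

im = plt.imread("日出海.jpeg")
plt.imshow(im)

# Fixed grayscale weights for the R, G, and B channels
mask = np.array([0.299, 0.587, 0.114])
im2 = np.dot(im, mask)
plt.imshow(im2, cmap='gray')
plt.show()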

7. pandas module

Use pandas to read tabular data, then draw a pie chart with matplotlib:

import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_excel("test.xlsx")

# Fix garbled Chinese characters in the chart
plt.rcParams['font.sans-serif'] = ['SimHei']

# Set the figure size
plt.figure(figsize=(10, 6))
labels = df['商品名称']
y = df['北京']

# Draw the pie chart
plt.pie(y, labels=labels, autopct='%1.1f%%', startangle=90)

# Use equal scaling on the x and y axes so the pie is drawn as a circle
plt.axis('equal')
plt.title('xxx')

plt.show()

8. jieba module

import jieba

with open('text.txt', 'r', encoding='utf-8') as f:
    s = f.read()

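# Segment the raw string into words; returns a list
l = jieba.lcut(s)

# Count how often each word occurs. Iterate over the full list rather than
# a deduplicated set, otherwise every count would be 1.
d = {}
for word in l:
    if len(word) >= 2:
        if word in d:
            d[word] = d[word] + 1
        else:
            d[word] = 1

res = []
for k,v in d.items():
    res.append([k, v])

# Sort by frequency in descending order and take the top 10
res.sort(key=lambda x: x[1], reverse=True)
print(res[0:10])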
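The counting and sorting steps can also be written with collections.Counter from the standard library. This is an alternative sketch, not the approach used above: the words list is a made-up stand-in for the result of jieba.lcut(s), so the example runs without jieba installed.

```python
from collections import Counter

# Hypothetical segmentation result, standing in for jieba.lcut(s)
words = ['python', 'is', 'a', 'language', 'python', 'is', 'python']

# Keep words of at least two characters, then count occurrences
counts = Counter(w for w in words if len(w) >= 2)

# most_common(n) returns up to n (word, count) pairs, most frequent first
top = counts.most_common(10)
print(top)  # [('python', 3), ('is', 2), ('language', 1)]
```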

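9. File operations

Basic file operations were covered in an earlier post. Here is a quick review of the file open modes and the read/write methods.

Note that writelines does not add line breaks. Its argument is a list (a single string also works, because any iterable of strings is accepted), and every element must be a string, otherwise an error is raised.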
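The behavior of writelines can be checked with a small experiment. The sketch below writes to a temporary directory so nothing is left behind; demo.txt is a hypothetical file name.

```python
import os
import tempfile

lines = ['first', 'second', 'third']

with tempfile.TemporaryDirectory() as tmp:
    file_path = os.path.join(tmp, 'demo.txt')  # hypothetical file name

    with open(file_path, 'w', encoding='utf-8') as f:
        # writelines adds no separators, so append '\n' to each item
        f.writelines(line + '\n' for line in lines)

    with open(file_path, 'r', encoding='utf-8') as f:
        content = f.read()

print(repr(content))  # 'first\nsecond\nthird\n'
```

Without the added '\n', the file would contain the single line firstsecondthird.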

Use file operations to copy a file:

def func_copy(source_file, target_file):
    # Binary mode copies any file byte for byte, not only text files
    with open(source_file, 'rb') as sf:
        with open(target_file, 'wb') as tf:
            tf.write(sf.read())

func_copy('text.txt', 'text2.txt')
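For everyday use, the standard library already provides a copy routine, shutil.copyfile, which is not covered above. A minimal sketch, with hypothetical file names inside a temporary directory:

```python
import os
import shutil
import tempfile

data = b'\x00\x01binary data\xff'

with tempfile.TemporaryDirectory() as tmp:
    source = os.path.join(tmp, 'source.bin')  # hypothetical file names
    target = os.path.join(tmp, 'target.bin')

    with open(source, 'wb') as f:
        f.write(data)

    # copyfile copies the contents of source into target
    shutil.copyfile(source, target)

    with open(target, 'rb') as f:
        copied = f.read()

print(copied == data)  # True
```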

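10. json module

Using json.dumps:

import json

l = [{'name': 'zhangsan', 'age': 18}, {'name': "lisi", 'age': 20}, {'name': 'wangwu'}]

# Convert a Python object to a JSON string (a JSON array here). ensure_ascii=False keeps non-ASCII characters such as Chinese readable, and indent=4 pretty-prints the output with four-space indentation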
s = json.dumps(l, ensure_ascii=False, indent=4)
print(type(s))
print(s)

<class 'str'>
[
    {
        "name": "zhangsan",
        "age": 18
    },
    {
        "name": "lisi",
        "age": 20
    },
    {
        "name": "wangwu"
    }
]

Using json.loads:

# Load the JSON string from above back into a Python object, a list here
l2 = json.loads(s)
print(type(l2))
print(l2)

ds = '{"name": "aaa", "age": 12}'
# Load a JSON string into a Python object, a dict here
d = json.loads(ds)
print(type(d))
print(d)

<class 'dict'>
{'name': 'aaa', 'age': 12}

Note that strings inside the JSON text passed to json.loads must use double quotes. With single quotes the text cannot be parsed into a Python object.
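The double-quote requirement is easy to verify. In this sketch the same data is written once with double quotes and once with single quotes; the second form raises json.JSONDecodeError.

```python
import json

valid = '{"name": "aaa", "age": 12}'
invalid = "{'name': 'aaa', 'age': 12}"

print(json.loads(valid))  # {'name': 'aaa', 'age': 12}

try:
    json.loads(invalid)
    parsed_ok = True
except json.JSONDecodeError as e:
    # Single-quoted keys and strings are not valid JSON
    parsed_ok = False
    print('Invalid JSON:', e.msg)

print(parsed_ok)  # False
```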

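Using json.dump:

import json

l = [{'name': 'zhangsan', 'age': 18}, {'name': "lisi", 'age': 20}, {'name': 'wangwu'}]

# Convert a Python object to a JSON string and write it to a file.
# With ensure_ascii=False, open the file as utf-8 so non-ASCII text is written correctly.
with open('json.txt', 'w', encoding='utf-8') as f:
    json.dump(l, f, ensure_ascii=False, indent=4)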

Using json.load:

with open('json.txt', 'r', encoding='utf-8') as f:
    s = json.load(f)
    print(type(s))
    print(s)

<class 'list'>
[{'name': 'zhangsan', 'age': 18}, {'name': 'lisi', 'age': 20}, {'name': 'wangwu'}]

11. os module

getcwd(): returns the current working directory.

import os

print(os.getcwd())

/Users/admin/Documents/pythonProject

listdir(): returns a list of all directories and files in the given directory. Without an argument it defaults to the current working directory.

print(os.listdir())

['text.xlsx', '日出海.jpeg', 'json.txt', 'gray.jpeg', 'text2.txt', 'text.txt', 'venv', 'main.py', '可靠的底部形态.pdf', '.idea']

mkdir(): creates a single directory. An error is raised if the directory already exists.

os.mkdir("study")

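makedirs(): creates a directory together with any missing intermediate directories. An error is raised if the target directory already exists.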

os.makedirs("study/aa/bb/cc")
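If an existing directory should not count as an error, makedirs() accepts exist_ok=True. The sketch below works inside a temporary directory and shows both behaviors; the names raised and created are illustrative.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, 'study', 'aa', 'bb', 'cc')

    os.makedirs(target)
    try:
        os.makedirs(target)  # second call: the directory already exists
        raised = False
    except FileExistsError:
        raised = True

    # exist_ok=True turns the call into a no-op when the directory exists
    os.makedirs(target, exist_ok=True)
    created = os.path.isdir(target)

print(raised, created)  # True True
```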

rmdir(): removes a directory. An error is raised if the directory is not empty or does not exist.

os.rmdir("study/aa/bb/cc")

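removedirs(): removes directories recursively. The leaf directory must exist and be empty, otherwise an error is raised. After that, each parent directory is removed in turn as long as it is empty, and removal stops silently at the first parent that is not empty.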

os.removedirs("study/aa/bb/cc")
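How far removedirs() climbs can be observed directly. According to the os documentation, it removes the leaf directory and then prunes empty parents, stopping without an error at the first one that cannot be removed. The sketch below sets this up in a temporary directory, with a hypothetical keep.txt file that keeps study from being empty.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, 'study', 'aa', 'bb', 'cc'))

    # Put a file in 'study' so that this parent is not empty
    with open(os.path.join(tmp, 'study', 'keep.txt'), 'w') as f:
        f.write('x')

    os.removedirs(os.path.join(tmp, 'study', 'aa', 'bb', 'cc'))

    aa_exists = os.path.exists(os.path.join(tmp, 'study', 'aa'))
    study_exists = os.path.exists(os.path.join(tmp, 'study'))

print(aa_exists, study_exists)  # False True
```

Here cc, bb, and aa are removed, while study survives because it still contains keep.txt.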

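walk(): traverses a directory tree. The top directory argument is required. Each iteration yields a tuple of three elements: the directory currently being visited, the list of subdirectories it contains, and the list of files it contains.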

for dirs, dirlist, filelist in os.walk("./"):
    print(dirs)
    print(dirlist)
    print(filelist)
    print("------------")
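A typical use of walk() is collecting every file of a given type under a directory. In the sketch below, find_files is a hypothetical helper, and the small directory tree is created in a temporary directory purely for demonstration.

```python
import os
import tempfile

def find_files(top, suffix):
    """Return the sorted paths of all files under top whose names end with suffix."""
    matches = []
    for dirpath, dirnames, filenames in os.walk(top):
        for name in filenames:
            if name.endswith(suffix):
                # dirpath is the directory being visited, so join it with the name
                matches.append(os.path.join(dirpath, name))
    return sorted(matches)

with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, 'study', 'aa'))
    samples = ['a.txt',
               os.path.join('study', 'b.txt'),
               os.path.join('study', 'aa', 'c.pdf')]
    for rel in samples:
        with open(os.path.join(tmp, rel), 'w') as f:
            f.write('')

    # Show the results relative to the temporary directory
    found = [os.path.relpath(p, tmp) for p in find_files(tmp, '.txt')]

print(found)
```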

remove(): deletes a file. An error is raised if the file does not exist.

os.remove('json.txt')

rename(): renames a file or directory.

os.rename('text.txt', 'test.txt')

stat(): returns detailed information about a file.

info = os.stat("text.xlsx")
print(type(info))
print(info)

<class 'os.stat_result'>
os.stat_result(st_mode=33188, st_ino=31111965, st_dev=16777233, st_nlink=1, st_uid=501, st_gid=20, st_size=5265, st_atime=1725375194, st_mtime=1725375193, st_ctime=1725375193)

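Here st_mtime is the time of the last modification, st_atime is the time of the most recent access, and st_size is the file size in bytes. st_ctime is the creation time on Windows, but on Unix-like systems, including the macOS machine used here, it is the time of the last metadata change.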
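The time fields of the stat result are plain timestamps, so they can be formatted with the time module from section 1. A minimal sketch using a temporary file with the hypothetical name demo.txt:

```python
import os
import tempfile
import time

with tempfile.TemporaryDirectory() as tmp:
    file_path = os.path.join(tmp, 'demo.txt')  # hypothetical file name
    with open(file_path, 'wb') as f:
        f.write(b'hello')

    info = os.stat(file_path)
    size = info.st_size
    # st_mtime is a timestamp, so localtime() and strftime() apply
    modified = time.strftime('%Y-%m-%d %H:%M:%S', time.localtime(info.st_mtime))

print(size)  # 5
print(modified)
```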

12. os.path module

abspath(): returns the absolute path of a directory or file.

import os.path as path

print(path.abspath("text.xlsx"))

/Users/admin/Documents/pythonProject/text.xlsx

exists(): checks whether a directory or file exists.

print(path.exists("text.xlsx"))
print(path.exists("text2.xlsx"))

True
False

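join(): joins path components (here a directory and a file name) with the platform's path separator and returns a string. It does not check whether the file exists.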

f = path.join('study', 'text.xlsx')
print(type(f))
print(f)

<class 'str'>
study/text.xlsx

splitext(): splits a file name into its root and extension and returns them as a tuple. It does not check whether the file exists.

print(path.splitext('text.txt'))

('text', '.txt')
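A few edge cases are worth knowing: splitext() splits only at the last dot, and a leading dot is treated as part of the name rather than as an extension. The file names below are made up for illustration.

```python
import os.path as path

print(path.splitext('archive.tar.gz'))  # ('archive.tar', '.gz')
print(path.splitext('.bashrc'))         # ('.bashrc', '')
print(path.splitext('README'))          # ('README', '')
print(path.splitext('study/text.txt'))  # ('study/text', '.txt')
```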

basename(): returns the file name without the directory part but with the extension. It does not check whether the file exists.

print(path.basename("study/text.txt"))

text.txt

dirname(): returns the directory part of a path. It does not check whether the file exists.

print(path.dirname("study/text.txt"))

study

isdir(): checks whether the path is an existing directory. It returns False if the path is not a directory or does not exist.

print(path.isdir("study/day"))
print(path.isdir("text.xlsx"))
print(path.isdir("./"))

False
False
True

isfile(): checks whether the path is an existing file. It returns False if the path is not a file or does not exist.

print(path.isfile("study/text.xlsx"))
print(path.isfile("text.xlsx"))
print(path.isfile("./"))

False
True
False
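These functions are often combined with os.listdir(). One pitfall is that listdir() returns bare names, so each name has to be joined with its directory before isdir() or isfile() is called on it. The sketch below uses a hypothetical classify helper on a temporary directory.

```python
import os
import os.path as path
import tempfile

def classify(directory):
    """Split the entries of a directory into subdirectory names and file names."""
    dirs, files = [], []
    for name in os.listdir(directory):
        # listdir returns names only, so build the full path first
        full = path.join(directory, name)
        if path.isdir(full):
            dirs.append(name)
        elif path.isfile(full):
            files.append(name)
    return sorted(dirs), sorted(files)

with tempfile.TemporaryDirectory() as tmp:
    os.mkdir(path.join(tmp, 'study'))
    for name in ['text.txt', 'main.py']:
        with open(path.join(tmp, name), 'w') as f:
            f.write('')
    dirs, files = classify(tmp)

print(dirs)   # ['study']
print(files)  # ['main.py', 'text.txt']
```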