Scraping Baidu Hot Search with Python

2025/7/8 14:31:29

Table of Contents

Preface
- Scraping Baidu Hot Search with Python
- - 1. Demo implementation

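Preface

If you find this useful, please give the author a like, a comment, and a bookmark. Writing takes effort ^_^.
They also say that people who leave a like never have bad luck that day. And if you would rather just read for free, you are welcome to come back often!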

Scraping Baidu Hot Search with Python

1. Demo implementation

import requests
from bs4 import BeautifulSoup
"""
GET /path HTTP/1.1  (request line)
1. Request headers
2. Blank line
3. Request body
"""
header = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) '
                  'Chrome/94.0.4606.54 Safari/537.36'
}  # Add a request header so the request looks like it comes from a browser

url1 = 'https://top.baidu.com/board?platform=pc&sa=pcindex_entry'
res = requests.get(url=url1, headers=header, timeout=10)  # timeout so a stalled connection cannot hang the script
res.encoding = 'utf-8'
print(f'Status code: {res.status_code}')
# Parse the page content with Beautiful Soup
soup_body = BeautifulSoup(res.text, 'html.parser')

# Find the HTML elements that hold the hot-search entries: every <div> whose class attribute includes "active-item_1Em2h"
hot_search_items = soup_body.find_all("div", attrs={"class": "active-item_1Em2h"})
print('<<<<<<<<<<<< Baidu Hot Search >>>>>>>>>>>>>>>')
for item in hot_search_items:
    index = item.find("div", attrs={"class": "sign-index_mtI7K"})
    value = item.find("div", attrs={"class": "c-single-text-ellipsis"})
    # find() returns None when nothing matches, so check before reading .text
    if index is None or value is None:
        continue
    num = index.text.strip()  # strip() removes surrounding whitespace
    if num == "":
        continue
    print(f'【{num}】,{value.text.strip()}')
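
The note at the top of the demo lists the parts of an HTTP request: request line, headers, blank line, body. As a rough illustration of that layout only, the sketch below assembles the text by hand. The helper name `build_raw_request`, the `Host` header, and the shortened `User-Agent` value are assumptions made for this example. `requests` builds and sends the real message itself, so none of this is needed to run the demo.

```python
def build_raw_request(method, path, headers, body=""):
    """Assemble HTTP/1.1 request text: request line, headers, blank line, body.

    Hypothetical helper for illustration; it does not send anything.
    """
    # Request line, e.g. "GET /path HTTP/1.1"
    lines = [f"{method} {path} HTTP/1.1"]
    # One "Name: value" line per header
    lines += [f"{name}: {value}" for name, value in headers.items()]
    # HTTP ends each line with CRLF; an empty line separates headers from the body
    return "\r\n".join(lines) + "\r\n\r\n" + body


if __name__ == "__main__":
    raw = build_raw_request(
        "GET",
        "/board?platform=pc&sa=pcindex_entry",
        # Assumed header values, shortened for readability
        {"Host": "top.baidu.com", "User-Agent": "Mozilla/5.0"},
    )
    print(raw)
```

A GET request like the one in the demo normally has an empty body, so the message ends right after the blank line.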