CSDN Article Quality Score Lookup Tool [Bonus: a Python Crawler and Score-Improvement Tips]

CSDN Article Quality Score Lookup Tool

https://www.csdn.net/qc

Note: the link you submit must point to a blog post hosted on CSDN.
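The requirement above can be checked before submitting a link. The following is a minimal sketch, assuming that CSDN posts live on blog.csdn.net under a path containing /article/details/; the function name and the article id 123456 are hypothetical and used only for illustration.

```python
from urllib.parse import urlparse


def is_csdn_article_url(url: str) -> bool:
    """Return True if url looks like a CSDN blog post link."""
    parts = urlparse(url)
    # Assumption: posts are served from blog.csdn.net and their path
    # contains /article/details/<id>.
    return (
        parts.scheme in ("http", "https")
        and parts.netloc == "blog.csdn.net"
        and "/article/details/" in parts.path
    )


# The article id below is made up for the example.
print(is_csdn_article_url("https://blog.csdn.net/jjk_02027/article/details/123456"))
print(is_csdn_article_url("https://example.com/some-post"))
```

If CSDN serves posts from other hosts or paths, the check would need to be widened accordingly.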

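Example result

The author uses one of his own articles to show the result:

Computing Dynamic Metric Thresholds with Java Machine Learning - CSDN Blog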

Looking up the average quality score of your CSDN blog

Content Management -> Data -> Works Data -> Blog Data (default tab) -> Blog Statistics (default tab)

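Getting the link to a CSDN blog post

Option 1

Open the article page -> copy the address from the browser's address bar

Option 2

Open the article page (bottom) -> Share -> Copy link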
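A copied link may carry a query string or fragment that is not part of the article address. The sketch below strips both using only the standard library; it assumes the path alone identifies the article, and the helper name and sample URL are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def strip_query(url: str) -> str:
    """Drop the query string and fragment, keeping scheme, host and path."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


# Hypothetical shared link with extra parameters attached.
shared = "https://blog.csdn.net/jjk_02027/article/details/123456?spm=1001.2014#top"
print(strip_query(shared))
```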

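Python crawler in practice [crawling quality scores]

Using a Python crawler to fetch the quality score of every article on your CSDN blog

macOS is used as the example here; the steps on Windows and Linux are similar.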

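Install python3

Skip this step if Python 3 is already installed. The script below uses f-strings, so it needs Python 3.6 or later; Python 2 will not run it.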

brew install python3

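Install pip3

Skip this step if pip3 is already available. Homebrew's python3 formula installs pip3 together with the interpreter, and Homebrew has no separate pip3 formula, so after the previous step it is enough to confirm that pip3 is present:

pip3 --version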

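Install the required libraries

requests: sends the HTTP requests
MultipartEncoder (from requests_toolbelt): builds the body of a POST request
openpyxl and pandas: read and write the Excel files

# On Windows, or on systems without Homebrew, --break-system-packages can usually be omitted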
pip3 install requests --break-system-packages
pip3 install requests_toolbelt --break-system-packages
pip3 install openpyxl --break-system-packages
pip3 install pandas --break-system-packages
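After installing, it can help to confirm that the libraries are importable by the interpreter that will run the crawler. This is a small sketch using the standard library's importlib; the helper name missing_modules is made up for this example.

```python
from importlib.util import find_spec

# Top-level module names of the packages installed above.
REQUIRED = ["requests", "requests_toolbelt", "openpyxl", "pandas"]


def missing_modules(names):
    """Return the names that cannot be found by the current interpreter."""
    return [name for name in names if find_spec(name) is None]


missing = missing_modules(REQUIRED)
print("missing:", missing if missing else "none")
```

Run it with the same python3 you will use for the crawler; an empty result means all four packages were found.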

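Getting the request URL and request headers

Step 1: Open the target page

Step 2: Open the browser's developer tools

Step 3: Find the request URL and the request headers

Click the Payload tab to find the request parameters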
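Headers copied from the developer tools usually arrive as "Name: value" lines, while requests expects a dictionary. The following sketch converts one form to the other; the helper name is hypothetical, and the sample values are the ones used later in this article's script.

```python
def parse_raw_headers(raw: str) -> dict:
    """Turn 'Name: value' lines copied from the browser into a dict."""
    headers = {}
    for line in raw.splitlines():
        line = line.strip()
        # Skip blank lines, HTTP/2 pseudo-headers (":authority" etc.)
        # and anything without a name/value separator.
        if not line or line.startswith(":") or ":" not in line:
            continue
        name, value = line.split(":", 1)
        headers[name.strip()] = value.strip()
    return headers


raw = """
Accept: application/json, text/plain, */*
X-Ca-Key: 203930474
X-Ca-Signature-Headers: x-ca-key,x-ca-nonce
"""
print(parse_raw_headers(raw))
```

The resulting dictionary can be passed as the headers argument of requests.get or requests.post.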

Step 4: Analyze the request URL and build the parameter dictionary

url = "https://bizapi.csdn.net/blog/phoenix/console/v1/article/list"
Parameters:
pageSize: 20
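To illustrate how the URL and the parameter dictionary fit together, here is a sketch that builds the final request URL with the standard library. Only pageSize comes from the step above; the helper name build_url is an assumption, and any further parameters the endpoint accepts would have to be read from the Payload tab.

```python
from urllib.parse import urlencode

url = "https://bizapi.csdn.net/blog/phoenix/console/v1/article/list"
params = {"pageSize": 20}


def build_url(base: str, query: dict) -> str:
    """Append the URL-encoded query parameters to the base URL."""
    return f"{base}?{urlencode(query)}"


print(build_url(url, params))
# https://bizapi.csdn.net/blog/phoenix/console/v1/article/list?pageSize=20
```

When using requests, the same effect is obtained by passing the dictionary as params=, which is what the script below does.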

Step 5: Put the code together

Adjust the code below as needed (CSDN may update its site from time to time, so the URLs may change).

Edit the file: csdnArticleScore.py

# pip3 install pandas --break-system-packages
import json
import math

import pandas as pd
import requests
from openpyxl import Workbook
from openpyxl.utils.dataframe import dataframe_to_rows

# Fetch article metadata in bulk and save it to Excel
class CSDNArticleExporter:
    def __init__(self, username, cookies, Referer, page, size, filename):
        self.username = username
        self.cookies = cookies
        self.Referer = Referer
        self.size = size
        self.filename = filename
        self.page = page

    def get_articles(self):
        url = "https://blog.csdn.net/community/home-api/v1/get-business-list"
        # Query-string parameters for the article-list endpoint
        params = {
            "page": self.page,
            "size": self.size,
            "businessType": "blog",
            "username": self.username,
        }
        headers = {
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3',
            'Cookie': self.cookies,  # Setting the cookies string directly in headers
            'Referer': self.Referer
        }

        try:
            response = requests.get(url, params=params, headers=headers)
            response.raise_for_status()  # Raises an HTTPError if the response status code is 4XX or 5XX
            data = response.json()
            return data.get('data', {}).get('list', [])
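        except requests.exceptions.HTTPError as e:
            print(f"HTTP error: {e.response.status_code} {e.response.reason}")
        except json.JSONDecodeError:
            # Checked before RequestException because recent requests versions
            # raise a JSON error that subclasses both exception types
            print("Failed to parse JSON")
        except requests.exceptions.RequestException as e:
            print(f"Request failed: {e}")
        return []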
    
    def export_to_excel(self):
        fields = ['title', 'url', 'postTime', 'viewCount', 'collectCount', 'diggCount', 'commentCount']
        # Passing columns= keeps the frame well-formed even when no articles come back
        df = pd.DataFrame(self.get_articles(), columns=fields)
        df.columns = ['Title', 'URL', 'Published', 'Views', 'Favorites', 'Likes', 'Comments']
        # df.to_excel(self.filename) alone would also work; the code below
        # additionally sizes each column to fit its longest value.
        wb = Workbook()
        sheet = wb.active
        # Write the DataFrame to the sheet, header row first
        for r in dataframe_to_rows(df, index=False, header=True):
            sheet.append(r)
        # Set each column's width to its longest cell value plus some padding
        for column in sheet.columns:
            cells = list(column)
            max_length = max((len(str(cell.value)) for cell in cells if cell.value is not None), default=0)
            sheet.column_dimensions[cells[0].column_letter].width = max_length + 5
        # Save the workbook
        wb.save(self.filename)

class ArticleScores:
    def __init__(self, filepath):
        self.filepath = filepath

    @staticmethod
    def get_article_score(article_url):
        url = "https://bizapi.csdn.net/trends/api/v1/get-article-score"
        # TODO: Replace with your actual headers
        headers = {
            "Accept": "application/json, text/plain, */*",
            "X-Ca-Key": "203930474",
            "X-Ca-Nonce": "7e4ece49-5b7d-41e0-b548-30972a3e3989",
            "X-Ca-Signature": "mXV5P9OGdBpKyv7v+OfuSmtbN66OwLg3ujL2kwGk5mw=",
            "X-Ca-Signature-Headers": "x-ca-key,x-ca-nonce",
            "X-Ca-Signed-Content-Type": "multipart/form-data",
        }
        data = {"url": article_url}
        try:
            response = requests.post(url, headers=headers, data=data)
            response.raise_for_status()  # This will raise an error for bad responses
            return response.json().get('data', {}).get('score', 'Score not found')
        except requests.RequestException as e:
            print(f"Request failed: {e}")
            return "Error fetching score"

    def get_scores_from_excel(self):
        df = pd.read_excel(self.filepath)
        urls = df['URL'].tolist()
        scores = [self.get_article_score(url) for url in urls]
        return scores

    def write_scores_to_excel(self):
        df = pd.read_excel(self.filepath)
        df['Quality score'] = self.get_scores_from_excel()
        df.to_excel(self.filepath, index=False)

if __name__ == '__main__':
    total = 10     # total number of published articles
    # TODO: replace with your own cookies, Referer, CSDN id and headers
    cookies = 'UN=jjk_02027; fi_id=default; log_Id_pv=******。。。'  # Simplified for brevity
    Referer = 'https://blog.csdn.net/jjk_02027?type=blog'
    CSDNid = 'jjk_02027'
    t_index = math.ceil(total/100)+1  # pages needed, rounded up; +1 because range() excludes its end
    # Fetch article metadata page by page.
    # CSDNArticleExporter(username, cookies, Referer, page number, page size, output file):
    #   - total should count only the articles that are visible to everyone
    #   - the page size must not exceed 100 articles per request
    #   - the output file must be the same one that ArticleScores reads below
    for index in range(1,t_index):  # one iteration per page
        filename = "score"+str(index)+".xlsx"
        exporter = CSDNArticleExporter(CSDNid, cookies, Referer, index, 100, filename)  # Replace with your username
        exporter.export_to_excel()
        # Fetch the quality scores in bulk
        score = ArticleScores(filename)
        score.write_scores_to_excel()
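The page loop in the script relies on math.ceil(total/100)+1 as the exclusive upper bound of range(). The sketch below isolates that arithmetic so it can be checked on its own; the helper name page_numbers is hypothetical, and the default page size of 100 is the per-request maximum stated in the script's comments.

```python
import math


def page_numbers(total: int, size: int = 100) -> range:
    """Pages 1..ceil(total/size); range() excludes its end, hence the +1."""
    return range(1, math.ceil(total / size) + 1)


print(list(page_numbers(10)))    # [1]
print(list(page_numbers(200)))   # [1, 2]
print(list(page_numbers(250)))   # [1, 2, 3]
```

With 250 articles the script would therefore write three files, the last one holding the remaining 50 articles.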

Step 6: Run the Python crawler

python3 csdnArticleScore.py

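Step 7: Check the quality-score files

After the crawler finishes, the current directory contains one Excel file per page of up to 100 articles, named score1.xlsx, score2.xlsx, and so on. Each file lists the articles together with their quality scores.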
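Once the scores are in the Excel file, an average comparable to the one shown in the CSDN dashboard can be computed locally. The script stores the fallback strings "Score not found" or "Error fetching score" when a lookup fails, so this sketch averages only the numeric entries; the helper name average_score is hypothetical.

```python
from numbers import Real


def average_score(scores):
    """Average the numeric entries; return None if there are none."""
    numeric = [s for s in scores if isinstance(s, Real) and not isinstance(s, bool)]
    if not numeric:
        return None
    return sum(numeric) / len(numeric)


# Sample values only; real scores come from the score column of the file.
print(average_score([80, "Error fetching score", 90]))  # 85.0
print(average_score(["Score not found"]))               # None
```

With pandas, the input list can be obtained by reading the file with pd.read_excel and calling .tolist() on the score column.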

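I have worked with Java for more than ten years and am new to Python, and I was genuinely surprised by how capable it is. In performance and convenience it holds its own against Java. I wrote the same crawler in Java as well and the performance was about the same, but for this task Python felt nicer to use.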

Appendix 1: Python website and tutorial

Python website: https://www.python.org/

Python 3 tutorial: Python3 教程 | 菜鸟教程 (Runoob)

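Appendix 2: Common problems when crawling scores with Python

1. On macOS, installing the requests library for python3 fails with error: externally-managed-environment

One option is to install requests through Homebrew instead of calling pip directly. (You can skip this one: install Python libraries with pip3 and use brew only for non-Python packages.)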

brew install python-requests

2. On macOS, installing pipx with pip3 fails with error: externally-managed-environment

pip3 install pipx --break-system-packages

3. On macOS, installing requests with pip3 fails with error: externally-managed-environment

pip3 install requests --break-system-packages
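The externally-managed-environment error comes from PEP 668: an installation is marked as managed by placing a file named EXTERNALLY-MANAGED in its standard-library directory, and pip refuses to install into it unless it runs inside a virtual environment or is given --break-system-packages. The sketch below checks for that marker under this understanding of the PEP; the function names are made up for the example.

```python
import sys
import sysconfig
from pathlib import Path


def in_virtualenv() -> bool:
    """True when the interpreter runs inside a venv-style environment."""
    return sys.prefix != sys.base_prefix


def is_externally_managed() -> bool:
    """True when the PEP 668 marker file sits next to the standard library."""
    marker = Path(sysconfig.get_path("stdlib")) / "EXTERNALLY-MANAGED"
    return marker.is_file()


if is_externally_managed() and not in_virtualenv():
    print("pip will refuse to install here without --break-system-packages")
else:
    print("pip should install normally")
```

If the first message is printed, an alternative to the flag is to create a virtual environment with python3 -m venv and install the libraries there.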

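Appendix 3: Tips for raising your CSDN blog quality score

Raising the quality score of a CSDN blog (and with it the blog's ranking and exposure) comes down to a few key strategies for improving your content so that it attracts more readers and more attention from search engines. Some practical suggestions follow.

1. Content quality

Originality: make sure your articles are your own work and avoid plagiarism.
Depth and breadth: offer information of real value, going beyond the surface to explore the topic in depth.
Accuracy: make sure everything you state is correct so that readers are not misled.

2. Article structure

Clear titles: use an engaging title that also contains your keywords.
Sensible sectioning: use headings (H2, H3, and so on) to give the article a clear structure.
Lists and subheadings: use lists and subheadings to improve readability.

3. Keyword optimization

Keyword research: use tools such as Google Keyword Planner or SEMrush to find relevant keywords.
Keyword density: spread keywords sensibly through the article and avoid stuffing.
Meta tag optimization: tune the article's meta description and keyword tags.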

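4. Multimedia content

Images and video: use images and video where they help, to make the content more engaging.
ALT text: give images descriptive ALT text, which helps SEO.

5. Links

High-quality external links: link to valuable external sources to add credibility and depth.
Internal links: link to your other related posts to raise page views and SEO value.

6. Social media sharing

Easy sharing: add share buttons to your articles to encourage readers to pass them on.
Social media engagement: promote your articles on social media to increase exposure.

7. Regular updates and maintenance

Regular updates: keep the blog active by publishing new content regularly.
Comment management: reply to comments promptly and interact with readers to build a healthy community.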

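8. SEO plugins and tools

SEO plugins: for example Yoast SEO (for WordPress users), which helps you optimize content.
Analytics tools: use Google Analytics and Google Search Console to monitor how your blog performs and adjust based on the data.

9. User experience

Fast loading: optimize the size of images and other media files so that pages load quickly.
Mobile friendliness: make sure your blog also displays well on mobile devices.

Applying these strategies should help raise your CSDN blog's quality score and, with it, your blog's traffic and reach. Sustained effort and steady improvement are what matter most.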

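Summary

This article covered:

1. How to look up an article's quality score

2. How to get an article's link

3. How to use a crawler to fetch the quality scores of all your articles in one go

4. Tips for raising your CSDN blog quality score

If you found this useful, please like, follow, and bookmark!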