Sentiment Analysis with Multiple Machine Learning Models

news2026/2/13 13:29:51

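Combining TF-IDF features with a Bayesian classifier is a common and effective approach to sentiment analysis, and to text classification in general. Machine learning models such as the Bayesian classifier (usually the naive Bayes classifier) are computationally simple and efficient, and they perform well on text classification tasks. Next, I will walk through the implementation steps for sentiment analysis that combines TF-IDF with naive Bayes and other machine learning models.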

I. Sentiment Analysis with Multiple Machine Learning Models

1. Data Preparation and Loading

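We first prepare the training dataset. Here we assume the same sample data as before, except that a label has been added to each sentence so the data can be used for training: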

import pandas as pd

data = {
    'Text': [
        "I am very happy with the service",
        "This is terrible, I hate it",
        "What a wonderful experience!",
        "I am so disappointed",
        "Absolutely fantastic! Highly recommend it",
        "Worst experience ever, very sad",
        "I love this product, it’s amazing",
        "This is the best thing I have ever bought",
        "I regret buying this item, very dissatisfied",
        "The quality is poor, I’m upset",
        "Excellent service and very satisfied",
        "Not worth the money, very bad experience",
        "I’m thrilled with the results, highly recommended",
        "This is not what I expected, I feel cheated",
        "Wonderful product, exceeded my expectations",
        "I am frustrated and unhappy with this purchase",
        "Very pleased with the performance, good value",
        "The experience was awful, never buying again",
        "Great quality and excellent service",
        "This is disappointing, I feel let down"
    ],
    'Label': [
        'positive', 'negative', 'positive', 'negative', 'positive', 'negative',
        'positive', 'positive', 'negative', 'negative', 'positive', 'negative',
        'positive', 'negative', 'positive', 'negative', 'positive', 'negative',
        'positive', 'negative'
    ]
}

df = pd.DataFrame(data)
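
Before extracting features, it can help to sanity-check the loaded data. The sketch below is one possible way to do that; the helper name `summarize` and the four-row `sample` frame are hypothetical and used only for illustration. They are not part of the original walkthrough.

```python
import pandas as pd

# Hypothetical four-row sample in the same Text/Label layout as the DataFrame above
sample = pd.DataFrame({
    'Text': [
        "I am very happy with the service",
        "This is terrible, I hate it",
        "What a wonderful experience!",
        "I am so disappointed",
    ],
    'Label': ['positive', 'negative', 'positive', 'negative'],
})


def summarize(frame):
    """Return basic sanity-check statistics for a Text/Label DataFrame."""
    return {
        'n_rows': len(frame),
        'n_missing': int(frame.isna().sum().sum()),
        'n_duplicate_texts': int(frame['Text'].duplicated().sum()),
        'label_counts': frame['Label'].value_counts().to_dict(),
    }


summary = summarize(sample)
print(summary)
```

Calling `summarize(df)` on the full dataset works the same way. A strongly unbalanced `label_counts` would be a reason to stratify the train/test split.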

2. Text Preprocessing and Feature Extraction (TF-IDF)

Use TfidfVectorizer to extract text features, and split the dataset into a training set and a test set:

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split

# Convert the text into TF-IDF features
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(df['Text'])

# Labels
y = df['Label']

# Split into training and test sets (80% / 20%)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
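
To make the TF-IDF step less of a black box, the following sketch recomputes the IDF weights by hand and compares them with what TfidfVectorizer produces. It assumes scikit-learn's default settings (`smooth_idf=True`, `norm='l2'`), under which the weight of a term is ln((1 + n) / (1 + df)) + 1, where n is the number of documents and df is the number of documents containing the term. The four-sentence `docs` list is a small illustrative subset of the sample data.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

# Small illustrative subset of the sample sentences
docs = [
    "I am very happy with the service",
    "This is terrible, I hate it",
    "What a wonderful experience!",
    "I am so disappointed",
]

tfidf = TfidfVectorizer()
X_demo = tfidf.fit_transform(docs)

# Raw term counts with the same default tokenizer, so the columns line up
counts = CountVectorizer().fit_transform(docs).toarray()
n_docs = counts.shape[0]

# Document frequency: in how many documents each term appears
doc_freq = (counts > 0).sum(axis=0)

# Smoothed IDF as computed by scikit-learn's default settings
manual_idf = np.log((1 + n_docs) / (1 + doc_freq)) + 1
idf_matches = np.allclose(manual_idf, tfidf.idf_)

# With the default L2 normalization, every non-empty row has unit length
row_norms = np.sqrt(np.asarray(X_demo.multiply(X_demo).sum(axis=1)).ravel())

print("Matrix shape (documents, terms):", X_demo.shape)
print("Manual IDF matches vectorizer.idf_:", idf_matches)
print("Row norms:", row_norms)
```

Terms that occur in many documents (for example "am", which appears in two of the four sentences) receive a lower IDF weight than terms that occur in only one.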

3. Building Multiple Classifiers

We will use the following common classifiers:

Naive Bayes
Logistic Regression
Support Vector Machine (SVM)
Random Forest

from sklearn.naive_bayes import MultinomialNB
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, accuracy_score

# Initialize the models (random_state makes the random forest reproducible)
models = {
    "Naive Bayes": MultinomialNB(),
    "Logistic Regression": LogisticRegression(),
    "SVM": SVC(),
    "Random Forest": RandomForestClassifier(random_state=42)
}

# Train and evaluate each model
for model_name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    print(f"Model: {model_name}")
    print("Accuracy:", accuracy_score(y_test, y_pred))
    # zero_division=0 avoids warnings when a class is never predicted on the tiny test set
    print("Classification Report:\n", classification_report(y_test, y_pred, zero_division=0))
    print("="*50)
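
With 20 sentences and test_size=0.2, the test set holds only four examples, so a single accuracy figure is very noisy. In addition, fitting the vectorizer on all the text before splitting lets the test sentences influence the IDF weights. One possible way to address both points, sketched here as an assumption rather than as part of the original walkthrough, is to wrap the vectorizer and classifier in a scikit-learn Pipeline and score it with stratified cross-validation. The eight-sentence subset, the fold count, and the two `new_texts` sentences are illustrative choices.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Illustrative subset of the sample data (four positive, four negative)
texts = [
    "I am very happy with the service",
    "This is terrible, I hate it",
    "What a wonderful experience!",
    "I am so disappointed",
    "Absolutely fantastic! Highly recommend it",
    "Worst experience ever, very sad",
    "Excellent service and very satisfied",
    "Not worth the money, very bad experience",
]
labels = ['positive', 'negative'] * 4

# Each fold refits TF-IDF on its own training part, so held-out text never affects the weights
cv = StratifiedKFold(n_splits=4, shuffle=True, random_state=42)

candidates = {
    "Naive Bayes": MultinomialNB(),
    "Logistic Regression": LogisticRegression(),
}

cv_scores = {}
for name, clf in candidates.items():
    pipe = make_pipeline(TfidfVectorizer(), clf)
    scores = cross_val_score(pipe, texts, labels, cv=cv, scoring="accuracy")
    cv_scores[name] = scores
    print(f"{name}: mean accuracy {scores.mean():.2f} over {len(scores)} folds")

# Fit one pipeline on all the data and classify unseen sentences
final_model = make_pipeline(TfidfVectorizer(), MultinomialNB())
final_model.fit(texts, labels)

new_texts = ["The service was wonderful", "Very bad, I am disappointed"]
predictions = final_model.predict(new_texts)
for text, label in zip(new_texts, predictions):
    print(f"{text!r} -> {label}")
```

Because the pipeline takes raw strings, the same `final_model.predict` call can be used on new text without a separate vectorizer.transform step. On a dataset this small the scores remain rough indicators only.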