pycare检测时间序列异常

news2025/1/31 3:13:07

1. 获取时间序列数据（官方文档是多特征，本文举例是单特征）

未安装pycaret的先安装：pip install --pre pycaret （也可以pip install pycaret）

2.用pycaret检测异常一般用pycaret.anomaly

pycaret.anomaly的所有功能：
assign_model,
create_model,
deploy_model,
evaluate_model,
get_config,
get_logs,
load_config,
load_model,
models,
plot_model,
predict_model,
pull,
save_config,
save_model,
set_config,
set_current_experiment,
setup,
tune_model,

from pycaret.anomaly import *
setup = setup(df, session_id = 123)

3.获取pycaret支持的所有检测异常模型

models（）

def create_model(
    model: Union[str, Any],
    fraction: float = 0.05,
    verbose: bool = True,
    fit_kwargs: Optional[dict] = None,
    experiment_custom_tags: Optional[Dict[str, Any]] = None,
    **kwargs,
):

"""
This function trains a given model from the model library. All available
models can be accessed using the ``models`` function.

model: str or scikit-learn compatible object
    ID of an model available in the model library or pass an untrained
    model object consistent with scikit-learn API. Estimators available
    in the model library (ID - Name):

    * 'abod' - Angle-base Outlier Detection
    * 'cluster' - Clustering-Based Local Outlier
    * 'cof' - Connectivity-Based Outlier Factor
    * 'histogram' - Histogram-based Outlier Detection
    * 'knn' - k-Nearest Neighbors Detector
    * 'lof' - Local Outlier Factor
    * 'svm' - One-class SVM detector
    * 'pca' - Principal Component Analysis
    * 'mcd' - Minimum Covariance Determinant
    * 'sod' - Subspace Outlier Detection
    * 'sos' - Stochastic Outlier Selection

fraction: float, default = 0.05
    The amount of contamination of the data set, i.e. the proportion of
    outliers in the data set. Used when fitting to define the threshold on
    the decision function.


verbose: bool, default = True
    Status update is not printed when verbose is set to False.


experiment_custom_tags: dict, default = None
    Dictionary of tag_name: String -> value: (String, but will be string-ified
    if not) passed to the mlflow.set_tags to add new custom tags for the experiment.


fit_kwargs: dict, default = {} (empty dict)
    Dictionary of arguments passed to the fit method of the model.


**kwargs:
    Additional keyword arguments to pass to the estimator.


Returns:
    Trained Model

"""

4. 使用其中一个算法测试（使用的算法参数都是默认的，调参的话效果会更好）

import plotly.express as px
def get_line(df_p):
#     print(df_test)
    fig = px.line(df_p, x="date", y="y", color='mark', title="Sorted Input")
    fig.update_xaxes(tickformat="%Y-%m-%d", hoverformat="%Y-%m-%d", row=1)
    fig.update_traces(dict(mode="markers+lines"))
    fig.update_layout(title_text="趋势图", title_x=0.5)
    fig.update(layout=dict(xaxis=dict(title="日期", tickangle=-45,
                                      showline=True, nticks=20),
                           yaxis=dict(title="值", showline=True)))
    return fig
def mark(df):
    df['date'] = df.index
    df = df.reset_index(drop=True)
    df_start_time = str(df.date.min()).split(' ')[0]
    df_end_time = str(df.date.max()).split(' ')[0]
    df_end_time_next = (datetime.datetime.strptime(df_end_time, '%Y-%m-%d') + datetime.timedelta(days=1)).strftime('%Y-%m-%d')
    date_range = pd.date_range(df_start_time, df_end_time_next, freq='D')
    df_raw = pd.DataFrame()
    df_outliers = pd.DataFrame()
    df_raw["date"] = date_range
    df_outliers["date"] = date_range
    df_outliers_is = df[df['Anomaly']==1]
    df_outliers = pd.merge(df_outliers, df_outliers_is[['date','y']], on=["date"], how="left")
    df_outliers.columns = ['date','y']
    df_outliers['mark'] = 'outliers'
    df_outliers_no = df[df['Anomaly']==0]
    df_raw = pd.merge(df_raw, df_outliers_no[['date','y']], on=["date"], how="left")
    df_raw.columns = ['date','y']
    df_raw['mark'] = 'true'
    df_all = pd.concat([df_raw, df_outliers])
    return df_all

iforest = create_model('iforest') # 默认异常值比例为0.05
iforest_results = assign_model(iforest)
iforest_results = mark(iforest_results)
iforest_results
get_line(iforest_results)

下一个算法：

knn = create_model('knn')
knn_results = assign_model(knn)
knn_results = mark(knn_results)
get_line(knn_results)

下一个算法：

lof = create_model('lof')
lof_results = assign_model(lof)
lof_results = mark(lof_results)
get_line(lof_results)

下一个算法：

svm = create_model('svm')
svm_results = assign_model(svm)
svm_results = mark(svm_results)
get_line(svm_results)

下一个算法：

pca = create_model('pca')
pca_results = assign_model(pca)
pca_results = mark(pca_results)
get_line(pca_results)

下一个算法：

sos = create_model('sos')
sos_results = assign_model(sos)
sos_results = mark(sos_results)
get_line(sos_results)

下一个算法：

sod = create_model('sod')
sod_results = assign_model(sod)
sod_results = mark(sod_results)
get_line(sod_results)

下一个算法：

mcd = create_model('mcd')
mcd_results = assign_model(mcd)
mcd_results = mark(mcd_results)
get_line(mcd_results)

下一个算法：

histogram = create_model('histogram')
histogram_results = assign_model(histogram)
histogram_results = mark(histogram_results)
get_line(histogram_results)

下一个算法：

abod = create_model('abod')
abod_results = assign_model(abod)
abod_results = mark(abod_results)
get_line(abod_results)

下一个算法：

cluster = create_model('cluster')
cluster_results = assign_model(cluster)
cluster_results = mark(cluster_results)
get_line(cluster_results)

下一个算法：

cof = create_model('cof')
cof_results = assign_model(cof)
cof_results = mark(cof_results)
get_line(cof_results)

5.PyCaret 中为算法使用evaluate_model() 函数，查看流程

evaluate_model(iforest)

6.PyCaret 中为算法使用plot_model() 函数，它将为异常值创建一个 3D 图，在其中我们可以看到为什么某些特征被视为异常。

plot_model(iforest)

plot_model(knn)

我们可以放大这个二维图来查看哪些点被认为是异常值。

可以再次为配对图创建另一个视觉效果，现在使用异常来查看哪些点将被视为异常。

import seaborn as sns
iforest = create_model('iforest')
iforest_results = assign_model(iforest)
sns.pairplot(iforest_results, hue = "Anomaly")

最后，我们可以保存模型。可以保存任何合适的模型。这里我们保存了 iforest 模型。

save_model(iforest,'IForest_Model')

先这样，有时间再完善吧

本文来自互联网用户投稿，该文观点仅代表作者本人，不代表本站立场。本站仅提供信息存储空间服务，不拥有所有权，不承担相关法律责任。如若转载，请注明出处：http://www.coloradmin.cn/o/46308.html

如若内容造成侵权/违法违规/事实不符，请联系多彩编程网进行投诉反馈，一经查实，立即删除！

pycare检测时间序列异常

相关文章

七、【React-Router5】嵌套路由

SparkMlib 之决策树及其案例

信号类型（雷达）——连续波雷达（二）

【AI学习笔记】TensorFlow GPU版本的安装（超详细）

【计算机】可信平台模块Trusted Platform Module - TPM

速锐得解码理想汽车L8方向盘转向角度应用随动大灯照明升级

PCL 点云的法向量

【即将开源】⽤于3D激光雷达SLAM闭环检测的词袋模型BoW3D

C++基础语法

A*算法-Python实现

小程序数据请求的方式和注意事项

Mysql 索引基数与选择性

微信小程序中基础入门

Redis——》数据类型：List（列表）

Linux02——操作系统接口

PyCharm中鼠标悬停在函数上时显示函数和帮助

Redis高级篇——持久化

蓝海创意云接受【看苏州】独家专访：助力苏州数字文化行业全方位发展

# Vue 中 JSON 编辑器使用

【CSS3】重点知识内容，快速上手