Analyzing operator sequences in a Profiler Timeline: finding the top-K fusable operator sequences via frequent itemset mining

2026/2/14 3:34:44

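1. Related links
2. Code [analyzes only patterns that contain communication operators]
3. In a real project, ['all_gather', 'matrix_mm_out'] turned out to be the most frequent pattern
4. [Ascend MC2](https://gitee.com/ascend/MindSpeed/blob/master/docs/features/mc2.md)
5. torch_npu.npu_all_gather_base_mm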

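This article analyzes the operator sequence recorded in a Profiler Timeline and uses a frequent-itemset-style search (counting fixed-length contiguous windows of operators) to obtain the top-K operator sequences that are candidates for fusion.

1. Related links

Ascend C 2.0 new features explained: supporting efficient development of fused operators for large models

2. Code [analyzes only patterns that contain communication operators]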

from collections import defaultdict

def rolling_hash(s, base=257, mod=10**9 + 7):
    h = 0
    for ch in s:
        h = (h * base + ord(ch)) % mod
    return h

def find_top_n_fixed_length_sequences(arr, length, top_n):
    # Map each window's hash to its occurrence count and its start positions
    sequence_data = defaultdict(lambda: {"count": 0, "positions": []})
    base, mod = 257, 10**9 + 7

    # Slide a fixed-length window over the operator list
    for i in range(len(arr) - length + 1):
        window = arr[i:i + length]
        if "all_gather" in window or "reduce_scatter" in window:  # only handle patterns that contain a communication operator
            flat_window = ''.join(window)
            h = rolling_hash(flat_window, base, mod)
            sequence_data[h]['count'] += 1
            sequence_data[h]['positions'].append(i)

    # Sort by frequency and keep the top N subsequences
    sorted_sequences = sorted(sequence_data.items(), key=lambda item: item[1]['count'], reverse=True)
    top_sequences = sorted_sequences[:top_n]

    return top_sequences, sequence_data
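
# Load the timeline generated by the profiler and extract the operator names and their offsets; a simple hand-built example is used here
operators=["mm","all_gather","binary_add","dropout_backward","fill","eltwise_silu","mm","all_gather","fill"]
offsets=range(0,len(operators))

# Use subsequences of at least two elements; here, take the single most frequent subsequence of length 2
length = 2
top_n = 1

# Get the top-N most frequent subsequences of the fixed length
top_sequences, sequence_data = find_top_n_fixed_length_sequences(operators, length, top_n)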

# Reverse lookup: map each hash back to the actual operator sequence
reverse_lookup = {}
for i in range(len(operators) - length + 1):
    window = operators[i:i + length]
    flat_window = ''.join(window)
    h = rolling_hash(flat_window)
    if h not in reverse_lookup:
        reverse_lookup[h] = window

# Print the results, skipping duplicates
unique_sequences = set()  # tracks the sequences that have already been printed
for seq_hash, data in top_sequences:
    seq = reverse_lookup[seq_hash]
    seq_tuple = tuple(seq)
    if seq_tuple not in unique_sequences:
        unique_sequences.add(seq_tuple)
        positions = sequence_data[seq_hash]['positions']
        print(f'Sequence: {seq}, frequency: {data["count"]}')
        for pos in positions:
            beg=pos
            end=pos+length
            ts_beg=offsets[beg]
            # offsets[end] would raise IndexError for a window at the very end of the list,
            # so derive the exclusive end from the window's last element instead
            ts_end=offsets[end-1]+1
            print(ts_beg,ts_end,operators[ts_beg:ts_end])

Demo output

Sequence: ['mm', 'all_gather'], frequency: 2
0 2 ['mm', 'all_gather']
6 8 ['mm', 'all_gather']
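
The listing above keys each window by a polynomial hash of the concatenated operator names, which in principle can collide (and `''.join` cannot tell `["a", "bc"]` from `["ab", "c"]`). As a hedged alternative, the same counting can be sketched with the window tuple itself as the key and with a range of window lengths scanned in one pass. The names `COMM_OPS` and `top_k_patterns` are introduced here for illustration only and are not part of the original code.

```python
from collections import Counter

# Assumed set of communication operator names, taken from the filter in the listing
COMM_OPS = {"all_gather", "reduce_scatter"}

def top_k_patterns(ops, min_len=2, max_len=4, top_k=3):
    """Count contiguous windows of min_len..max_len operators that contain
    at least one communication operator, keyed by the tuple of names."""
    counts = Counter()
    for length in range(min_len, max_len + 1):
        for i in range(len(ops) - length + 1):
            window = tuple(ops[i:i + length])
            if COMM_OPS.intersection(window):
                counts[window] += 1
    # most_common returns (pattern, count) pairs, highest count first
    return counts.most_common(top_k)

operators = ["mm", "all_gather", "binary_add", "dropout_backward", "fill",
             "eltwise_silu", "mm", "all_gather", "fill"]
result = top_k_patterns(operators, min_len=2, max_len=2, top_k=1)
print(result)
```

With the same demo data and `min_len=max_len=2`, this reports `('mm', 'all_gather')` with a count of 2, matching the demo output. Tuple keys also make the separate reverse-lookup step unnecessary.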

3. In a real project, ['all_gather', 'matrix_mm_out'] turned out to be the most frequent pattern
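
To run the analysis on a real trace rather than a hand-built list, the operator names first have to be extracted from the profiler output. The article does not show this step, so the following is only a minimal sketch. It assumes the timeline is in the Chrome trace event format (either a bare list of events or an object with a `traceEvents` list) and that operator executions are complete events with `"ph": "X"`, a `name`, and a `ts`. The helper name `load_operator_sequence` is hypothetical, and a real trace would likely need extra filtering by process, thread, or stream.

```python
import json

def load_operator_sequence(trace_text):
    """Return (names, timestamps) of complete events, ordered by start time."""
    data = json.loads(trace_text)
    # Accept both a bare event list and the {"traceEvents": [...]} wrapper
    events = data["traceEvents"] if isinstance(data, dict) else data
    complete = [e for e in events
                if e.get("ph") == "X" and "name" in e and "ts" in e]
    complete.sort(key=lambda e: e["ts"])
    names = [e["name"] for e in complete]
    timestamps = [e["ts"] for e in complete]
    return names, timestamps

# Tiny synthetic trace, for illustration only
sample = json.dumps({"traceEvents": [
    {"ph": "X", "name": "all_gather", "ts": 20, "dur": 5},
    {"ph": "X", "name": "mm", "ts": 10, "dur": 8},
    {"ph": "M", "name": "process_name"},
    {"ph": "X", "name": "fill", "ts": 30, "dur": 1},
]})
names, timestamps = load_operator_sequence(sample)
print(names, timestamps)
```

The returned `names` list can be passed as `operators` to the counting code, and `timestamps` can stand in for `offsets` when mapping a pattern back to its location on the timeline.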

4. [Ascend MC2](https://gitee.com/ascend/MindSpeed/blob/master/docs/features/mc2.md)

5. torch_npu.npu_all_gather_base_mm
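
Once a frequent pattern such as `['all_gather', 'matrix_mm_out']` has been found, one quick way to estimate the effect of a fused operator is to rewrite the operator list and count how many launches disappear. The sketch below does not call `torch_npu` at all. It works only on operator names, and `fuse_pattern` and the label `all_gather_mm_fused` are hypothetical names chosen for illustration.

```python
def fuse_pattern(ops, pattern, fused_name):
    """Replace each non-overlapping occurrence of `pattern` in `ops` with a
    single `fused_name` entry; return the new list and the number of hits."""
    ops, pattern = list(ops), list(pattern)
    fused, hits, i = [], 0, 0
    while i < len(ops):
        if ops[i:i + len(pattern)] == pattern:
            fused.append(fused_name)
            hits += 1
            i += len(pattern)  # skip past the matched window
        else:
            fused.append(ops[i])
            i += 1
    return fused, hits

ops = ["all_gather", "matrix_mm_out", "fill", "all_gather", "matrix_mm_out"]
fused_ops, hits = fuse_pattern(ops, ["all_gather", "matrix_mm_out"],
                               "all_gather_mm_fused")
print(fused_ops, hits)
print("operators before:", len(ops), "after:", len(fused_ops))
```

Here five operators shrink to three, with two occurrences fused. Whether the fused kernel is actually faster still has to be measured on the device.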
