[Deep Learning Frameworks: Paddle] Installing PaddlePaddle Smoothly and Using Multiple GPUs Seamlessly

news2024/9/29 17:38:53

Contents

  • My Love-Hate History with Paddle
  • PaddleCloud
  • Multiple GPUs

My Love-Hate History with Paddle

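Paddle is a deep learning framework developed in China by Baidu. PaddlePaddle underpins a series of domain-specific open-source toolkits such as PaddleOCR and PaddleNLP, and has contributed a great deal to putting deep learning into practice in China.
However, installing PaddlePaddle has always troubled me: C++ errors one time, multi-GPU not working another time, and a different set of errors on every Linux environment. All of these restrictions gradually pushed me away from it. How can installing Paddle be as smooth as installing torch, working out of the box instead of getting mired in errors? After a lot of trial and error, I gradually found a way.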

PaddleCloud

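First, the link: https://hub.docker.com/r/paddlecloud/paddlenlp
One day, while looking something up in the PaddleNLP documentation, I saw that PaddleCloud had open-sourced Paddle-based images that work out of the box.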

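PaddleCloud is mainly used to host the standard images of PaddleNLP, a PaddlePaddle model suite, making it easier for users of the suite to deploy with Docker or in the cloud.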

So I tried it right away and pulled the image onto a Linux server:

docker pull paddlecloud/paddlenlp:develop-gpu-cuda11.2-cudnn8-latest

Next, create a container:

docker run -itd --name container_name -v /path:/path paddlecloud/paddlenlp:develop-gpu-cuda11.2-cudnn8-latest /bin/bash

Enter the container:

docker exec -it container_name /bin/bash

Check that the PaddlePaddle framework works:

python
>>> import paddle
>>> paddle.utils.run_check()
Running verify PaddlePaddle program ... 
W0130 06:01:35.244894    23 gpu_context.cc:278] Please NOTE: device: 0, GPU Compute Capability: 8.6, Driver API Version: 11.7, Runtime API Version: 11.2
W0130 06:01:35.276093    23 gpu_context.cc:306] device: 0, cuDNN Version: 8.1.
PaddlePaddle works well on 1 GPU.
W0130 06:01:44.027418    23 parallel_executor.cc:642] Cannot enable P2P access from 0 to 1
W0130 06:01:44.027439    23 parallel_executor.cc:642] Cannot enable P2P access from 0 to 2
W0130 06:01:44.027443    23 parallel_executor.cc:642] Cannot enable P2P access from 0 to 3
W0130 06:01:44.027446    23 parallel_executor.cc:642] Cannot enable P2P access from 0 to 4
W0130 06:01:44.027449    23 parallel_executor.cc:642] Cannot enable P2P access from 0 to 5
W0130 06:01:44.027452    23 parallel_executor.cc:642] Cannot enable P2P access from 0 to 6
W0130 06:01:44.027456    23 parallel_executor.cc:642] Cannot enable P2P access from 0 to 7
W0130 06:01:44.027458    23 parallel_executor.cc:642] Cannot enable P2P access from 1 to 0
W0130 06:01:44.027462    23 parallel_executor.cc:642] Cannot enable P2P access from 1 to 2
W0130 06:01:44.027464    23 parallel_executor.cc:642] Cannot enable P2P access from 1 to 3
W0130 06:01:44.027467    23 parallel_executor.cc:642] Cannot enable P2P access from 1 to 4
W0130 06:01:44.027469    23 parallel_executor.cc:642] Cannot enable P2P access from 1 to 5
W0130 06:01:44.027472    23 parallel_executor.cc:642] Cannot enable P2P access from 1 to 6
W0130 06:01:44.027477    23 parallel_executor.cc:642] Cannot enable P2P access from 1 to 7
W0130 06:01:44.027480    23 parallel_executor.cc:642] Cannot enable P2P access from 2 to 0
W0130 06:01:44.027523    23 parallel_executor.cc:642] Cannot enable P2P access from 2 to 1
W0130 06:01:44.027529    23 parallel_executor.cc:642] Cannot enable P2P access from 2 to 3
W0130 06:01:44.027530    23 parallel_executor.cc:642] Cannot enable P2P access from 2 to 4
W0130 06:01:44.027534    23 parallel_executor.cc:642] Cannot enable P2P access from 2 to 5
W0130 06:01:44.027536    23 parallel_executor.cc:642] Cannot enable P2P access from 2 to 6
W0130 06:01:44.027541    23 parallel_executor.cc:642] Cannot enable P2P access from 2 to 7
W0130 06:01:44.027544    23 parallel_executor.cc:642] Cannot enable P2P access from 3 to 0
W0130 06:01:44.027549    23 parallel_executor.cc:642] Cannot enable P2P access from 3 to 1
W0130 06:01:44.027554    23 parallel_executor.cc:642] Cannot enable P2P access from 3 to 2
W0130 06:01:44.027556    23 parallel_executor.cc:642] Cannot enable P2P access from 3 to 4
W0130 06:01:44.027559    23 parallel_executor.cc:642] Cannot enable P2P access from 3 to 5
W0130 06:01:44.027611    23 parallel_executor.cc:642] Cannot enable P2P access from 3 to 6
W0130 06:01:44.027614    23 parallel_executor.cc:642] Cannot enable P2P access from 3 to 7
W0130 06:01:44.027617    23 parallel_executor.cc:642] Cannot enable P2P access from 4 to 0
W0130 06:01:44.027621    23 parallel_executor.cc:642] Cannot enable P2P access from 4 to 1
W0130 06:01:44.027624    23 parallel_executor.cc:642] Cannot enable P2P access from 4 to 2
W0130 06:01:44.027627    23 parallel_executor.cc:642] Cannot enable P2P access from 4 to 3
W0130 06:01:44.027629    23 parallel_executor.cc:642] Cannot enable P2P access from 4 to 5
W0130 06:01:44.027632    23 parallel_executor.cc:642] Cannot enable P2P access from 4 to 6
W0130 06:01:44.027635    23 parallel_executor.cc:642] Cannot enable P2P access from 4 to 7
W0130 06:01:44.027638    23 parallel_executor.cc:642] Cannot enable P2P access from 5 to 0
W0130 06:01:44.027640    23 parallel_executor.cc:642] Cannot enable P2P access from 5 to 1
W0130 06:01:44.027643    23 parallel_executor.cc:642] Cannot enable P2P access from 5 to 2
W0130 06:01:44.027647    23 parallel_executor.cc:642] Cannot enable P2P access from 5 to 3
W0130 06:01:44.027649    23 parallel_executor.cc:642] Cannot enable P2P access from 5 to 4
W0130 06:01:44.027652    23 parallel_executor.cc:642] Cannot enable P2P access from 5 to 6
W0130 06:01:44.027655    23 parallel_executor.cc:642] Cannot enable P2P access from 5 to 7
W0130 06:01:44.027696    23 parallel_executor.cc:642] Cannot enable P2P access from 6 to 0
W0130 06:01:44.027699    23 parallel_executor.cc:642] Cannot enable P2P access from 6 to 1
W0130 06:01:44.027704    23 parallel_executor.cc:642] Cannot enable P2P access from 6 to 2
W0130 06:01:44.027707    23 parallel_executor.cc:642] Cannot enable P2P access from 6 to 3
W0130 06:01:44.027712    23 parallel_executor.cc:642] Cannot enable P2P access from 6 to 4
W0130 06:01:44.027717    23 parallel_executor.cc:642] Cannot enable P2P access from 6 to 5
W0130 06:01:44.027720    23 parallel_executor.cc:642] Cannot enable P2P access from 6 to 7
W0130 06:01:44.027724    23 parallel_executor.cc:642] Cannot enable P2P access from 7 to 0
W0130 06:01:44.027727    23 parallel_executor.cc:642] Cannot enable P2P access from 7 to 1
W0130 06:01:44.027730    23 parallel_executor.cc:642] Cannot enable P2P access from 7 to 2
W0130 06:01:44.027736    23 parallel_executor.cc:642] Cannot enable P2P access from 7 to 3
W0130 06:01:44.027740    23 parallel_executor.cc:642] Cannot enable P2P access from 7 to 4
W0130 06:01:44.027752    23 parallel_executor.cc:642] Cannot enable P2P access from 7 to 5
W0130 06:01:44.027757    23 parallel_executor.cc:642] Cannot enable P2P access from 7 to 6
WARNING:root:PaddlePaddle meets some problem with 8 GPUs. This may be caused by:
 1. There is not enough GPUs visible on your system
 2. Some GPUs are occupied by other process now
 3. NVIDIA-NCCL2 is not installed correctly on your system. Please follow instruction on https://github.com/NVIDIA/nccl-tests 
 to test your NCCL, or reinstall it following https://docs.nvidia.com/deeplearning/sdk/nccl-install-guide/index.html
WARNING:root:
 Original Error is: (External) NCCL error(2), unhandled system error. 
  [Hint: 'ncclSystemError'. A call to the system failed.] (at /paddle/paddle/fluid/platform/device/gpu/nccl_helper.h:155)

PaddlePaddle is installed successfully ONLY for single GPU! Let's start deep learning with PaddlePaddle now.

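This output means the installation succeeded, but only a single GPU is usable. Multi-GPU does not work yet, so I made do with it for the time being.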
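The output above is long, and the lines that matter are the few summary lines at the end. As a rough sketch, assuming you have saved the printed output to a string yourself, the check can be reduced to a small parser. The function name `summarize_run_check` and the two sample strings are hypothetical; the samples are shortened excerpts of the logs in this post, not complete output.

```python
import re


def summarize_run_check(log_text):
    """Summarize the closing lines of paddle.utils.run_check() output (illustrative helper)."""
    # Lines such as "PaddlePaddle works well on 8 GPUs." report each stage that passed.
    counts = [int(n) for n in re.findall(r"works well on (\d+) GPUs?\.", log_text)]
    return {
        "max_gpus_ok": max(counts, default=0),
        "single_gpu_only": "ONLY for single GPU" in log_text,
        "nccl_error": "NCCL error" in log_text,
    }


# Shortened excerpt of the failing run shown in this post.
FAILED = """PaddlePaddle works well on 1 GPU.
WARNING:root:PaddlePaddle meets some problem with 8 GPUs. This may be caused by:
 Original Error is: (External) NCCL error(2), unhandled system error.
PaddlePaddle is installed successfully ONLY for single GPU! Let's start deep learning with PaddlePaddle now.
"""

# Shortened excerpt of a run in which the multi-GPU stage passes.
PASSED = """PaddlePaddle works well on 1 GPU.
PaddlePaddle works well on 8 GPUs.
PaddlePaddle is installed successfully! Let's start deep learning with PaddlePaddle now.
"""

print(summarize_run_check(FAILED))
print(summarize_run_check(PASSED))
```

The many "Cannot enable P2P access" warnings are deliberately ignored here: they appear in both the failing and the passing run in this post, so they do not distinguish the two cases.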

Multiple GPUs

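Deep learning training these days usually starts at two GPUs, so PaddlePaddle's inability to use multiple GPUs kept nagging at me. After some searching, I found that NCCL was the problem. I went through quite a few references looking for a fix and eventually found the cause.
Link to the solution: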
https://github.com/pytorch/pytorch/issues/73775
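So I deleted the container created earlier and created a new one, this time also mounting the host's /dev/shm into the container: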

docker run -itd --name container_name -v /path:/path  -v /dev/shm/:/dev/shm paddlecloud/paddlenlp:develop-gpu-cuda11.2-cudnn8-latest /bin/bash
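The only change from the earlier command is the extra /dev/shm mount, which suggests the NCCL failure was related to the limited shared memory a container gets by default. A minimal sketch for checking this from inside a container follows; the helper names `total_mib` and `is_small_shm` are hypothetical, and the 64 MiB threshold is Docker's default /dev/shm size when neither a mount nor `--shm-size` is given. Treat it as a quick diagnostic under that assumption, not as a definitive test of whether NCCL will work.

```python
import os
import shutil

# Docker gives a container a 64 MiB /dev/shm unless told otherwise.
DOCKER_DEFAULT_SHM_MIB = 64


def total_mib(path="/dev/shm"):
    """Return the total size of the filesystem holding `path`, in MiB."""
    return shutil.disk_usage(path).total / (1024 * 1024)


def is_small_shm(size_mib, threshold_mib=DOCKER_DEFAULT_SHM_MIB):
    """True when the size does not exceed Docker's default, i.e. probably not the host's /dev/shm."""
    return size_mib <= threshold_mib


# Only report when /dev/shm exists (it does on typical Linux systems and containers).
if os.path.isdir("/dev/shm"):
    size = total_mib()
    note = "looks like the container default" if is_small_shm(size) else "larger than the container default"
    print(f"/dev/shm: {size:.0f} MiB ({note})")
```

Running it in the first container and again in the recreated one should show the difference the mount makes, since the second container sees the host's shared memory instead of the small default.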

After entering the container, check that Paddle works:

>>> paddle.utils.run_check()
Running verify PaddlePaddle program ... 
W0130 06:10:52.232132    22 gpu_context.cc:278] Please NOTE: device: 0, GPU Compute Capability: 8.6, Driver API Version: 11.7, Runtime API Version: 11.2
W0130 06:10:52.234642    22 gpu_context.cc:306] device: 0, cuDNN Version: 8.1.
PaddlePaddle works well on 1 GPU.
W0130 06:10:54.919947    22 parallel_executor.cc:642] Cannot enable P2P access from 0 to 1
W0130 06:10:54.919976    22 parallel_executor.cc:642] Cannot enable P2P access from 0 to 2
W0130 06:10:54.919981    22 parallel_executor.cc:642] Cannot enable P2P access from 0 to 3
W0130 06:10:54.919983    22 parallel_executor.cc:642] Cannot enable P2P access from 0 to 4
W0130 06:10:54.919986    22 parallel_executor.cc:642] Cannot enable P2P access from 0 to 5
W0130 06:10:54.919989    22 parallel_executor.cc:642] Cannot enable P2P access from 0 to 6
W0130 06:10:54.919992    22 parallel_executor.cc:642] Cannot enable P2P access from 0 to 7
W0130 06:10:54.919996    22 parallel_executor.cc:642] Cannot enable P2P access from 1 to 0
W0130 06:10:54.919998    22 parallel_executor.cc:642] Cannot enable P2P access from 1 to 2
W0130 06:10:54.920001    22 parallel_executor.cc:642] Cannot enable P2P access from 1 to 3
W0130 06:10:54.920003    22 parallel_executor.cc:642] Cannot enable P2P access from 1 to 4
W0130 06:10:54.920009    22 parallel_executor.cc:642] Cannot enable P2P access from 1 to 5
W0130 06:10:54.920012    22 parallel_executor.cc:642] Cannot enable P2P access from 1 to 6
W0130 06:10:54.920019    22 parallel_executor.cc:642] Cannot enable P2P access from 1 to 7
W0130 06:10:54.920022    22 parallel_executor.cc:642] Cannot enable P2P access from 2 to 0
W0130 06:10:54.920027    22 parallel_executor.cc:642] Cannot enable P2P access from 2 to 1
W0130 06:10:54.920029    22 parallel_executor.cc:642] Cannot enable P2P access from 2 to 3
W0130 06:10:54.920037    22 parallel_executor.cc:642] Cannot enable P2P access from 2 to 4
W0130 06:10:54.920039    22 parallel_executor.cc:642] Cannot enable P2P access from 2 to 5
W0130 06:10:54.920044    22 parallel_executor.cc:642] Cannot enable P2P access from 2 to 6
W0130 06:10:54.920084    22 parallel_executor.cc:642] Cannot enable P2P access from 2 to 7
W0130 06:10:54.920087    22 parallel_executor.cc:642] Cannot enable P2P access from 3 to 0
W0130 06:10:54.920092    22 parallel_executor.cc:642] Cannot enable P2P access from 3 to 1
W0130 06:10:54.920095    22 parallel_executor.cc:642] Cannot enable P2P access from 3 to 2
W0130 06:10:54.920099    22 parallel_executor.cc:642] Cannot enable P2P access from 3 to 4
W0130 06:10:54.920101    22 parallel_executor.cc:642] Cannot enable P2P access from 3 to 5
W0130 06:10:54.920104    22 parallel_executor.cc:642] Cannot enable P2P access from 3 to 6
W0130 06:10:54.920106    22 parallel_executor.cc:642] Cannot enable P2P access from 3 to 7
W0130 06:10:54.920110    22 parallel_executor.cc:642] Cannot enable P2P access from 4 to 0
W0130 06:10:54.920117    22 parallel_executor.cc:642] Cannot enable P2P access from 4 to 1
W0130 06:10:54.920123    22 parallel_executor.cc:642] Cannot enable P2P access from 4 to 2
W0130 06:10:54.920127    22 parallel_executor.cc:642] Cannot enable P2P access from 4 to 3
W0130 06:10:54.920132    22 parallel_executor.cc:642] Cannot enable P2P access from 4 to 5
W0130 06:10:54.920135    22 parallel_executor.cc:642] Cannot enable P2P access from 4 to 6
W0130 06:10:54.920140    22 parallel_executor.cc:642] Cannot enable P2P access from 4 to 7
W0130 06:10:54.920146    22 parallel_executor.cc:642] Cannot enable P2P access from 5 to 0
W0130 06:10:54.920152    22 parallel_executor.cc:642] Cannot enable P2P access from 5 to 1
W0130 06:10:54.920157    22 parallel_executor.cc:642] Cannot enable P2P access from 5 to 2
W0130 06:10:54.920164    22 parallel_executor.cc:642] Cannot enable P2P access from 5 to 3
W0130 06:10:54.920169    22 parallel_executor.cc:642] Cannot enable P2P access from 5 to 4
W0130 06:10:54.920176    22 parallel_executor.cc:642] Cannot enable P2P access from 5 to 6
W0130 06:10:54.920181    22 parallel_executor.cc:642] Cannot enable P2P access from 5 to 7
W0130 06:10:54.920184    22 parallel_executor.cc:642] Cannot enable P2P access from 6 to 0
W0130 06:10:54.920190    22 parallel_executor.cc:642] Cannot enable P2P access from 6 to 1
W0130 06:10:54.920194    22 parallel_executor.cc:642] Cannot enable P2P access from 6 to 2
W0130 06:10:54.920200    22 parallel_executor.cc:642] Cannot enable P2P access from 6 to 3
W0130 06:10:54.920207    22 parallel_executor.cc:642] Cannot enable P2P access from 6 to 4
W0130 06:10:54.920212    22 parallel_executor.cc:642] Cannot enable P2P access from 6 to 5
W0130 06:10:54.920217    22 parallel_executor.cc:642] Cannot enable P2P access from 6 to 7
W0130 06:10:54.920221    22 parallel_executor.cc:642] Cannot enable P2P access from 7 to 0
W0130 06:10:54.920228    22 parallel_executor.cc:642] Cannot enable P2P access from 7 to 1
W0130 06:10:54.920233    22 parallel_executor.cc:642] Cannot enable P2P access from 7 to 2
W0130 06:10:54.920238    22 parallel_executor.cc:642] Cannot enable P2P access from 7 to 3
W0130 06:10:54.920243    22 parallel_executor.cc:642] Cannot enable P2P access from 7 to 4
W0130 06:10:54.920254    22 parallel_executor.cc:642] Cannot enable P2P access from 7 to 5
W0130 06:10:54.920261    22 parallel_executor.cc:642] Cannot enable P2P access from 7 to 6
W0130 06:11:12.578923    22 fuse_all_reduce_op_pass.cc:76] Find all_reduce operators: 2. To make the speed faster, some all_reduce ops are fused during training, after fusion, the number of all_reduce ops is 2.
PaddlePaddle works well on 8 GPUs.
PaddlePaddle is installed successfully! Let's start deep learning with PaddlePaddle now.

Seeing "PaddlePaddle is installed successfully!" means Paddle is fully installed and multi-GPU works.

Along my journey with Paddle, this is a fairly convenient way to install it that I found, and I am sharing it here.
