A Complete Tutorial on Generating Videos with Stable Diffusion

news2024/9/22 7:25:55

This article is a complete guide to generating videos with CUDA and Stable Diffusion. CUDA is used to speed up video generation, and the model can be run for free on Kaggle's Tesla GPUs.

 # Install the diffusers package
 # pip install --upgrade pip
 !pip install --upgrade diffusers transformers scipy
 
 # Load the model from the Stable Diffusion model card
 import torch
 from diffusers import StableDiffusionPipeline
 
 from huggingface_hub import notebook_login

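Loading the model

The model weights are released under the CreativeML OpenRAIL-M license. This is an open license that claims no rights on the outputs you generate and prohibits deliberately producing illegal or harmful content. If you have questions about the license, see:

https://huggingface.co/CompVis/stable-diffusion-v1-4

You first need to be a registered user of the Hugging Face Hub and use an access token for the code to work. Because we are working in a notebook, we log in with notebook_login().

After the code runs, the cell below it shows a login prompt where you paste your access token.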

 from pathlib import Path
 
 # Only ask for a token if none has been saved yet
 if not (Path.home() / '.huggingface' / 'token').exists():
     notebook_login()
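
The check above only looks at `~/.huggingface/token`. As a hedged sketch: newer `huggingface_hub` releases are understood to store the token under `~/.cache/huggingface/token` and to honour an `HF_TOKEN` environment variable, so a more tolerant check might look like the following. The helper name `has_saved_token` is hypothetical, and the list of locations is an assumption rather than an exhaustive one.

```python
import os
from pathlib import Path


def has_saved_token(home=None, env=None):
    """Return True if a Hugging Face token seems to be available.

    Assumption: the token lives in one of the two files below or in the
    HF_TOKEN environment variable; other setups may differ.
    """
    home = Path(home) if home is not None else Path.home()
    env = os.environ if env is None else env
    if env.get("HF_TOKEN"):
        return True
    candidates = [
        home / ".huggingface" / "token",            # location checked in this article
        home / ".cache" / "huggingface" / "token",  # assumed newer default location
    ]
    return any(path.is_file() for path in candidates)


# In the notebook (notebook_login comes from huggingface_hub):
# if not has_saved_token():
#     notebook_login()
```

Passing `home` and `env` explicitly keeps the helper easy to try out without touching your real home directory.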

Next, load the model:

 model_id = "CompVis/stable-diffusion-v1-4"
 device = "cuda"
 pipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16)
 pipe = pipe.to(device)

Generating images from text

 %%time
 # Provide the prompts
 prompts = [
     "a couple holding hands with plants growing out of their heads, growth of a couple, rainy day, atmospheric, bokeh matte masterpiece by artgerm by wlop by alphonse muhca ",
     "detailed portrait beautiful Neon Operator Girl, cyberpunk futuristic neon, reflective puffy coat, decorated with traditional Japanese ornaments by Ismail inceoglu dragan bibin hans thoma greg rutkowski Alexandros Pyromallis Nekro Rene Maritte Illustrated, Perfect face, fine details, realistic shaded, fine-face, pretty face",
     "symmetry!! portrait of minotaur, sci - fi, glowing lights!! intricate, elegant, highly detailed, digital painting, artstation, concept art, smooth, sharp focus, illustration, art by artgerm and greg rutkowski and alphonse mucha, 8 k ",
     "Human, Simon Stalenhag in forest clearing style, trends on artstation, artstation HD, artstation, unreal engine, 4k, 8k",
     "portrait of a young ruggedly handsome but joyful pirate, male, masculine, upper body, red hair, long hair, d & d, fantasy, roguish smirk, intricate, elegant, highly detailed, digital painting, artstation, concept art, matte, sharp focus, illustration, art by artgerm and greg rutkowski and alphonse mucha ",
     "Symmetry!! portrait of a sith lord, warrior in sci-fi armour, tech wear, muscular!! sci-fi, intricate, elegant, highly detailed, digital painting, artstation, concept art, smooth, sharp focus, illustration, art by artgerm and greg rutkowski and alphonse mucha",
     "highly detailed portrait of a cat knight wearing heavy armor, stephen bliss, unreal engine, greg rutkowski, loish, rhads, beeple, makoto shinkai and lois van baarle, ilya kuvshinov, rossdraws, tom bagshaw, tom whalen, alphonse mucha, global illumination, god rays, detailed and intricate environment ",
     "black and white portrait photo, the most beautiful girl in the world, earth, year 2447, cdx"
 ]

Display the results

 %%time
 # Show the results
 images = pipe(prompts).images
 images
 
 # Show a single result
 images[0]

The image generated for the first prompt, "a couple holding hands with plants growing out of their heads, growth of a couple, rainy day, atmospheric, bokeh matte masterpiece", is shown below.

Display the generated images together in a grid

 # Show the results in a grid
 from PIL import Image
 
 def image_grid(imgs, rows, cols):
     w, h = imgs[0].size
     grid = Image.new('RGB', size=(cols * w, rows * h))
     # Paste image i into column i % cols and row i // cols
     for i, img in enumerate(imgs):
         grid.paste(img, box=(i % cols * w, i // cols * h))
     return grid
 
 grid = image_grid(images, rows=2, cols=4)
 grid
 
 # Save the results
 grid.save("result_images.png")
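
The grid arithmetic can be checked without generating any images. The sketch below mirrors the `box=(i % cols * w, i // cols * h)` expression and also derives the row count from the number of images. The helper names `grid_shape` and `paste_offsets` are hypothetical, and the 512×512 image size is an assumption used only for illustration.

```python
import math


def grid_shape(num_images, cols):
    """Rows needed to fit num_images into a grid with `cols` columns."""
    if num_images < 1 or cols < 1:
        raise ValueError("num_images and cols must be positive")
    return math.ceil(num_images / cols), cols


def paste_offsets(num_images, cols, w, h):
    """Top-left (x, y) corner of each image, as computed inside image_grid."""
    return [(i % cols * w, i // cols * h) for i in range(num_images)]


rows, cols = grid_shape(8, cols=4)
print(rows, cols)                        # 2 4
print(paste_offsets(8, 4, 512, 512)[5])  # (512, 512)
```

With eight prompts and four columns this gives the `rows=2, cols=4` layout used above; the sixth image (index 5) lands in the second row, second column.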

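If GPU memory is limited (less than 4 GB of GPU RAM available), make sure to load StableDiffusionPipeline in float16 precision rather than the default float32. This is done by telling diffusers to expect the weights in float16, as the load above already does. The cell below also calls enable_attention_slicing(), which computes attention in slices to lower peak memory use at some cost in speed: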

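 %%time
 import torch
 
 # Load the weights in float16 and enable attention slicing to lower peak memory use
 pipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16)
 pipe = pipe.to(device)
 pipe.enable_attention_slicing()
 
 # .images holds the list of generated PIL images
 images2 = pipe(prompts).images
 images2[0]
 
 grid2 = image_grid(images2, rows=2, cols=4)
 grid2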
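
To see roughly why float16 helps, here is a back-of-the-envelope sketch of the memory taken by the raw weights alone. The figure of about one billion parameters is an assumed round number for illustration; the exact count depends on the checkpoint, and activations are not included.

```python
BYTES_PER_ELEMENT = {"float32": 4, "float16": 2}


def weights_size_gib(num_params, dtype):
    """Approximate size of the raw weights in GiB (activations not included)."""
    return num_params * BYTES_PER_ELEMENT[dtype] / 1024**3


# Assumption: on the order of one billion parameters in total.
approx_params = 1_000_000_000
for dtype in ("float32", "float16"):
    print(f"{dtype}: {weights_size_gib(approx_params, dtype):.2f} GiB")
# float32: 3.73 GiB
# float16: 1.86 GiB
```

Under that assumption the float32 weights alone would nearly fill a 4 GB card, while float16 halves the footprint.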

To swap the noise scheduler, pass it to from_pretrained as well:

 %%time
 from diffusers import StableDiffusionPipeline, EulerDiscreteScheduler
 
 model_id = "CompVis/stable-diffusion-v1-4"
 # Use the Euler scheduler here instead
 scheduler = EulerDiscreteScheduler.from_pretrained(model_id, subfolder="scheduler")
 pipe = StableDiffusionPipeline.from_pretrained(model_id, scheduler=scheduler, torch_dtype=torch.float16)
 pipe = pipe.to("cuda")
 
 # pipe(...) returns an output object; index 0 holds the list of generated images
 images3 = pipe(prompts)
 images3[0][0]

The images below show the result of switching to a different scheduler:

 # The results are stored tuple-style: index 0 is the list of images
 images3[0][0]
 
 grid3 = image_grid(images3[0], rows=2, cols=4)
 grid3
 
 # Save the final output
 grid3.save("results_stable_diffusionv1.4.png")

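View all the images

Creating the video

The basic steps are done. Now let's generate a video on Kaggle.

First open the notebook settings and select GPU as the accelerator.

Then install the required package: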

 !pip install -U stable_diffusion_videos
 
 from huggingface_hub import notebook_login
 notebook_login()
 
 # Making videos
 from stable_diffusion_videos import StableDiffusionWalkPipeline
 import torch
 
 # Use "CompVis/stable-diffusion-v1-4" for 1.4
 pipeline = StableDiffusionWalkPipeline.from_pretrained(
     "runwayml/stable-diffusion-v1-5",
     torch_dtype=torch.float16,
     revision="fp16",
 ).to("cuda")
 
 # The same prompt is used for every keyframe; only the seed changes
 prompt = 'environment living room interior, mid century modern, indoor garden with fountain, retro,m vintage, designer furniture made of wood and plastic, concrete table, wood walls, indoor potted tree, large window, outdoor forest landscape, beautiful sunset, cinematic, concept art, sunstainable architecture, octane render, utopia, ethereal, cinematic light, –ar 16:9 –stylize 45000'
 
 # Generate the video
 video_path = pipeline.walk(
     prompts=[prompt] * 4,       # one prompt per seed; the two lists should have the same length
     seeds=[42, 333, 444, 555],
     num_interpolation_steps=50,
     # height=1280,  # use multiples of 64 if > 512. Multiples of 8 if < 512.
     # width=720,    # use multiples of 64 if > 512. Multiples of 8 if < 512.
     output_dir='dreams',        # Where images/videos will be saved
     name='imagine',             # Subdirectory of output_dir where images/videos will be saved
     guidance_scale=8.5,         # Higher adheres to prompt more, lower lets model take the wheel
     num_inference_steps=50,     # Number of diffusion steps per image generated. 50 is good default
 )
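
The comments on `height` and `width` state a sizing rule: multiples of 64 above 512, multiples of 8 below. A small sketch of a helper that snaps a requested size to that rule is shown here. The name `snap_dimension` is hypothetical and the rule is taken from the code comments only, not from the library's documentation.

```python
def snap_dimension(value):
    """Round value to the nearest size allowed by the rule in the comments.

    Rule (from the comments above): multiples of 64 above 512,
    multiples of 8 otherwise.
    """
    step = 64 if value > 512 else 8
    # int(x + 0.5) rounds a positive number to the nearest integer, halves up
    return max(step, step * int(value / step + 0.5))


print(snap_dimension(1280))  # 1280
print(snap_dimension(720))   # 704
print(snap_dimension(250))   # 248
```

Under this rule 1280 is already valid, while 720 would be adjusted to 704 before being passed as `width`.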
 
 

Upscale the images to 4K so that they can be used for the video:

 from stable_diffusion_videos import RealESRGANModel
 
 # Upscale every frame in the first folder and write the results to the second
 model = RealESRGANModel.from_pretrained('nateraw/real-esrgan')
 model.upsample_imagefolder('/kaggle/working/dreams/imagine/imagine_000000/', '/kaggle/working/dreams/imagine4K_00')
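
Whether the upscaled frames actually reach 4K depends on the source size and on the model's scale factor. The sketch below does the arithmetic under the assumption of a 4× upscaler; that factor is an assumption about this checkpoint, not something stated in the article, and the helper names are hypothetical.

```python
UHD_4K = (3840, 2160)  # width, height of 4K UHD


def upscaled_size(width, height, scale=4):
    """Frame size after upscaling; scale=4 is an assumption about the model."""
    return width * scale, height * scale


def covers_4k(width, height):
    """True if the frame is at least 3840x2160 in landscape orientation."""
    return width >= UHD_4K[0] and height >= UHD_4K[1]


w, h = upscaled_size(512, 512)
print((w, h), covers_4k(w, h))  # (2048, 2048) False
```

If the 4× assumption holds, 512×512 frames come out at 2048×2048, so larger source frames would be needed for a full 3840×2160 output.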

Adding music to the video

You can add music to the video by supplying an audio file.

 %%capture
 !pip install youtube-dl
 !youtube-dl -f bestaudio --extract-audio --audio-format mp3 --audio-quality 0 -o "music/thoughts.%(ext)s" https://soundcloud.com/nateraw/thoughts
 
 # Run the lines below in a separate cell; %%capture would otherwise hide the audio player
 from IPython.display import Audio
 
 Audio(filename='music/thoughts.mp3')

Here we download the audio with youtube-dl (be mindful of the audio's copyright) and then add it to the video:

 # Seconds in the song.
 audio_offsets = [7, 9]
 fps = 8
 
 # Convert seconds to frames
 num_interpolation_steps = [(b - a) * fps for a, b in zip(audio_offsets, audio_offsets[1:])]
 
 
 video_path = pipeline.walk(
     prompts=['blueberry spaghetti', 'strawberry spaghetti'],
     seeds=[42, 1337],
     num_interpolation_steps=num_interpolation_steps,
     height=512,                             # use multiples of 64
     width=512,                              # use multiples of 64
     audio_filepath='music/thoughts.mp3',    # Use your own file
     audio_start_sec=audio_offsets[0],       # Start second of the provided audio
     fps=fps,                                # important to set yourself based on the num_interpolation_steps you defined
     batch_size=4,                           # increase until you go out of memory.
     output_dir='dreams',                    # Where images will be saved
     name=None,                              # Subdir of output dir. will be timestamp by default
 )
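
The seconds-to-frames conversion generalizes to more than two prompts: each pair of consecutive timestamps yields one interpolation segment. A minimal sketch follows, where the three timestamps are a made-up example and the helper names are hypothetical.

```python
def steps_from_offsets(audio_offsets, fps):
    """Frames to interpolate between consecutive audio timestamps (in seconds)."""
    if sorted(audio_offsets) != list(audio_offsets):
        raise ValueError("audio_offsets must be increasing")
    return [(b - a) * fps for a, b in zip(audio_offsets, audio_offsets[1:])]


def clip_seconds(steps, fps):
    """Length of the resulting clip in seconds."""
    return sum(steps) / fps


# Hypothetical three-prompt walk: timestamps at 7 s, 9 s and 12 s, 8 fps.
steps = steps_from_offsets([7, 9, 12], fps=8)
print(steps)                   # [16, 24]
print(clip_seconds(steps, 8))  # 5.0
```

There is one timestamp per prompt and one segment fewer than prompts, which matches the two-prompt call above (`audio_offsets = [7, 9]` gives a single segment of 16 frames).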

The code for this article can be found here:

https://avoid.overfit.cn/post/781a2bd8a4534f7cb2d223c141d37df8

Author: Bob Rupak Roy
