1. Create a virtual environment
conda create -n faster-whisper python=3.10
conda activate faster-whisper
2. Install the CPU build of PyTorch
pip3 install torch torchvision torchaudio -i https://pypi.tuna.tsinghua.edu.cn/simple
3. Verify the PyTorch installation
(faster-whisper) H:\big-model\faster-whisper-large-v3>python
Python 3.10.16 | packaged by Anaconda, Inc. | (main, Dec 11 2024, 16:19:12) [MSC v.1929 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>>
>>> import torch
>>> import torchvision
>>> import torchaudio
>>>
>>> print(f"PyTorch version: {torch.__version__}")
PyTorch version: 2.6.0+cpu
>>> print(f"torchvision version: {torchvision.__version__}")
torchvision version: 0.21.0+cpu
>>> print(f"torchaudio version: {torchaudio.__version__}")
torchaudio version: 2.6.0+cpu
>>>
4. Install ctranslate2 and faster-whisper
pip3 install ctranslate2 faster-whisper -i https://pypi.tuna.tsinghua.edu.cn/simple
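A quick way to confirm both packages landed in the active environment is to query their installed versions with the standard-library importlib.metadata. This is a small sketch; `package_version` is a helper name introduced here, not part of either package.

```python
# Check that ctranslate2 and faster-whisper installed correctly.
# Note: the PyPI distribution name is "faster-whisper" (with a hyphen),
# while the import name is "faster_whisper".
from importlib import metadata

def package_version(name):
    """Return the installed version of a distribution, or None if missing."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None

if __name__ == "__main__":
    for pkg in ("ctranslate2", "faster-whisper"):
        ver = package_version(pkg)
        print(f"{pkg}: {ver if ver else 'NOT INSTALLED'}")
```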
5. Download the faster-whisper-large-v3 model
Running the Python statements below makes faster-whisper fetch the model automatically from the Hugging Face Hub; from mainland China this requires a proxy and the download is slow.
>>> from faster_whisper import WhisperModel
>>> model = WhisperModel("large-v3")
Alternatively, download the files manually and place them under H:\big-model\faster-whisper-large-v3.
Download link: https://huggingface.co/Systran/faster-whisper-large-v3/tree/main
6. Test speech-to-text transcription
>>> from faster_whisper import WhisperModel
>>> model_path = "H:\\big-model\\faster-whisper-large-v3"
>>> model = WhisperModel(model_path, device="cpu")
[2025-02-12 21:39:43.689] [ctranslate2] [thread 2996] [warning] The compute type inferred from the saved model is float16, but the target device or backend do not support efficient float16 computation. The model weights have been automatically converted to use the float32 compute type instead.
>>>
>>>
>>> audio_file = "H:\\big-model\\audio\\628941565166328648.mp3"
>>> segments, info = model.transcribe(audio_file, beam_size=5)
>>> for segment in segments:
...     print("[%.2fs -> %.2fs] %s" % (segment.start, segment.end, segment.text))
...
[0.00s -> 2.70s] 下面我们来看一下理财的三要素
[2.70s -> 6.38s] 安全性、流动性和收益性
[6.38s -> 11.94s] 世界上任何的投资行为都是在这三性中综合考量
done
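The interactive session above can be collected into one script. Passing compute_type="int8" to WhisperModel is one way to avoid the float16-to-float32 fallback warning on CPU and reduce memory use; `transcribe_file` and `format_segment` are helper names introduced for this sketch, while `info.language` and `info.language_probability` come from faster-whisper's transcription result.

```python
def format_segment(start, end, text):
    """Render one segment in the [start -> end] style used above."""
    return "[%.2fs -> %.2fs] %s" % (start, end, text)

def transcribe_file(model_dir, audio_file):
    # Lazy import so the formatting helper works even before step 4 is done.
    from faster_whisper import WhisperModel

    # int8 quantization on CPU avoids the float16 fallback warning.
    model = WhisperModel(model_dir, device="cpu", compute_type="int8")
    segments, info = model.transcribe(audio_file, beam_size=5)
    print(f"Detected language: {info.language} "
          f"(probability {info.language_probability:.2f})")
    for segment in segments:
        print(format_segment(segment.start, segment.end, segment.text))

if __name__ == "__main__":
    transcribe_file(r"H:\big-model\faster-whisper-large-v3",
                    r"H:\big-model\audio\628941565166328648.mp3")
```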