[Transformer] Hugging Face Handbook: Inference Pipelines (04/10)

1. Introduction

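This is part four of the Hugging Face handbook, covering how to use inference pipelines. Even if you have no experience with a specific modality or are unfamiliar with the code behind a model, you can still use it for inference through pipeline()!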

2. Inference pipelines

pipeline() makes it simple to use any model from the Hub for inference on any language, computer vision, speech, and multimodal task. This tutorial will teach you to:

Use pipeline() for inference.
Use a specific tokenizer or model.
Use pipeline() for audio, vision, and multimodal tasks.

2.1 Pipeline usage

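While each task has an associated pipeline(), it is simpler to use the general pipeline() abstraction, which contains all the task-specific pipelines. pipeline() automatically loads a default model and a preprocessing class capable of inference for your task. Let's take automatic speech recognition (ASR), or speech-to-text, as an example.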

Start by creating a pipeline() and specifying the inference task:

from transformers import pipeline
transcriber = pipeline(task="automatic-speech-recognition")

Pass your input to the pipeline(). For speech recognition, this is an audio input file:

transcriber("https://huggingface.co/datasets/Narsil/asr_dummy/resolve/main/mlk.flac")
{'text': 'I HAVE A DREAM BUT ONE DAY THIS NATION WILL RISE UP LIVE UP THE TRUE MEANING OF ITS TREES'}

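Not the result you had in mind? Check out some of the most downloaded automatic speech recognition models on the Hub to see if you can get a better transcription.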

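Let's try the Whisper large-v2 model from OpenAI. Whisper was released 2 years later than Wav2Vec2 and was trained on close to 10x more data. As such, it beats Wav2Vec2 on most downstream benchmarks. It also has the added benefit of predicting punctuation and casing, neither of which is possible with Wav2Vec2.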

Let's give it a try here to see how it performs:

transcriber = pipeline(model="openai/whisper-large-v2")
transcriber("https://huggingface.co/datasets/Narsil/asr_dummy/resolve/main/mlk.flac")
{'text': ' I have a dream that one day this nation will rise up and live out the true meaning of its creed.'}

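Now this result looks more accurate! For a deep-dive comparison of Wav2Vec2 and Whisper, refer to the Audio Transformers Course. We really encourage you to check out the Hub for models in different languages, models specialized in your field, and more. You can check and compare model results directly from your browser on the Hub to see whether a model fits your use case or handles corner cases better than other ones. And if you don't find a model for your use case, you can always start training your own!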

If you have several inputs, you can pass them as a list:

transcriber(
    [
        "https://huggingface.co/datasets/Narsil/asr_dummy/resolve/main/mlk.flac",
        "https://huggingface.co/datasets/Narsil/asr_dummy/resolve/main/1.flac",
    ]
)

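Pipelines are great for experimentation, since switching from one model to another is trivial; however, there are ways to optimize them for workloads larger than experimentation. See the following guides in the documentation, which dive into iterating over whole datasets or using pipelines in a webserver: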

Using pipelines on a dataset
Using pipelines for a webserver

2.2 Parameters

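pipeline() supports many parameters; some are task-specific, and some are general to all pipelines. In general, you can specify a parameter either when you create the pipeline or when you call it: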

transcriber = pipeline(model="openai/whisper-large-v2", my_parameter=1)

out = transcriber(...)  # This will use `my_parameter=1`.
out = transcriber(..., my_parameter=2)  # This will override and use `my_parameter=2`.
out = transcriber(...)  # This will go back to using `my_parameter=1`.
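
The precedence rule in the snippet above can be illustrated with a toy class. This is only a sketch of the behavior described here, not the actual transformers implementation; ToyPipeline and the input name "audio.flac" are hypothetical.

```python
from typing import Any, Dict


class ToyPipeline:
    """Toy stand-in (not the transformers class) showing parameter precedence."""

    def __init__(self, **default_params: Any) -> None:
        # Parameters given at construction time become the defaults.
        self.default_params: Dict[str, Any] = dict(default_params)

    def __call__(self, inputs: str, **call_params: Any) -> Dict[str, Any]:
        # Per-call parameters are merged on top of a copy of the defaults,
        # so the stored defaults are never modified.
        effective = {**self.default_params, **call_params}
        return {"inputs": inputs, "params": effective}


toy = ToyPipeline(my_parameter=1)
first = toy("audio.flac")                   # uses my_parameter=1
second = toy("audio.flac", my_parameter=2)  # overrides for this call only
third = toy("audio.flac")                   # back to my_parameter=1
print(first["params"], second["params"], third["params"])
```

Because the override is merged into a fresh dictionary on every call, it cannot leak into later calls.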

Let's look at 3 important ones:

2.2.1 Device

If you use device=n, the pipeline automatically puts the model on the specified device. This works regardless of whether you are using PyTorch or TensorFlow.

transcriber = pipeline(model="openai/whisper-large-v2", device=0)
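
Hard-coding device=0 fails on a machine without a GPU. One possible way to pick the value at runtime is sketched below; pick_device is a hypothetical helper, and it assumes the PyTorch backend and the convention that device=-1 means CPU.

```python
def pick_device() -> int:
    """Return 0 (first CUDA GPU) if PyTorch sees one, otherwise -1 (CPU)."""
    try:
        import torch  # optional dependency; only needed for the GPU check
    except ImportError:
        return -1
    return 0 if torch.cuda.is_available() else -1


device = pick_device()
print(f"device={device}")
# The value could then be passed along, e.g. pipeline(model=..., device=device)
```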

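If the model is too large for a single GPU and you are using PyTorch, you can set device_map="auto" to automatically determine how to load and store the model weights. Using the device_map argument requires the Accelerate package: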

pip install --upgrade accelerate

The following code automatically loads and stores the model weights across devices:

transcriber = pipeline(model="openai/whisper-large-v2", device_map="auto")

Note that if device_map="auto" is passed, you should not also pass device=device when instantiating your pipeline, as you may encounter some unexpected behavior!

2.2.2 Batch size

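By default, pipelines do not batch inference. The reason is that batching is not necessarily faster, and can actually be quite a bit slower in some cases.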

But if it works in your use case, you can use:

transcriber = pipeline(model="openai/whisper-large-v2", device=0, batch_size=2)
audio_filenames = [f"https://huggingface.co/datasets/Narsil/asr_dummy/resolve/main/{i}.flac" for i in range(1, 5)]
texts = transcriber(audio_filenames)

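This runs the pipeline on the 4 provided audio files, but it passes them to the model in batches of 2 (the model is on a GPU, where batching is more likely to help) without requiring any further code from you. The output should always match what you would have received without batching; it is only meant as a way to help you get more speed out of a pipeline.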
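
To make the grouping concrete, here is a minimal sketch of how 4 inputs fall into batches of 2. It only illustrates the idea; split_into_batches is a hypothetical helper, and the real pipeline handles batching internally.

```python
from typing import List, Sequence


def split_into_batches(items: Sequence[str], batch_size: int) -> List[List[str]]:
    """Group items into consecutive batches of at most batch_size elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [list(items[i:i + batch_size]) for i in range(0, len(items), batch_size)]


audio_filenames = [
    f"https://huggingface.co/datasets/Narsil/asr_dummy/resolve/main/{i}.flac"
    for i in range(1, 5)
]
batches = split_into_batches(audio_filenames, batch_size=2)
print(len(batches), [len(batch) for batch in batches])
```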

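Pipelines can also alleviate some of the complexities of batching because, for some pipelines, a single item (like a long audio file) needs to be chunked into multiple parts to be processed by a model. The pipeline performs this chunk batching for you.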

2.2.3 Task-specific parameters

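All tasks provide task-specific parameters which allow for additional flexibility and options to help you get your job done. For instance, the transformers.AutomaticSpeechRecognitionPipeline.__call__() method has a return_timestamps parameter, which sounds promising for subtitling videos: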

transcriber = pipeline(model="openai/whisper-large-v2", return_timestamps=True)
transcriber("https://huggingface.co/datasets/Narsil/asr_dummy/resolve/main/mlk.flac")
{'text': ' I have a dream that one day this nation will rise up and live out the true meaning of its creed.', 'chunks': [{'timestamp': (0.0, 11.88), 'text': ' I have a dream that one day this nation will rise up and live out the true meaning of its'}, {'timestamp': (11.88, 12.38), 'text': ' creed.'}]}

As you can see, the model inferred the text and also output when the various sentences were pronounced.
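
Since subtitling is the motivating use case, here is a rough sketch that turns the 'chunks' list shown above into SRT subtitle text. The helper names are hypothetical, and the sketch assumes every chunk carries both a start and an end timestamp.

```python
from typing import Dict, List


def format_srt_time(seconds: float) -> str:
    """Format seconds as the HH:MM:SS,mmm timestamp used by SRT files."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def chunks_to_srt(chunks: List[Dict]) -> str:
    """Build SRT text: a counter, a time range, and the text for each chunk."""
    blocks = []
    for index, chunk in enumerate(chunks, start=1):
        start, end = chunk["timestamp"]
        time_range = f"{format_srt_time(start)} --> {format_srt_time(end)}"
        blocks.append(f"{index}\n{time_range}\n{chunk['text'].strip()}\n")
    return "\n".join(blocks)


# The two chunks from the output above.
chunks = [
    {"timestamp": (0.0, 11.88),
     "text": " I have a dream that one day this nation will rise up and live out the true meaning of its"},
    {"timestamp": (11.88, 12.38), "text": " creed."},
]
srt_text = chunks_to_srt(chunks)
print(srt_text)
```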

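There are many parameters available for each task, so check out each task's API reference to see what you can tinker with! For instance, the AutomaticSpeechRecognitionPipeline has a chunk_length_s parameter, which is helpful for working on really long audio files (for example, subtitling entire movies or hour-long videos) that a model typically cannot handle on its own: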

transcriber = pipeline(model="openai/whisper-large-v2", chunk_length_s=30, return_timestamps=True)
transcriber("https://huggingface.co/datasets/sanchit-gandhi/librispeech_long/resolve/main/audio.wav")
{'text': " Chapter 16. I might have told you of the beginning of this liaison in a few lines, but I wanted you to see every step by which we came.  I, too, agree to whatever Marguerite wished, Marguerite to be unable to live apart from me. It was the day after the evening...
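
As a simplified illustration of what chunking a long recording means, the sketch below computes the time windows a 70-second file would be cut into with 30-second chunks. This is not the library's exact algorithm; chunk_windows and its overlap_s parameter are assumptions made for the example.

```python
from typing import List, Tuple


def chunk_windows(
    duration_s: float, chunk_length_s: float, overlap_s: float = 0.0
) -> List[Tuple[float, float]]:
    """Return (start, end) windows in seconds covering the whole recording."""
    if chunk_length_s <= 0:
        raise ValueError("chunk_length_s must be positive")
    if not 0 <= overlap_s < chunk_length_s:
        raise ValueError("overlap_s must be in [0, chunk_length_s)")
    step = chunk_length_s - overlap_s  # how far each new window advances
    windows = []
    start = 0.0
    while start < duration_s:
        end = min(start + chunk_length_s, duration_s)
        windows.append((start, end))
        if end >= duration_s:
            break  # the last window reached the end of the audio
        start += step
    return windows


windows = chunk_windows(duration_s=70.0, chunk_length_s=30.0, overlap_s=5.0)
print(windows)
```

Overlapping the windows gives the model some context on both sides of each cut, which is why a nonzero overlap is used in the example.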

If you can't find a parameter that would really help you out, feel free to request it!

2.3 Using pipelines on a dataset

The pipeline can also run inference on a large dataset. The easiest way we recommend doing this is by using an iterator:

def data():
    for i in range(1000):
        yield f"My example {i}"


pipe = pipeline(model="gpt2", device=0)
generated_characters = 0
for out in pipe(data()):
    generated_characters += len(out[0]["generated_text"])

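The iterator data() yields each result, and the pipeline automatically recognizes that the input is iterable and will start fetching the data while it continues to process it on the GPU (this uses DataLoader under the hood). This is important because you don't have to allocate memory for the whole dataset and you can feed the GPU as fast as possible.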
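
The memory benefit comes from the generator being consumed lazily. The sketch below shows that principle with a hypothetical stand-in, fake_pipe, in place of a real pipeline; a real pipeline may prefetch a few items through its DataLoader, so the exact count would differ.

```python
from typing import Iterable, Iterator, List

produced: List[int] = []  # records which inputs have been generated so far


def data(n: int = 1000) -> Iterator[str]:
    """Yield inputs one at a time instead of building a list of n strings."""
    for i in range(n):
        produced.append(i)
        yield f"My example {i}"


def fake_pipe(inputs: Iterable[str]) -> Iterator[str]:
    """Hypothetical stand-in for a pipeline: handles one input at a time."""
    for text in inputs:
        yield text.upper()


results = fake_pipe(data())
first = next(results)
# Only one input has been generated, even though data() can yield 1000.
print(first, len(produced))
```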

Since batching could speed things up, it may be useful to try tuning the batch_size parameter here.

The simplest way to iterate over a dataset is to load one from the Datasets library:

# KeyDataset is a util that will just output the item we're interested in.
from transformers.pipelines.pt_utils import KeyDataset
from datasets import load_dataset

pipe = pipeline(model="hf-internal-testing/tiny-random-wav2vec2", device=0)
dataset = load_dataset("hf-internal-testing/librispeech_asr_dummy", "clean", split="validation[:10]")

for out in pipe(KeyDataset(dataset, "audio")):
    print(out)

2.4 Using pipelines for a webserver

Creating an inference engine is a complex topic that deserves a chapter of its own.
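
As a taste of what is involved, one common pattern is to load the model once in a single worker and have request handlers send work to it through a queue. The sketch below shows that pattern with the standard library only. All names are hypothetical, and load_stub_pipeline returns a trivial function that reverses text instead of a real pipeline.

```python
import asyncio
from typing import Callable, List


def load_stub_pipeline() -> Callable[[str], str]:
    """Hypothetical stand-in for an expensive pipeline(...) call."""
    return lambda text: text[::-1]


async def model_worker(queue: asyncio.Queue) -> None:
    pipe = load_stub_pipeline()  # loaded once, reused for every request
    while True:
        text, reply = await queue.get()
        reply.set_result(pipe(text))
        queue.task_done()


async def handle_request(queue: asyncio.Queue, text: str) -> str:
    # Each request gets its own future on which the worker puts the answer.
    reply = asyncio.get_running_loop().create_future()
    await queue.put((text, reply))
    return await reply


async def main() -> List[str]:
    queue: asyncio.Queue = asyncio.Queue()
    worker = asyncio.create_task(model_worker(queue))
    answers = await asyncio.gather(
        *(handle_request(queue, text) for text in ["abc", "hello"])
    )
    worker.cancel()  # stop the worker once all requests are answered
    return list(answers)


answers = asyncio.run(main())
print(answers)
```

A real model call is blocking and would stall the event loop while it runs, which is one of the reasons the topic needs a fuller treatment than this sketch.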

2.4.1 Vision pipelines

Using a pipeline() for vision tasks is practically identical.

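Specify your task and pass your image to the classifier. The image can be a link, a local path, or a base64-encoded image. For example, what species of cat is shown in the following image?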

from transformers import pipeline

vision_classifier = pipeline(model="google/vit-base-patch16-224")
preds = vision_classifier(
    images="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/pipeline-cat-chonk.jpeg"
)
preds = [{"score": round(pred["score"], 4), "label": pred["label"]} for pred in preds]
preds

[{'score': 0.4335, 'label': 'lynx, catamount'}, {'score': 0.0348, 'label': 'cougar, puma, catamount, mountain lion, painter, panther, Felis concolor'}, {'score': 0.0324, 'label': 'snow leopard, ounce, Panthera uncia'}, {'score': 0.0239, 'label': 'Egyptian cat'}, {'score': 0.0229, 'label': 'tiger cat'}]
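
A list of score and label dictionaries like this is easy to post-process. As a small sketch, the hypothetical helper below returns the best label only when its score clears a threshold; the threshold values are arbitrary choices for the example.

```python
from typing import Dict, List, Optional

# The predictions printed above.
preds = [
    {"score": 0.4335, "label": "lynx, catamount"},
    {"score": 0.0348, "label": "cougar, puma, catamount, mountain lion, painter, panther, Felis concolor"},
    {"score": 0.0324, "label": "snow leopard, ounce, Panthera uncia"},
    {"score": 0.0239, "label": "Egyptian cat"},
    {"score": 0.0229, "label": "tiger cat"},
]


def top_label(predictions: List[Dict], min_score: float) -> Optional[str]:
    """Return the highest-scoring label, or None if it is below min_score."""
    if not predictions:
        return None
    best = max(predictions, key=lambda pred: pred["score"])
    return best["label"] if best["score"] >= min_score else None


print(top_label(preds, min_score=0.3))
print(top_label(preds, min_score=0.5))
```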

2.4.2 Text pipelines

Using a pipeline() for NLP tasks is practically identical.

from transformers import pipeline

# This model is a `zero-shot-classification` model.
# It will classify text, except you are free to choose any label you might imagine
classifier = pipeline(model="facebook/bart-large-mnli")
classifier(
    "I have a problem with my iphone that needs to be resolved asap!!",
    candidate_labels=["urgent", "not urgent", "phone", "tablet", "computer"],
)
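
The zero-shot classification pipeline returns a dictionary with 'sequence', 'labels', and 'scores' keys. The sketch below pairs each label with its score; rank_labels is a hypothetical helper, and the scores in example_result are illustrative placeholders, not real model output.

```python
from typing import Dict, List, Tuple


def rank_labels(result: Dict) -> List[Tuple[str, float]]:
    """Pair each label with its score, highest score first."""
    pairs = zip(result["labels"], result["scores"])
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


# Placeholder values chosen for illustration only.
example_result = {
    "sequence": "I have a problem with my iphone that needs to be resolved asap!!",
    "labels": ["phone", "urgent", "tablet"],
    "scores": [0.4, 0.5, 0.1],
}
ranked = rank_labels(example_result)
print(ranked[0])
```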

3. Multimodal pipelines

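pipeline() supports more than one modality. For example, a visual question answering (VQA) task combines text and image. Feel free to use any image link you like and a question you want to ask about the image. The image can be a URL or a local path to the image.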

For example, if you use this invoice image:

from transformers import pipeline

vqa = pipeline(model="impira/layoutlm-document-qa")
vqa(
    image="https://huggingface.co/spaces/impira/docquery/resolve/2359223c1837a7587402bda0f2643382a6eefeab/invoice.png",
    question="What is the invoice number?",
)

[{'score': 0.42515, 'answer': 'us-001', 'start': 16, 'end': 16}]

To run the example above, you need to have pytesseract installed in addition to Transformers:

sudo apt install -y tesseract-ocr
pip install pytesseract

4. Using pipelines on large models with 🤗 accelerate

You can easily run pipeline on large models using 🤗 accelerate! First make sure you have installed accelerate with pip install accelerate.

First load your model using device_map="auto"! We will use facebook/opt-1.3b for our example.

# pip install accelerate
import torch
from transformers import pipeline

pipe = pipeline(model="facebook/opt-1.3b", torch_dtype=torch.bfloat16, device_map="auto")
output = pipe("This is a cool example!", do_sample=True, top_p=0.95)

You can also pass 8-bit loaded models if you install bitsandbytes and add the argument load_in_8bit=True:

# pip install accelerate bitsandbytes
import torch
from transformers import pipeline

pipe = pipeline(model="facebook/opt-1.3b", device_map="auto", model_kwargs={"load_in_8bit": True})
output = pipe("This is a cool example!", do_sample=True, top_p=0.95)
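
To see why the data type matters, a back-of-the-envelope estimate of the weight memory helps. The sketch below assumes that "1.3b" in the model name means roughly 1.3 billion parameters and counts weights only; activations, buffers, and any layers that 8-bit loading keeps at higher precision are ignored, so real usage will differ.

```python
# Bytes needed to store a single parameter in each data type.
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2, "int8": 1}


def weight_memory_gb(num_params: float, dtype: str) -> float:
    """Approximate memory for the weights alone, in gigabytes (1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


NUM_PARAMS = 1.3e9  # assumed parameter count for facebook/opt-1.3b
for dtype in BYTES_PER_PARAM:
    print(dtype, round(weight_memory_gb(NUM_PARAMS, dtype), 2), "GB")
```

Under these assumptions, bfloat16 halves the footprint relative to float32, and 8-bit loading halves it again.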