Three Ways to Run Llama 3.2 Locally

news2025/3/11 6:23:07

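Large language models (LLMs) have transformed the field of AI, and smaller models are now on the rise. As a result, capable LLMs can run even on older PCs and smartphones. To give you a starting point, we will explore three different ways to interact with Llama 3.2 locally.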

Prerequisites


Before we dive in, make sure you have:

Installed Ollama and started it
Pulled the Llama 3.2 model (run ollama pull llama3.2 in a terminal)
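If you want to confirm both prerequisites from code, Ollama's HTTP API lists the locally pulled models at GET /api/tags. The sketch below is a minimal illustration: the function names are hypothetical, it assumes Ollama is listening on its default address http://localhost:11434, and it treats a bare name such as llama3.2 as the :latest tag, which is what Ollama records when no tag is given.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # assumed: Ollama's default address


def fetch_local_models(base_url: str = OLLAMA_URL) -> dict:
    """GET /api/tags, which lists the models pulled on this machine."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=5) as resp:
        return json.load(resp)


def has_model(tags: dict, name: str) -> bool:
    """Return True if `name` appears in a /api/tags response.

    A name without an explicit tag (e.g. "llama3.2") is compared as "name:latest".
    """
    wanted = name if ":" in name else f"{name}:latest"
    return any(m.get("name") == wanted for m in tags.get("models", []))


# Offline example with a trimmed-down response of the shape /api/tags returns.
sample = {"models": [{"name": "llama3.2:latest"}, {"name": "nomic-embed-text:latest"}]}
print(has_model(sample, "llama3.2"))
print(has_model(sample, "llama3.2:1b"))
```

With Ollama running, has_model(fetch_local_models(), "llama3.2") should return True once the pull has finished; if the server is not running, fetch_local_models raises a URLError instead.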

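Now, let's explore the three methods!

Method 1: Using the Ollama Python Package

Ollama's Python package (installed with pip install ollama) provides an easy way to interact with Llama 3.2 from your Python scripts or Jupyter notebooks.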

import ollama


response = ollama.chat(
    model="llama3.2",
    messages=[
        {
            "role": "user",
            "content": "Tell me an interesting fact about elephants",
        },
    ],
)
print(response["message"]["content"])
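Each ollama.chat call is stateless: the model only sees the messages you pass in. To hold a multi-turn conversation, you resend the earlier turns, including the assistant's replies. The helper below is a hypothetical sketch of that pattern; the chat function is passed in as a parameter (you would pass ollama.chat) so the logic can be tried without a running server.

```python
from typing import Any, Callable, Dict, List

Message = Dict[str, str]


def ask(history: List[Message], question: str,
        chat_fn: Callable[..., Any], model: str = "llama3.2") -> str:
    """Send `question` together with all previous turns, then record the reply.

    `chat_fn` is expected to behave like ollama.chat: it accepts model= and
    messages= and returns a response whose ["message"]["content"] is the reply.
    """
    history.append({"role": "user", "content": question})
    response = chat_fn(model=model, messages=history)
    answer = response["message"]["content"]
    # Keep the assistant turn so the next question can refer back to it.
    history.append({"role": "assistant", "content": answer})
    return answer
```

Usage with a real server might look like: history = [], then ask(history, "Tell me an interesting fact about elephants", ollama.chat), followed by ask(history, "Why is that?", ollama.chat), where the follow-up only makes sense because the first exchange is resent.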

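This approach works well for simple, synchronous interactions. But what if you want to stream the response as it is generated, without blocking the rest of your program? Ollama provides an AsyncClient for that: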

import asyncio
from ollama import AsyncClient


async def chat():
    message = {
        "role": "user",
        "content": "Tell me an interesting fact about elephants"
    }
    async for part in await AsyncClient().chat(
        model="llama3.2", messages=[message], stream=True
    ):
        print(part["message"]["content"], end="", flush=True)


# Run the async function
asyncio.run(chat())
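Streaming does not strictly require the async client: the synchronous ollama.chat also accepts stream=True, in which case it returns an iterator of partial responses. The helper below (a hypothetical name) prints each chunk as it arrives and returns the full text. It only assumes each chunk can be indexed as chunk["message"]["content"], so it can be exercised with plain dictionaries.

```python
import sys
from typing import Iterable, TextIO


def print_stream(chunks: Iterable, out: TextIO = sys.stdout) -> str:
    """Write streamed chat chunks as they arrive and return the joined text."""
    parts = []
    for chunk in chunks:
        piece = chunk["message"]["content"]
        out.write(piece)
        out.flush()  # show each fragment immediately rather than at the end
        parts.append(piece)
    out.write("\n")
    return "".join(parts)
```

With a running server you could call print_stream(ollama.chat(model="llama3.2", messages=[{"role": "user", "content": "Tell me an interesting fact about elephants"}], stream=True)). The AsyncClient remains the better fit when you want to await several requests concurrently inside an asyncio program.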

Method 2: Using the Ollama API

If you prefer to work with the API directly, or want to integrate Llama 3.2 into non-Python applications, Ollama exposes a simple HTTP API (on port 11434 by default):

curl http://localhost:11434/api/chat -d '{
    "model": "llama3.2",
    "messages": [
        {
            "role": "user",
            "content": "What are God Particles?"
        }
    ],
    "stream": false
}'
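For instance, the same request can be sent from Python using only the standard library, with no Ollama package installed. This is a minimal sketch: build_chat_request and chat are hypothetical helper names, and the base URL assumes Ollama's default address. The request body mirrors the curl example, including "stream": false, so the reply arrives as one JSON object whose message.content field holds the text.

```python
import json
import urllib.request

BASE_URL = "http://localhost:11434"  # assumed default Ollama address


def build_chat_request(model, messages, base_url=BASE_URL):
    """Build a POST /api/chat request equivalent to the curl command."""
    body = json.dumps({"model": model, "messages": messages, "stream": False}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/chat",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(model, messages, base_url=BASE_URL):
    """Send the request and return the assistant's reply text."""
    req = build_chat_request(model, messages, base_url)
    with urllib.request.urlopen(req, timeout=300) as resp:
        return json.load(resp)["message"]["content"]
```

With the server running, chat("llama3.2", [{"role": "user", "content": "What are God Particles?"}]) should return the answer as a string.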

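This gives you the flexibility to interact with Llama 3.2 from any language or tool that can make HTTP requests. Because "stream" is set to false, the server returns the complete reply as a single JSON object instead of streaming it in pieces.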
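If you leave streaming on (it is the default for /api/chat), the response body is newline-delimited JSON: one object per line, each carrying a text fragment in message.content, with "done": true on the final object. Below is a sketch of a parser for that format; the function name is hypothetical, and raising on an "error" field is an assumption about how you might want to surface server errors.

```python
import json
from typing import Iterable, Iterator, Union


def iter_chat_stream(lines: Iterable[Union[bytes, str]]) -> Iterator[str]:
    """Yield the text fragments from a streamed /api/chat response body."""
    for raw in lines:
        line = raw.decode("utf-8") if isinstance(raw, bytes) else raw
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between objects
        obj = json.loads(line)
        if "error" in obj:
            raise RuntimeError(obj["error"])
        yield obj.get("message", {}).get("content", "")
        if obj.get("done"):
            break  # final object: carries timing stats rather than new text
```

Since iterating over an HTTP response from urllib.request.urlopen yields its lines as bytes, you could pass that response straight in: for piece in iter_chat_stream(resp): print(piece, end="", flush=True), where resp comes from a POST to /api/chat with "stream": true.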

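Method 3: Building Advanced Applications with LangChain

For more complex applications, especially those involving document analysis and retrieval, LangChain integrates smoothly with Ollama and Llama 3.2.

The following snippet loads documents, splits them into chunks, creates embeddings, runs a similarity search, and then answers a question using the retrieved chunks as context. Besides Llama 3.2, it uses the nomic-embed-text embedding model, so pull that first as well (ollama pull nomic-embed-text):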

# Assumed dependencies: langchain-community, langchain-text-splitters, langchain-ollama,
# chromadb, and unstructured (with .docx support)
from langchain_community.document_loaders import DirectoryLoader, UnstructuredWordDocumentLoader
from langchain_community.vectorstores import Chroma
from langchain_ollama import OllamaEmbeddings, OllamaLLM
from langchain_text_splitters import RecursiveCharacterTextSplitter


# Load every .docx file under the directory (recursively)
loader = DirectoryLoader('/path/to/documents', glob="**/*.docx", loader_cls=UnstructuredWordDocumentLoader)
documents = loader.load()


# Split the documents into chunks of up to 500 characters, with no overlap
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=0)
splits = text_splitter.split_documents(documents)


# Embed the chunks with nomic-embed-text and index them in a Chroma vector store
embeddings = OllamaEmbeddings(model="nomic-embed-text")
vectorstore = Chroma.from_documents(documents=splits, embedding=embeddings)


# Initialize Llama 3.2 through the local Ollama server
llm = OllamaLLM(model="llama3.2", base_url="http://localhost:11434")


# Retrieve the most similar chunks and answer the question using them as context
query = "What was the main accomplishment of Thomas Jefferson?"
similar_docs = vectorstore.similarity_search(query)
context = "\n".join(doc.page_content for doc in similar_docs)
response = llm.invoke(f"Context: {context}\nQuestion: {query}\nAnswer:")
print(response)

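This approach lets you build applications that use Llama 3.2's language understanding to answer questions about large collections of your own documents, since only the most relevant chunks are passed to the model with each question.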
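To make the retrieval step less of a black box: conceptually, similarity_search embeds the query and returns the stored chunks whose embeddings are closest to it (4 by default in LangChain). The toy version below ranks chunks by cosine similarity. It is an illustration only: the 2-D "embeddings" are made-up numbers rather than nomic-embed-text output, and the metric a real vector store uses depends on its configuration.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: Sequence[float], chunk_vecs: List[Sequence[float]],
          chunks: List[str], k: int = 4) -> List[str]:
    """Return the k chunks whose embeddings are most similar to the query."""
    ranked = sorted(zip(chunk_vecs, chunks),
                    key=lambda pair: cosine(query_vec, pair[0]), reverse=True)
    return [chunk for _, chunk in ranked[:k]]


# Hypothetical toy data: made-up 2-D vectors standing in for real embeddings.
chunks = [
    "Jefferson drafted the Declaration of Independence.",
    "Elephants are the largest living land animals.",
    "Jefferson oversaw the Louisiana Purchase.",
]
vecs = [[0.9, 0.1], [0.0, 1.0], [0.8, 0.3]]
query_vec = [1.0, 0.0]
print(top_k(query_vec, vecs, chunks, k=2))
```

The two Jefferson chunks rank above the elephant chunk because their vectors point in nearly the same direction as the query vector; joining the selected chunks into the prompt is exactly what the context variable in the LangChain snippet does.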

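Let's start generating prompts!

You can also chat with the model interactively in the terminal. The command below uses the llama3.2:3b-instruct-q8_0 tag (the 3B instruct model with 8-bit quantization); ollama run downloads it first if it has not been pulled yet, or you can use plain llama3.2 to reuse the model pulled earlier: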

$ ollama run llama3.2:3b-instruct-q8_0

Now you can generate some image prompts. Suppose I want to create a thumbnail for my blog post, so I run the following prompt:

Generate a random image prompt that I can use as a thumbnail for my article.

Here is the output:

Here's a random image prompt for you:


"A misty, moonlit forest with an old, gnarled tree in the center, its branches twisted and tangled like a giant's fingers. In the background, a subtle glow
emanates from a faint, ethereal light that seems to be seeping from the very earth itself."


Feel free to use this prompt as is or modify it to fit your article's theme and style!
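The interactive session is handy for one-off prompts. To script thumbnail-prompt generation instead, you could send the same prompt to Ollama's /api/generate endpoint, which returns the text in a "response" field, and raise the sampling temperature through "options" for more varied results. A minimal standard-library sketch: the helper names are hypothetical, the default address is assumed, and the temperature of 1.0 is an arbitrary illustration rather than a recommended value.

```python
import json
import urllib.request

PROMPT = "Generate a random image prompt that I can use as a thumbnail for my article."


def build_generate_payload(prompt, model="llama3.2:3b-instruct-q8_0", temperature=1.0):
    """Body for POST /api/generate; sampling settings go under "options"."""
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,  # return the whole completion as one JSON object
        "options": {"temperature": temperature},  # higher = more varied output
    }


def generate(payload, base_url="http://localhost:11434"):
    """Send the payload and return the generated text."""
    req = urllib.request.Request(
        f"{base_url}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=300) as resp:
        return json.load(resp)["response"]
```

Calling generate(build_generate_payload(PROMPT)) a few times should give a different image prompt on each run, which you can then refine by hand as the model suggests.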
