Connecting Supabase with vecs: documentation notes

news2026/3/15 2:27:44

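Connecting to a local database with Supabase

This section shows how to run vecs against a local database. Make sure the Supabase CLI is installed on your machine.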

# Initialize your project
supabase init

# Start Postgres
supabase start
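Once the local stack is running, vecs needs a Postgres connection string. Below is a minimal sketch of assembling one, assuming the local defaults (user `postgres`, password `postgres`, host `localhost`, port `54322`, database `postgres`); check the output of `supabase start` for the actual values. `build_connection_string` is a hypothetical helper, not part of vecs or the Supabase CLI.

```python
from urllib.parse import quote


def build_connection_string(user, password, host, port, db_name):
    """Assemble a Postgres URL (hypothetical helper, not part of vecs)."""
    # Percent-encode the credentials so characters such as "@" or ":"
    # cannot be mistaken for URL separators.
    user = quote(user, safe="")
    password = quote(password, safe="")
    return f"postgresql://{user}:{password}@{host}:{port}/{db_name}"


# Assumed local defaults; verify them against the output of `supabase start`.
DB_CONNECTION = build_connection_string(
    "postgres", "postgres", "localhost", 54322, "postgres"
)
print(DB_CONNECTION)
```

The resulting string can be passed to `vecs.create_client`.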

Syncing data with Supabase vecs

Official vecs documentation

Creating a collection

import vecs

# The commented line below is the connection string for the local Postgres database:
# vx = vecs.create_client("postgresql://postgres:postgres@localhost:54322/postgres")
DB_CONNECTION = "postgresql://<user>:<password>@<host>:<port>/<db_name>"

# create vector store client
vx = vecs.create_client(DB_CONNECTION)

# create a collection of vectors with 3 dimensions
docs = vx.get_or_create_collection(name="docs", dimension=3)

Adding vectors

We can now insert some embeddings into our "docs" collection using upsert():

import vecs

# reuse the vector store client `vx` created above
docs = vx.get_or_create_collection(name="docs", dimension=3)

# a collection of vectors with 3 dimensions
vectors = [
    ("vec0", [0.1, 0.2, 0.3], {"year": 1973}),
    ("vec1", [0.7, 0.8, 0.9], {"year": 2012}),
]

# insert our vectors (recent vecs releases name this keyword `records`;
# older releases used `vectors`)
docs.upsert(records=vectors)

Querying the collection

You can now query the collection to retrieve relevant matches:

import vecs

# reuse the vector store client `vx` created above
docs = vx.get_or_create_collection(name="docs", dimension=3)

# query the collection filtering metadata for "year" = 2012
docs.query(
    data=[0.4, 0.5, 0.6],             # required
    limit=1,                          # number of records to return
    filters={"year": {"$eq": 2012}},  # metadata filters
)
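The `filters` argument uses a small operator syntax (`$eq` and similar). The following is an in-memory sketch of what such a filter means, not how vecs evaluates it (the library runs filters in the database). The `matches` helper and the `OPS` table are hypothetical names, and only a few comparison operators are illustrated.

```python
import operator

# Comparison operators illustrated here; the real library may support more.
OPS = {
    "$eq": operator.eq,
    "$ne": operator.ne,
    "$gt": operator.gt,
    "$gte": operator.ge,
    "$lt": operator.lt,
    "$lte": operator.le,
}


def matches(metadata, filters):
    """Return True if `metadata` satisfies a {key: {op: value}} filter."""
    for key, condition in filters.items():
        for op, expected in condition.items():
            if key not in metadata or not OPS[op](metadata[key], expected):
                return False
    return True  # an empty filter matches every record


records = [
    ("vec0", [0.1, 0.2, 0.3], {"year": 1973}),
    ("vec1", [0.7, 0.8, 0.9], {"year": 2012}),
]

# Keep only the ids whose metadata has year == 2012.
hits = [rid for rid, _, meta in records if matches(meta, {"year": {"$eq": 2012}})]
print(hits)
```

In this sketch an empty filter (`{}`) matches everything, which mirrors passing `filters={}` to fetch unfiltered results.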

A useful addition: an "empty" query for retrieving all data (up to the limit of 1000 records per query):

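query_vector = [0] * 768  # all-zero query vector; its length must match the collection's dimension

# query the data
data = docs.query(
    data=query_vector,
    limit=1000,                 # the maximum limit is 1000
    filters={},
    measure="cosine_distance",
    include_value=True,         # return the distance value
    include_metadata=True,      # return the metadata
)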
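With `measure="cosine_distance"`, an all-zero query vector is an edge case: cosine distance is usually defined as 1 minus cosine similarity, and the similarity divides by both vector norms, so it is undefined for a zero vector. A plain-Python sketch of that arithmetic follows. It is not vecs or pgvector code, and how the database orders rows in this case is not verified here.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity; returns NaN when either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return math.nan  # undefined: would be a division by zero
    return 1 - dot / (norm_a * norm_b)


zero_query = [0, 0, 0]
stored = [0.1, 0.2, 0.3]
print(cosine_distance(zero_query, stored))
```

If the order of the returned rows matters, a non-zero query vector avoids the edge case.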

Looking up records by id:

Source code:

Usage (the metadata is at [0][2]):


character_data = cards_collection.fetch([str(request.character_id)]) 
if not character_data: 
    raise HTTPException(status_code=404, detail="Character not found") 
metadata = character_data[0][2]

Example of the returned data format:
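A rough sketch of that shape, assuming `fetch` returns a list of `(id, vector, metadata)` tuples, which is consistent with the metadata sitting at `[0][2]` in the usage snippet. The values below are made up for illustration.

```python
# Hypothetical return value, for illustration only; real ids, vectors
# and metadata will differ.
character_data = [
    ("42", [0.1, 0.2, 0.3], {"name": "example"}),
]

# Unpacking the first record makes the [0][2] index explicit.
record_id, vector, metadata = character_data[0]
print(record_id, metadata)
```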

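Deleting vectors

Deleting a record removes it from the collection. To delete records, pass either a list of ids or a metadata filter to the delete method. The method returns the ids of the records that were successfully deleted. Note that attempting to delete a record that does not exist does not raise an error.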

docs.delete(ids=["vec0", "vec1"])
# or delete by a metadata filter
docs.delete(filters={"year": {"$eq": 2012}})

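Adapters

Adapters are an optional feature that transforms data before it is added to or queried from a collection. They let you interact with a collection using only your project's native data types (for example, raw text) without handling vectors by hand.

An adapter converts your input into a new format on both upsert and query. For example, you can split large text into smaller chunks or convert it into embeddings. Adapters also have first-class support for Hugging Face models.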
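To make the idea of a transformation pipeline concrete, here is a toy sketch of "chunk, then embed" in plain Python. It is not the vecs implementation: `chunk_paragraphs`, `toy_embed` and `run_pipeline` are hypothetical names, and `toy_embed` is a crude stand-in for a real embedding model.

```python
def chunk_paragraphs(text):
    """Split text on blank lines and drop empty chunks."""
    return [p.strip() for p in text.split("\n\n") if p.strip()]


def toy_embed(chunk):
    """Stand-in for an embedding model: three crude text statistics."""
    return [
        float(len(chunk)),          # number of characters
        float(len(chunk.split())),  # number of whitespace-separated words
        float(chunk.count("e")),    # occurrences of the letter "e"
    ]


def run_pipeline(text):
    """Apply the two steps in order: chunk first, then embed each chunk."""
    return [(chunk, toy_embed(chunk)) for chunk in chunk_paragraphs(text)]


for chunk, vector in run_pipeline("four score and seven\n\nhello, world!"):
    print(chunk, vector)
```

Each paragraph becomes its own record, so one input document can produce several vectors.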

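For the complete list of available adapters, see the built-in adapters.

The example below creates a collection with an adapter that chunks text into paragraphs and converts each chunk into an embedding vector using the all-MiniLM-L6-v2 model.

First, install the optional vecs dependencies for text embedding: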

pip install "vecs[text_embedding]"

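Then create a collection with an adapter that chunks text into paragraphs and embeds each paragraph using all-MiniLM-L6-v2, a 384-dimensional text embedding model.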

import vecs
from vecs.adapter import Adapter, ParagraphChunker, TextEmbedding

# create vector store client
vx = vecs.Client("postgresql://<user>:<password>@<host>:<port>/<db_name>")

# create a collection with an adapter
docs = vx.get_or_create_collection(
    name="docs",
    adapter=Adapter(
        [
            ParagraphChunker(skip_during_query=True),
            TextEmbedding(model='all-MiniLM-L6-v2'),
        ]
    )
)

With an adapter registered on the collection, we can insert records by passing text instead of vectors.

# add records to the collection using text as the media type
docs.upsert(
    records=[
        (
            "vec0",
            "four score and ....",  # <- note that we can now pass text here
            {"year": 1973}
        ),
        (
            "vec1",
            "hello, world!",
            {"year": "2012"}
        )
    ]
)

Similarly, we can query the collection using text.

# search by text 
docs.query(data="foo bar")

Visual management in Supabase:

In the Table Editor, select the vecs vector store

Viewing the vector store

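How vector search works

Cosine similarity

In NLP tasks, we often need to measure how similar two generated word vectors are, and cosine similarity is the formula commonly used for this.

Cosine similarity uses the cosine of the angle between two vectors in a vector space to measure how much two items differ. The closer the cosine is to 1, the closer the angle is to 0 degrees, and the more similar the two vectors are.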
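Written out, for two vectors a and b with angle θ between them, the standard definition is:

```latex
\[
\cos\theta
  = \frac{\mathbf{a}\cdot\mathbf{b}}{\lVert\mathbf{a}\rVert\,\lVert\mathbf{b}\rVert}
  = \frac{\sum_{i} a_i b_i}{\sqrt{\sum_{i} a_i^{2}}\;\sqrt{\sum_{i} b_i^{2}}}
\]
% Worked example: a = (1, 0), b = (1, 1)
\[
\cos\theta = \frac{1\cdot 1 + 0\cdot 1}{\sqrt{1}\,\sqrt{2}} = \frac{1}{\sqrt{2}} \approx 0.707,
\qquad \theta = 45^{\circ}
\]
```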

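Cosine similarity ranges from -1 to 1:

When the cosine similarity equals 1, the two vectors point in exactly the same direction in the multi-dimensional space, i.e. they are maximally similar.
When the cosine similarity equals 0, the two vectors are orthogonal, i.e. they are unrelated.
When the cosine similarity equals -1, the two vectors point in exactly opposite directions, i.e. they are maximally dissimilar.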

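import numpy as np

def cos_sim(vector_a, vector_b):
    """
    Compute the cosine similarity between two vectors.
    :param vector_a: vector a
    :param vector_b: vector b
    :return: sim
    """
    # plain arrays instead of np.mat, which newer NumPy releases no longer provide
    vector_a = np.asarray(vector_a, dtype=float)
    vector_b = np.asarray(vector_b, dtype=float)
    num = float(np.dot(vector_a, vector_b))
    denom = np.linalg.norm(vector_a) * np.linalg.norm(vector_b)
    sim = num / denom
    return sim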
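To check the three cases listed earlier (1, 0 and -1), here is a small self-contained sketch that uses only the standard library, so it runs without numpy. `cosine_similarity` is a hypothetical name for this illustration.

```python
import math


def cosine_similarity(a, b):
    """Dot product divided by the product of the two Euclidean norms."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


# Same direction (b is a scaled copy of a): similarity close to 1.
same = cosine_similarity([1, 2, 3], [2, 4, 6])

# Perpendicular vectors: similarity of 0.
orthogonal = cosine_similarity([1, 0], [0, 1])

# Opposite direction: similarity close to -1.
opposite = cosine_similarity([1, 2, 3], [-1, -2, -3])

print(same, orthogonal, opposite)
```

Only the direction matters here: scaling either vector by a positive constant leaves the result unchanged.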