Ollama v0.11.10 Released: Full Support for the EmbeddingGemma Embedding Model, Plus Performance and Stability Improvements

福大大架构师每日一题

Published 2025-09-07 08:40:47

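On September 4, 2025, the Ollama team released v0.11.10. The headline addition is support for EmbeddingGemma, an open embedding model described as the best-performing in its size class. The release also includes a number of performance optimizations, bug fixes and stability improvements, further strengthening Ollama's position as a go-to tool for running large models locally.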

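This article takes a close look at the main changes in v0.11.10: the technical characteristics of EmbeddingGemma, the code-level changes, the performance optimizations, and how to use the new features in real projects.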

1. EmbeddingGemma: A New Generation of Open Embedding Model

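EmbeddingGemma is the centerpiece of this update. It is a lightweight model built specifically for text embedding, and it performs strongly across a range of benchmarks, particularly on semantic similarity, retrieval-augmented generation (RAG) and text classification tasks.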

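1.1 Model Highlights

Best-in-class performance: among models of comparable size, EmbeddingGemma achieves top results on many embedding tasks.
Open weights: released under Google's Gemma terms of use, which permit both commercial and research use.
Lightweight and efficient: about 308M parameters, with fast inference and a small resource footprint.
Multilingual: trained on text in more than 100 languages, including Chinese and English.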

1.2 Use Cases

Semantic search
Document retrieval
Text clustering and classification
Question answering and RAG applications

2. Code Changes in Detail

2.1 Model Architecture Extension

In model/model.go, Ollama v0.11.10 adds a mechanism for recognizing embedding models:

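// Read the architecture name from the model metadata.
arch := b.Config().Architecture()
// math.MaxUint32 is the default returned when "pooling_type" is absent;
// any other value marks the model as an embedding model.
if b.Config().Uint("pooling_type", math.MaxUint32) != math.MaxUint32 {
    arch = arch + "_embed"
}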

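With this change, Ollama automatically recognizes models that carry an embedding pooling layer and loads them under a separate architecture name with the "_embed" suffix, which provides the infrastructure for integrating EmbeddingGemma.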
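
To make the detection rule concrete, here is a minimal Python sketch of the same check. It assumes the model metadata has already been parsed into a plain dict; `resolve_architecture` and the dict keys are hypothetical names for illustration, not Ollama's API. As in the Go snippet, a missing `pooling_type` falls back to the MaxUint32 sentinel, meaning "not an embedding model".

```python
# Sentinel mirroring Go's math.MaxUint32: "pooling_type not present".
MAX_UINT32 = 2**32 - 1


def resolve_architecture(config: dict) -> str:
    """Return the architecture name used to pick a model implementation.

    Hypothetical helper: `config` stands in for the parsed model metadata.
    Any pooling_type other than the sentinel marks an embedding model,
    so "_embed" is appended, mirroring the check in model/model.go.
    """
    arch = config["architecture"]
    if config.get("pooling_type", MAX_UINT32) != MAX_UINT32:
        arch = arch + "_embed"
    return arch


print(resolve_architecture({"architecture": "gemma3"}))                    # gemma3
print(resolve_architecture({"architecture": "gemma3", "pooling_type": 1}))  # gemma3_embed
```

Note that even `pooling_type = 0` (no pooling) triggers the suffix, because only the sentinel value counts as "absent".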

2.2 New EmbeddingGemma Model Implementation

model/models/gemma3/embed.go adds an embedModel struct that implements the forward pass of the embedding model:

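type embedModel struct {
    model.Base
    model.SentencePieceModel
    *TextModel                         // reuses the Gemma 3 text model
    PoolingType uint32                 // 0 = none, 1 = mean
    Dense [2]*nn.Linear `gguf:"dense"` // two linear projection layers
}

func (m *embedModel) Forward(ctx ml.Context, batch input.Batch) (ml.Tensor, error) {
    // ... embedding pooling and linear transformation (body omitted)
}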

Two pooling types are supported:

0: None (no pooling)
1: Mean (mean pooling)
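
The following Python sketch illustrates what the two pooling modes mean for a sequence of per-token hidden states, followed by two dense projections standing in for `Dense [2]*nn.Linear`. It uses plain lists instead of tensors. The function names, the weight layout and the assumption that both linear layers run after pooling are inferred from the comment in `Forward`; this is not a transcription of Ollama's code.

```python
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]  # one row per token (hidden states) or per output unit (weights)

POOLING_NONE = 0
POOLING_MEAN = 1


def pool(hidden: Matrix, pooling_type: int) -> Matrix:
    """Pool per-token hidden states (one row per token)."""
    if pooling_type == POOLING_NONE:
        # No pooling: keep one vector per token.
        return [list(row) for row in hidden]
    if pooling_type == POOLING_MEAN:
        # Mean pooling: average each dimension over all tokens.
        n = len(hidden)
        return [[sum(col) / n for col in zip(*hidden)]]
    raise ValueError(f"unsupported pooling type {pooling_type}")


def linear(x: Vector, weight: Matrix, bias: Vector) -> Vector:
    """y = W x + b, with W stored as one row per output unit."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weight, bias)]


def embed(hidden: Matrix, dense: List[Tuple[Matrix, Vector]]) -> Vector:
    """Mean-pool, then apply the dense layers in order (assumed layout)."""
    (pooled,) = pool(hidden, POOLING_MEAN)
    for weight, bias in dense:
        pooled = linear(pooled, weight, bias)
    return pooled
```

With two tokens `[[1.0, 2.0], [3.0, 4.0]]`, mean pooling yields the single vector `[2.0, 3.0]`, while `POOLING_NONE` returns both token vectors unchanged.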

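2.3 Vocabulary Handling Improvements

In model/vocabulary.go, the handling of special tokens (BOS/EOS) was improved so that they are not added twice, which makes the generated embeddings more accurate.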
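
The "don't add a special token twice" rule can be shown with a small Python sketch. `add_special_tokens`, its parameters and the token IDs used below are hypothetical; the real logic lives in model/vocabulary.go and may differ in detail.

```python
from typing import List


def add_special_tokens(ids: List[int], bos_id: int, eos_id: int,
                       add_bos: bool = True, add_eos: bool = True) -> List[int]:
    """Wrap token IDs in BOS/EOS, skipping any that are already present.

    Hypothetical helper: if the prompt already starts with BOS (or ends
    with EOS), adding it again would feed the model a doubled special
    token, which can shift the resulting embedding.
    """
    out = list(ids)
    if add_bos and (not out or out[0] != bos_id):
        out.insert(0, bos_id)
    if add_eos and (not out or out[-1] != eos_id):
        out.append(eos_id)
    return out
```

Calling it on `[2, 10, 11, 1]` with `bos_id=2` and `eos_id=1` returns the list unchanged, while `[10, 11]` becomes `[2, 10, 11, 1]`.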

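2.4 Cache Mechanism Enhancements

In runner/ollamarunner/cache.go, LoadCacheSlot gains a cachePrompt parameter that lets embedding tasks skip prompt caching. When it is false, numPast is reset to 0, so no cached prefix is reused and the entire prompt is evaluated, presumably because pooling needs the output of every input token: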

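func (c *InputCache) LoadCacheSlot(prompt []*input.Input, cachePrompt bool) (*InputCacheSlot, []*input.Input, error) {
    // ... select a cache slot; numPast is the length of the prompt
    // prefix already present in that slot
    if !cachePrompt {
        // Reuse nothing from the cache: evaluate the whole prompt.
        numPast = 0
    }
    // ...
}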
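
The effect of cachePrompt can be sketched in Python: find how many leading tokens of the new prompt match what the cache slot already holds, and, when `cache_prompt` is false, discard that match so the whole prompt is evaluated. `load_cache_slot` is a simplified, hypothetical stand-in for `LoadCacheSlot` that handles a single slot and ignores slot selection and eviction.

```python
from typing import List, Tuple


def common_prefix_len(a: List[int], b: List[int]) -> int:
    """Number of leading tokens shared by a and b."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def load_cache_slot(cached: List[int], prompt: List[int],
                    cache_prompt: bool) -> Tuple[int, List[int]]:
    """Return (num_past, tokens_still_to_evaluate).

    num_past counts prompt tokens whose cached state can be reused.
    With cache_prompt=False (the embedding case) it is forced to 0,
    so every prompt token is recomputed.
    """
    num_past = common_prefix_len(cached, prompt)
    if not cache_prompt:
        num_past = 0
    return num_past, prompt[num_past:]
```

For a slot holding `[1, 2, 3]` and a prompt `[1, 2, 4, 5]`, a generation request reuses two tokens and evaluates only `[4, 5]`, whereas an embedding request evaluates all four.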

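3. Performance Optimizations and Stability Improvements

3.1 Asynchronous Computation

runner/ollamarunner/runner.go introduces asynchronous batch computation, in which computeBatch runs in its own goroutine instead of blocking the caller. As the code shows, this path is taken only when the model has no pooling_type, that is, for generative models; embedding models still call computeBatch synchronously: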

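// pooling_type is only set for embedding models, so they take the synchronous path.
supportsAsync := s.model.Backend().Config().Uint("pooling_type", math.MaxUint32) == math.MaxUint32
if supportsAsync {
    go s.computeBatch(activeBatch) // run in the background
} else {
    s.computeBatch(activeBatch) // block until the batch is done
}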
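
The dispatch rule can be mirrored in Python with a thread standing in for the goroutine. `run_batch`, the config dict and the use of `threading.Thread` are illustrative assumptions; the point is only that models without `pooling_type` compute in the background while embedding models compute inline.

```python
import threading
from typing import Callable, Optional

MAX_UINT32 = 2**32 - 1  # sentinel for "pooling_type not present"


def run_batch(config: dict, compute: Callable[[], None]) -> Optional[threading.Thread]:
    """Run compute() asynchronously unless the model is an embedding model.

    Returns the started thread for async runs (so the caller can join it),
    or None when compute() has already finished synchronously.
    """
    supports_async = config.get("pooling_type", MAX_UINT32) == MAX_UINT32
    if supports_async:
        worker = threading.Thread(target=compute)
        worker.start()
        return worker
    compute()
    return None
```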

3.2 Logging Improvements

logutil.Trace calls were added in many places to make debugging and performance analysis easier, for example:

logutil.Trace("encoded", "string", s, "ids", ids)
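
Python's standard logging module has no TRACE level, but a similar key/value trace log can be sketched by registering a custom level below DEBUG. The level number 5, the `trace` helper and the key=value formatting are assumptions chosen for illustration; logutil.Trace is a Go helper and its exact output format may differ.

```python
import logging

TRACE = 5  # below DEBUG (10), so trace output stays hidden unless enabled
logging.addLevelName(TRACE, "TRACE")
logger = logging.getLogger("ollama-sketch")


def trace(msg: str, *kv) -> None:
    """Log msg followed by alternating key/value pairs as key=value."""
    if logger.isEnabledFor(TRACE):
        pairs = " ".join(f"{k}={v}" for k, v in zip(kv[::2], kv[1::2]))
        logger.log(TRACE, "%s %s", msg, pairs)
```

Usage: after `logging.basicConfig(level=TRACE)`, the call `trace("encoded", "string", "hi", "ids", [1, 2])` logs `encoded string=hi ids=[1, 2]`. Checking `isEnabledFor` first avoids formatting the pairs when tracing is off.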

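3.3 Error Handling and Compatibility

Fixed a crash caused by incorrect AMD GPU detection.
Improved error handling on macOS and Linux, reducing interruptions caused by environment configuration problems.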

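4. How to Use EmbeddingGemma

4.1 Installation and Updates

Make sure you are running Ollama v0.11.10 or later, then pull the model (embedding models are called through the API rather than used for chat):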

ollama --version
ollama pull embeddinggemma

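4.2 Calling the Embedding API

Get text embeddings through the HTTP API. The /api/embed endpoint takes a model name and an input field: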

curl http://localhost:11434/api/embed \
  -H "Content-Type: application/json" \
  -d '{
    "model": "embeddinggemma",
    "input": "Your text here"
  }'

4.3 Integrating into a Python Project

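import requests

def get_embedding(text, model="embeddinggemma"):
    # /api/embed returns {"embeddings": [[...]]}, one vector per input
    response = requests.post(
        "http://localhost:11434/api/embed",
        json={"model": model, "input": text},
        timeout=60,
    )
    response.raise_for_status()
    return response.json()["embeddings"][0]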
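
Building on get_embedding, a typical semantic-search step ranks documents by cosine similarity to the query embedding. The sketch below takes the embedding function as a parameter so it can run without a server; `rank_documents` and `cosine_similarity` are hypothetical helpers, not part of Ollama.

```python
import math
from typing import Callable, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_documents(query: str, docs: List[str],
                   embed: Callable[[str], List[float]]) -> List[Tuple[float, str]]:
    """Return (score, doc) pairs sorted from most to least similar to query."""
    q = embed(query)
    scored = [(cosine_similarity(q, embed(d)), d) for d in docs]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)
```

In a real application you would pass get_embedding as `embed` and usually compute document embeddings once and store them, rather than re-embedding every document for each query.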