2.LangChain--聊天模型结构化返回

传统的LLM调用返回的信息一般为字符串的文本形式，当我们想要让LLM仅返回我们想看到的信息我们就要自己设置结构化格式给LLM，让其返回我们所需的内容。在 LangChain 中，聊天模型提供了额外的功能：结构化输出。⼀种使聊天模型以结构化格式（例如 JSON）进⾏响应的技术。例如，可能希望将模型输出存储在数据库中，并确保输出符合数据库模式。这种需求激发了结构化输出的概念，其中可以指⽰模型使⽤特定的

lhxcc_fly

327人浏览 · 2026-05-26 20:24:03

lhxcc_fly · 2026-05-26 20:24:03 发布

1.引言

传统的LLM调用返回的信息一般为字符串的文本形式，当我们想要让LLM仅返回我们想看到的信息我们就要自己设置结构化格式给LLM，让其返回我们所需的内容。

import os
from langchain_openai import ChatOpenAI
model = ChatOpenAI(
    model="deepseek-v4-flash",
    api_key=os.getenv("DEEPSEEK_API_KEY"),
    base_url=os.getenv("DEEPSEEK_BASE_URL"),
)
print(model.invoke("给我写一首简单的现代诗歌。").content)

在 LangChain 中，聊天模型提供了额外的功能：结构化输出。⼀种使聊天模型以结构化格式（例如 JSON）进⾏响应的技术。例如，可能希望将模型输出存储在数据库中，并确保输出符合数据库模式。这种需求激发了结构化输出的概念，其中可以指⽰模型使⽤特定的输出结构进⾏响应。

2.with_structured_output()

想使⽤结构化输出能⼒，LangChain 提供了⼀种⽅法 .with_structured_output() ，该⽅法

需要先定义输出结构，然后执⾏通过 .with_structured_output() 得到的 Runnable 实例

# 1. 定义输出结构
schema = {"foo": "bar"}
# 2. 绑定schema，其实是⽣成⽀持结构化返回的 Runnable 实例
model_with_structure = model.with_structured_output(schema)
# 3. 执⾏
structured_output = model_with_structure.invoke(user_input)

此⽅法将输出结构作为参数输⼊，返回⼀个类似 model 的 Runnable。不同之处在于执⾏ Runnable 后的输出结果,输出与给定输出结构相对应的对象。

该输出结构可以指定为 TypedDict 类、JSON Schema 或 Pydantic 类。

注意：国产模型即使是deepseek这种兼容OpenAI的模型也不支持Json_Schema结构化强制返回，所以要使用OpenAI旗下模型gpt

博主近期账户受限无法使用gpt所以代码演示没有结果见谅

3.Runnable对象分类

3.1 Pydantic

我们可以设置执⾏ Runnable 后的输出结果指定为 Pydantic 类，这将返回⼀个 Pydantic 对象。

当收到模型的响应后，LangChain 会提取出代表 Pydantic 参数的 JSON 对象，并⽤ Pydantic 模型对其进⾏解析和验证，将这个验证后的 JSON 转换为⼀个可⽤的 Pydantic 对象实例返回

import os
from typing import Optional

from langchain_openai import ChatOpenAI
from pydantic import BaseModel, Field

model = ChatOpenAI(model="gpt-5.5",)

class Poetry(BaseModel):
    theme:str=Field(description="诗歌主题。")
    punchline:str=Field(description="诗歌的诗眼与创新点。")
    rating:Optional[int]=Field(default=None,description="评分1~10，给自己的诗歌评估一个分数")


result = model.with_structured_output(Poetry)
print(result.invoke("给我写一首简单优雅的现代诗歌。"))
# print(model.invoke("给我写一首简单的现代诗歌。").content)

除了单一Pydantic对象返回，LangChain还支持链表多个Pydantic对象嵌套返回

import os
from typing import Optional,List
from langchain_openai import ChatOpenAI
from pydantic import BaseModel, Field

model = ChatOpenAI(model="gpt-5.5",)

class Poetry(BaseModel):
    theme:str=Field(description="诗歌主题。")
    punchline:str=Field(description="诗歌的诗眼与创新点。")
    rating:Optional[int]=Field(default=None,description="评分1~10，给自己的诗歌评估一个分数")
    
class Data(BaseModel):
    jokes: List[Poetry]

result = model.with_structured_output(Data)
print(result.invoke("分别给我写一首简单优雅的现代诗歌和古代诗歌。"))
# print(model.invoke("给我写一首简单的现代诗歌。").content)

3.2TypedDict（类型字典）

⽤于为字典对象返回提供精确的、结构化的类型提⽰。它允许我们指定⼀个字典中应该有哪些键，以及每个键对应的值的类型

字典形式的一个重要作用是可以检查代码的字段名和类型的错误

# TypeDict对象
class Poetry(TypedDict):
    theme:Annotated[str,...,"诗歌主题。"]
    punchline:Annotated[str,..., "诗歌的诗眼与创新点。"]
    rating:Annotated[Optional[int],None, "评分1~10，给自己的诗歌评估一个分数。"]

result2 = model.with_structured_output(Poetry,include_raw=True) #include_raw=True 保留LLM返回的信息缓冲区

3.3JSON

JSON串是最为常用的结构化返回形式，也是LangChain中常用的结构化返回规则。我们要声明JSON首先要定义Json Schema.

# json_schema对象
json_schema = {
    "title": "joke",
    "description": "给⽤⼾写一首诗歌。",
    "type": "object",
    "properties": {
        "theme": {
            "type": "string",
            "description": "这个诗歌的主题。",
        },
        "punchline": {
            "type": "string",
            "description": "这个诗歌的诗眼与创新点。",
        },
        "rating": {
            "type": "integer",
            "description": "评分1~10，给自己的诗歌评估一个分数。",
            "default": None,
        },
    },
    "required": ["setup", "punchline"], #必填字段
}
result3 = model.with_structured_output(json_schema)

3.4选择返回模式

对于LangChain的结构化返回，当我们定义好对象类型后可以让与定义相关的话题按结构化返回，当我们又询问LLM与定义不相关的问题时再采用结构化返回就有问题了。所以我们可以通过多个封装函数，将不同定义封装进Union中进行选择性结构化返回。

# 选择性结构化返回
model = ChatOpenAI(model = "gpt-5.5")
class Poetry(BaseModel):
    theme:str=Field(description="诗歌主题。")
    punchline:str=Field(description="诗歌的诗眼与创新点。")
    rating:Optional[int]=Field(default=None,description="评分1~10，给自己的诗歌评估一个分数。")

class Conversion(BaseModel):
    response:str=Field(description="回答用户问题以默认的对话模式")

class And(BaseModel):
    final_output:Union[Poetry,Conversion]

result4 = model.with_structured_output(And)
result5 = model.with_structured_output(And)

print(result4.invoke("写一首简单优雅的现代诗歌。"))
print(result5.invoke("你是谁？"))

4.应用场景

4.1信息提取

按照我们想要获取的内容信息来对一段文本进行信息的提取

# 1.信息提取：
from langchain_openai import ChatOpenAI
from typing import Optional
from pydantic import BaseModel, Field
from langchain_core.messages import HumanMessage, SystemMessage
# 定义⼤模型
model = ChatOpenAI(model="gpt-4o-mini")
class Person(BaseModel):
    """⼀个⼈的信息。"""
    # 注意:
    # 1. 每个字段都是 Optional “可选的” —— 允许 LLM 在不知道答案时输出 None。
    # 2. 每个字段都有⼀个 description “描述” —— LLM使⽤这个描述。
    # 有⼀个好的描述可以帮助提⾼提取结果。
    name: Optional[str] = Field(default=None, description="这个⼈的名字")
    hair_color: Optional[str] = Field(default=None, description="如果知道这个⼈头发的颜⾊")
    skin_color: Optional[str] = Field(default=None, description="如果知道这个⼈的肤⾊")
    height_in_meters: Optional[str] = Field(default=None, description="以⽶为单位的⾼度")
    
structured_model = model.with_structured_output(schema=Person)
messages = [
        SystemMessage(content="你是⼀个提取信息的专家，只从⽂本中提取相关信息。如果您不知道要提取的属性的值，属性值返回null"),
        HumanMessage(content="史密斯⾝⾼6英尺，⾦发。")
]
result = structured_model.invoke(messages)
print(result)

4.2少量文本案例提示

给出文本提示让大模型仿照进行输出

4.3与绑定好的工具一起使用

对已经绑定好工具的Runnable对象进行结构化返回

先绑定工具再进行结构化返回

from langchain_openai import ChatOpenAI
from pydantic import BaseModel, Field
from langchain_core.tools import tool
from langchain_core.messages import HumanMessage
# 绑定工具的结构化返回
# 定义⼤模型
model = ChatOpenAI(model="gpt-4o-mini")
# 结构输出对象
class SearchResult(BaseModel):
    """结构化搜索结果。"""
    query: str = Field(description="搜索查询")
    findings: str = Field(description="调查结果摘要")
@tool
def web_search(query: str) -> str:
    """在⽹上搜索信息。
    Args:
    query: 搜索查询
    """
    return "西安今天多云转⼩⾬，⽓温18-23度，东南⻛2级，空⽓质量良好"
# ⼿动将⼯具结果加⼊消息列表
model_with_search = model.bind_tools([web_search])
messages = [
    HumanMessage("搜索当前最新的西安的天⽓")
]
ai_msg = model_with_search.invoke(messages)
messages.append(ai_msg)
for tool_call in ai_msg.tool_calls:
    tool_msg = web_search.invoke(tool_call)
    messages.append(tool_msg)
structured_search_model =  model_with_search.with_structured_output(SearchResult)
result = structured_search_model.invoke(messages)
print(result)