How to stream chat model responses
All chat models implement the Runnable interface,
which comes with default implementations of the standard runnable
methods (i.e., invoke, batch, stream, streamEvents).
The default streaming implementation provides an AsyncGenerator
that yields a single value: the final output from the underlying chat
model provider.
The default implementation does not support token-by-token streaming, but it ensures that the model can be swapped in for any other model, since it supports the same standard interface.
The ability to stream the output token-by-token depends on whether the provider has implemented proper streaming support.
See which integrations support token-by-token streaming here.
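The difference between the default behavior and token-by-token streaming can be sketched without any provider at all. The classes below (SketchChatModel, NonStreamingModel, TokenStreamingModel) are hypothetical stand-ins written for illustration, not LangChain classes, and the whitespace split only approximates real tokenization. The point is that the same consumer loop works in both cases; only the number of chunks changes.

```typescript
// Hypothetical stand-ins for illustration only; these are NOT LangChain classes.
abstract class SketchChatModel {
  // Every model must be able to produce a complete response.
  abstract invoke(input: string): Promise<string>;

  // Default streaming: call invoke() and yield the final output as one chunk.
  async *stream(input: string): AsyncGenerator<string> {
    yield await this.invoke(input);
  }
}

// A provider without native streaming inherits the single-chunk default.
class NonStreamingModel extends SketchChatModel {
  async invoke(input: string): Promise<string> {
    return `echo: ${input}`;
  }
}

// A provider with native streaming overrides stream() to yield smaller pieces.
class TokenStreamingModel extends SketchChatModel {
  async invoke(input: string): Promise<string> {
    return `echo: ${input}`;
  }

  async *stream(input: string): AsyncGenerator<string> {
    // Split into word-sized pieces as a stand-in for real tokens.
    const pieces = `echo: ${input}`.match(/\S+\s*/g) ?? [];
    for (const piece of pieces) {
      yield piece;
    }
  }
}

// The same consumer loop works for both kinds of model.
async function collectChunks(
  model: SketchChatModel,
  input: string
): Promise<string[]> {
  const chunks: string[] = [];
  for await (const chunk of model.stream(input)) {
    chunks.push(chunk);
  }
  return chunks;
}

async function main(): Promise<void> {
  const prompt = "goldfish on the moon";
  console.log(await collectChunks(new NonStreamingModel(), prompt));
  console.log(await collectChunks(new TokenStreamingModel(), prompt));
}

main().catch(console.error);
```

Because the calling code depends only on stream(), swapping one model for the other requires no changes on the consumer side.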
Streaming
Below, we use --- as a delimiter to help visualize the boundaries between tokens.
Pick your chat model:
- OpenAI
- Anthropic
- FireworksAI
- MistralAI
- Groq
- VertexAI
OpenAI
Install dependencies
- npm: npm i @langchain/openai
- yarn: yarn add @langchain/openai
- pnpm: pnpm add @langchain/openai
Add environment variables
OPENAI_API_KEY=your-api-key
Instantiate the model
import { ChatOpenAI } from "@langchain/openai";
const model = new ChatOpenAI({
  model: "gpt-4o-mini",
  temperature: 0
});
Anthropic
Install dependencies
- npm: npm i @langchain/anthropic
- yarn: yarn add @langchain/anthropic
- pnpm: pnpm add @langchain/anthropic
Add environment variables
ANTHROPIC_API_KEY=your-api-key
Instantiate the model
import { ChatAnthropic } from "@langchain/anthropic";
const model = new ChatAnthropic({
  model: "claude-3-5-sonnet-20240620",
  temperature: 0
});
FireworksAI
Install dependencies
- npm: npm i @langchain/community
- yarn: yarn add @langchain/community
- pnpm: pnpm add @langchain/community
Add environment variables
FIREWORKS_API_KEY=your-api-key
Instantiate the model
import { ChatFireworks } from "@langchain/community/chat_models/fireworks";
const model = new ChatFireworks({
  model: "accounts/fireworks/models/llama-v3p1-70b-instruct",
  temperature: 0
});
MistralAI
Install dependencies
- npm: npm i @langchain/mistralai
- yarn: yarn add @langchain/mistralai
- pnpm: pnpm add @langchain/mistralai
Add environment variables
MISTRAL_API_KEY=your-api-key
Instantiate the model
import { ChatMistralAI } from "@langchain/mistralai";
const model = new ChatMistralAI({
  model: "mistral-large-latest",
  temperature: 0
});
Groq
Install dependencies
- npm: npm i @langchain/groq
- yarn: yarn add @langchain/groq
- pnpm: pnpm add @langchain/groq
Add environment variables
GROQ_API_KEY=your-api-key
Instantiate the model
import { ChatGroq } from "@langchain/groq";
const model = new ChatGroq({
  model: "mixtral-8x7b-32768",
  temperature: 0
});
VertexAI
Install dependencies
- npm: npm i @langchain/google-vertexai
- yarn: yarn add @langchain/google-vertexai
- pnpm: pnpm add @langchain/google-vertexai
Add environment variables
GOOGLE_APPLICATION_CREDENTIALS=credentials.json
Instantiate the model
import { ChatVertexAI } from "@langchain/google-vertexai";
const model = new ChatVertexAI({
  model: "gemini-1.5-flash",
  temperature: 0
});
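// Request a streamed response; each chunk carries a piece of the message.
const stream = await model.stream(
  "Write me a 1 verse song about goldfish on the moon"
);
// Print each chunk's content, followed by a --- delimiter on its own line.
for await (const chunk of stream) {
  console.log(`${chunk.content}
---`);
}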
---
Sw
---
imming
---
 in
---
 a
---
 world
---
 of
---
 silver
---
 beams
---
,
---
Gold
---
fish
---
 on
---
 the
---
 moon
---
,
---
 living
---
 their
---
 dreams
---
.
---
---
---
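In practice you will often want the complete text as well as the individual chunks. The following is a minimal sketch of one way to accumulate them. It assumes each chunk exposes its text as a string content field (content can take other forms for some models, which this sketch does not handle), and it uses a hypothetical fakeStream generator in place of model.stream(...) so that it runs without an API key. The empty chunks mirror the empty chunks at the start and end of the output above.

```typescript
// Minimal chunk shape assumed for this sketch: only string content is handled.
interface TextChunk {
  content: string;
}

// Consume a stream, call an optional callback per non-empty chunk,
// and return the concatenated text.
async function collectText(
  stream: AsyncIterable<TextChunk>,
  onChunk?: (text: string) => void
): Promise<string> {
  let full = "";
  for await (const chunk of stream) {
    if (chunk.content.length === 0) continue; // skip empty chunks
    onChunk?.(chunk.content);
    full += chunk.content;
  }
  return full;
}

// Hypothetical stand-in for `await model.stream(...)`.
async function* fakeStream(): AsyncGenerator<TextChunk> {
  for (const content of ["", "Sw", "imming", " in", " a", " world", ""]) {
    yield { content };
  }
}

async function main(): Promise<void> {
  const text = await collectText(fakeStream());
  console.log(text); // prints "Swimming in a world"
}

main().catch(console.error);
```

With a real model, the result of await model.stream(...) would take the place of fakeStream(), provided the chunk content is a string.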
Stream events
Chat models also support the standard streamEvents() method.
This method is useful if you're streaming output from a larger LLM application that contains multiple steps (e.g., a chain composed of a prompt, chat model and parser).
let idx = 0;
const stream = model.streamEvents(
  "Write me a 1 verse song about goldfish on the moon",
  {
    version: "v2",
  }
);
for await (const event of stream) {
  idx += 1;
  if (idx === 5) {
    console.log("...Truncated");
    break;
  }
  console.log(event);
}
{
  event: 'on_chat_model_start',
  data: { input: 'Write me a 1 verse song about goldfish on the moon' },
  name: 'ChatOpenAI',
  tags: [],
  run_id: 'c9966059-70eb-4f24-9de3-2cf04320c8f6',
  metadata: {
    ls_provider: 'openai',
    ls_model_name: 'gpt-3.5-turbo',
    ls_model_type: 'chat',
    ls_temperature: 1,
    ls_max_tokens: undefined,
    ls_stop: undefined
  }
}
{
  event: 'on_chat_model_stream',
  data: {
    chunk: AIMessageChunk {
      lc_serializable: true,
      lc_kwargs: [Object],
      lc_namespace: [Array],
      content: '',
      name: undefined,
      additional_kwargs: {},
      response_metadata: [Object],
      id: 'chatcmpl-9lOQhe44ip2q0DHfr0eYU9TF4mHtu',
      tool_calls: [],
      invalid_tool_calls: [],
      tool_call_chunks: [],
      usage_metadata: undefined
    }
  },
  run_id: 'c9966059-70eb-4f24-9de3-2cf04320c8f6',
  name: 'ChatOpenAI',
  tags: [],
  metadata: {
    ls_provider: 'openai',
    ls_model_name: 'gpt-3.5-turbo',
    ls_model_type: 'chat',
    ls_temperature: 1,
    ls_max_tokens: undefined,
    ls_stop: undefined
  }
}
{
  event: 'on_chat_model_stream',
  run_id: 'c9966059-70eb-4f24-9de3-2cf04320c8f6',
  name: 'ChatOpenAI',
  tags: [],
  metadata: {
    ls_provider: 'openai',
    ls_model_name: 'gpt-3.5-turbo',
    ls_model_type: 'chat',
    ls_temperature: 1,
    ls_max_tokens: undefined,
    ls_stop: undefined
  },
  data: {
    chunk: AIMessageChunk {
      lc_serializable: true,
      lc_kwargs: [Object],
      lc_namespace: [Array],
      content: '',
      name: undefined,
      additional_kwargs: {},
      response_metadata: [Object],
      id: 'chatcmpl-9lOQhe44ip2q0DHfr0eYU9TF4mHtu',
      tool_calls: [],
      invalid_tool_calls: [],
      tool_call_chunks: [],
      usage_metadata: undefined
    }
  }
}
{
  event: 'on_chat_model_stream',
  data: {
    chunk: AIMessageChunk {
      lc_serializable: true,
      lc_kwargs: [Object],
      lc_namespace: [Array],
      content: 'Sw',
      name: undefined,
      additional_kwargs: {},
      response_metadata: [Object],
      id: 'chatcmpl-9lOQhe44ip2q0DHfr0eYU9TF4mHtu',
      tool_calls: [],
      invalid_tool_calls: [],
      tool_call_chunks: [],
      usage_metadata: undefined
    }
  },
  run_id: 'c9966059-70eb-4f24-9de3-2cf04320c8f6',
  name: 'ChatOpenAI',
  tags: [],
  metadata: {
    ls_provider: 'openai',
    ls_model_name: 'gpt-3.5-turbo',
    ls_model_type: 'chat',
    ls_temperature: 1,
    ls_max_tokens: undefined,
    ls_stop: undefined
  }
}
...Truncated
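When only the generated text matters, the event stream can be filtered by event type. The sketch below assumes the event shape printed above (an event name plus data.chunk.content) and replaces model.streamEvents(...) with a hypothetical fakeEvents generator so that it runs offline. SketchEvent and chatModelText are illustrative names, not part of LangChain.

```typescript
// Minimal event shape assumed for this sketch, based on the fields printed above.
interface SketchEvent {
  event: string;
  name: string;
  data: { input?: string; chunk?: { content: string } };
}

// Yield only the non-empty text carried by on_chat_model_stream events.
async function* chatModelText(
  events: AsyncIterable<SketchEvent>
): AsyncGenerator<string> {
  for await (const e of events) {
    if (e.event !== "on_chat_model_stream") continue;
    const text = e.data.chunk?.content ?? "";
    if (text.length > 0) yield text;
  }
}

// Hypothetical stand-in for model.streamEvents(..., { version: "v2" }).
async function* fakeEvents(): AsyncGenerator<SketchEvent> {
  const name = "ChatOpenAI";
  yield {
    event: "on_chat_model_start",
    name,
    data: { input: "Write me a 1 verse song about goldfish on the moon" },
  };
  for (const content of ["", "", "Sw", "imming"]) {
    yield { event: "on_chat_model_stream", name, data: { chunk: { content } } };
  }
}

async function main(): Promise<void> {
  const parts: string[] = [];
  for await (const text of chatModelText(fakeEvents())) {
    parts.push(text);
  }
  console.log(parts.join("")); // prints "Swimming"
}

main().catch(console.error);
```

In a multi-step application, the same filter could additionally check the name field to separate the output of one component from another.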
Next steps
You've now seen a few ways you can stream chat model responses.
Next, check out this guide for more on streaming with other LangChain modules.