Sequences
Sequences (or “seqs” for short) are the primary way to interact with models on Mixlayer. Create a sequence using the open operation:
import { ModelSocket } from "modelsocket";
// open a socket
const socket = new ModelSocket("wss://models.mixlayer.ai/ws", {
apiKey: process.env.MIXLAYER_API_KEY,
});
const seq = await socket.open("meta/llama3.1-8b-instruct-free");
// do things with seq...
Once you’ve opened a sequence, you can append and generate tokens using the append and gen operations.
When you’re done with a sequence, be sure to close it using the close operation to stop being billed for compute resources.
await seq.close();
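To make sure the sequence is closed even if something throws along the way, a plain try/finally block (standard JavaScript, not a Mixlayer-specific API) works well:
const seq = await socket.open("meta/llama3.1-8b-instruct-free");
try {
  await seq.append("Hello!", { role: "user" });
  console.log(await seq.gen({ role: "assistant" }).text());
} finally {
  // release the sequence even if append or gen throws
  await seq.close();
}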
Roles
Most models are instruction-tuned and require you to specify a role when appending or generating text:
// append a system prompt
await seq.append("You are a buddhist monk that deeply ponders existential questions.",
{ role: "system" });
// append a question as a user
await seq.append("What's the meaning of life?",
{ role: "user" });
// generate a response as the assistant role
const response = await seq.gen({ role: "assistant" }).text();
console.log(response);
Subsequent append or gen calls will reuse the last specified role. For example, after seq.gen({ role: "assistant" }), the next call to seq.append("...") will implicitly use the assistant role.
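For instance, continuing the conversation above:
// continues in the assistant role set by the previous gen call
await seq.append(" To summarize: be present.");
// switch roles explicitly when the user speaks again
await seq.append("Can you elaborate?", { role: "user" });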
Appending Tokens
Use the append operation to add text to the model’s context window:
await seq.append("The meaning of life is ");
const response = await seq.gen().text(); // generate a completion
console.log(response);
Prefilling
If you’d like to guide a model’s response, you can provide a partial completion before asking it to generate:
// append a prompt
await seq.append(
`What are the events leading up to the dot com crash?
Respond with an array of JSON objects, with each object having 2 fields:
* event (string)
* date (string)
`,
{ role: "user" }
);
// prefill part of the model's response with a code fence
// so it starts generating JSON immediately
await seq.append(
"```json\n",
{ role: "assistant" } // important: assistant is role
);
// generate the object, stop when the code fence
// is terminated
const jsonArray = await seq.gen({ role: "assistant", stopAt: "```" }).text();
console.log(JSON.parse(jsonArray));
Generating Tokens
Use the gen operation to generate new tokens from the model:
await seq.append("What are 3 fun things to do in San Francisco?");
// generate a response
const response = await seq.gen().text();
console.log(response);
Stopping
You can instruct the model to stop generating tokens by using the stopAt or limit options.
Token count
Stop generation after a certain number of tokens by using the limit option:
await seq.append("What's the meaning of life?");
// generate a response
const response = await seq.gen({ limit: 10 }).text();
console.log(response);
Stop text
Stop generation when a fragment of text is generated by using the stopAt option:
await seq.append("What's the meaning of life?");
// generate a response
const response = await seq.gen({ stopAt: "\n\n" }).text();
console.log(response);
Sampling
Mixlayer supports a variety of sampling methods, including:
- temperature: sample from the distribution with a temperature parameter
- topP: nucleus sampling, select the most likely tokens up to a cumulative probability threshold
- topK: limit sampling to the top K highest probability tokens
- seed: set a fixed random seed for reproducibility
- repeatPenalty: penalize the model for repeating text
Different models require different sampling parameters for optimal results; check the model’s documentation for more information.
await seq.append("What's the meaning of life?");
// generate a response
const response = await seq.gen({
topP: 0.9, // nucleus sampling
seed: 42, // reproducibility
temperature: 0.7, // sample from the distribution with a temperature parameter
topK: 10, // limit sampling to the top K highest probability tokens
repeatPenalty: 1.2, // penalize the model for repeating text
}).text();
console.log(response);
If no sampling method is specified, the model will use greedy sampling by default (selecting the highest probability token).
Thinking
If you’re using a reasoning model, you can instruct the model to think about its response by using the thinking option:
// open a seq on a reasoning model
const seq = await socket.open("qwen/qwen3-8b");
await seq.append("What's the meaning of life?\n\n", { role: "user" });
// capture thinking text
const thinkingText = await seq
.gen({ thinking: true, stopAt: "</think>", role: "assistant" })
.text();
console.log("thoughts: \n", thinkingText);
// generate a response after thinking a bit
const response = await seq.gen({ role: "assistant" }).text();
console.log("\n---\nresponse: \n", response);
This will automatically prefill the model’s response with a <think> tag. For hybrid models, you can set thinking: false to ask the model to skip its thinking phase.
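For instance, on a hybrid model you could ask for a direct answer without the thinking phase (a minimal sketch, reusing the qwen/qwen3-8b sequence from above):
await seq.append("Now answer in one sentence.\n\n", { role: "user" });
// thinking: false asks a hybrid model to skip its thinking phase
const directAnswer = await seq.gen({ thinking: false, role: "assistant" }).text();
console.log(directAnswer);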
Streaming
Mixlayer text generation is natively streaming, so you can process or forward the model’s response as it’s being generated. Instead of using the text method, you can use the textStream method to get a stream of text chunks:
const seq = await socket.open("meta/llama3.1-8b-instruct-free");
await seq.append("You are a helpful assistant that talks like a pirate.",
{ role: "system" });
await seq.append("What's so dangerous about Drake's passage?",
{ role: "user" });
// generate a response using the textStream method
const stream = await seq.gen({ role: "assistant" }).textStream();
for await (const chunk of stream) {
process.stdout.write(chunk);
}
Forking
Mixlayer allows you to “fork” sequences, which means you can create independent child sequences from the context in a single parent sequence.
Building up context in a sequence can be expensive. Forking allows you to reuse that state and explore multiple different future paths from that common context.
Basics
In this example, we’ll populate a sequence with some instructions then create several forks of it. In each fork we’ll give it further instructions and then concurrently generate responses from each child sequence.
It’s important to close forks when you’re done with them. The withFork method closes the child automatically when the closure exits.
const seq = await socket.open("meta/llama3.1-8b-instruct-free");
// populate parent sequence with a question
await seq.append("What's the meaning of life?", { role: "user" });
// in each child, ask the model to answer in a different language
const [spanish, chinese, pirate] = await Promise.all([
  seq.withFork(async (s) => {
    await s.append("Please answer in Spanish.\n", { role: "user" });
    return s.gen({ role: "assistant" }).text();
  }),
  seq.withFork(async (s) => {
    await s.append("Please answer in Chinese.\n", { role: "user" });
    return s.gen({ role: "assistant" }).text();
  }),
  seq.withFork(async (s) => {
    await s.append("Please answer like a pirate.\n", { role: "user" });
    return s.gen({ role: "assistant" }).text();
  }),
]);
console.log("Spanish: ", spanish);
console.log("Chinese: ", chinese);
console.log("Pirate: ", pirate);
Limits
The number of concurrent forks you can have open varies based on your account’s limits. If you exceed this limit, the platform will throw an error.
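If you want to handle that error gracefully, here is a minimal sketch (the exact error shape isn’t documented here, so this just catches and logs it):
try {
  const french = await seq.withFork(async (s) => {
    await s.append("Please answer in French.\n", { role: "user" });
    return s.gen({ role: "assistant" }).text();
  });
  console.log("French: ", french);
} catch (err) {
  // this may indicate the account's concurrent fork limit was exceeded
  console.error("fork failed: ", err);
}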
Using Tools
Tools allow you to extend the capabilities of models by providing them with functions they can choose to call.
LLMs only have knowledge of the world up to the date of their training cutoff, so they can’t know things like the current weather. Tools allow you to provide them with the ability to look up information or perform actions in the real world.
Give a model access to a tool by passing it to the open operation:
// define a tool
export const WEATHER_TOOL = {
name: "get_current_weather",
description: "Retrieves the current weather for a city",
fn: async ({ location }: { location: string }) => {
console.log("*** weather tool called with location: ", location);
if (location.toLowerCase().includes("san francisco")) {
return { temperature: 60, units: "F" };
}
return { error: "I don't know the weather in that city" };
},
parameters: {
location: {
param_type: "string",
description: "City, state, and country to retrieve the weather for.",
required: true,
},
},
};
// give the seq access to the tool using the tools option
const seq = await socket.open("meta/llama3.1-8b-instruct-free",
{ tools: [WEATHER_TOOL] });
await seq.append("What's the weather in San Francisco?",
{ role: "user" });
const response = await seq.gen({ role: "assistant" }).text();
console.log(response);
Tool specification
Tools are defined as objects with the following required properties:
- name (string): A unique identifier for the tool. This should be descriptive and follow naming conventions (e.g., snake_case or camelCase).
- description (string): A clear description of what the tool does. This helps the model understand when and how to use the tool.
- fn (function): The actual function that will be executed when the model calls the tool. This should be an async function that:
  - Takes a single parameter object with the tool’s input parameters
  - Returns a result that the model can understand
  - Handles errors gracefully
- parameters (object): Defines the input parameters the tool accepts. Each parameter should specify:
  - param_type: The data type (e.g., “string”, “number”, “boolean”)
  - description: What the parameter represents
  - required: Whether the parameter is mandatory (boolean)
Example
const myTool = {
name: "tool_name",
description: "What this tool does",
fn: async (params: { param1: string, param2?: number }) => {
// Tool implementation
return { result: "some value" };
},
parameters: {
param1: {
param_type: "string",
description: "Description of param1",
required: true,
},
param2: {
param_type: "number",
description: "Description of param2",
required: false,
},
},
};
The model will automatically parse your tool specification and call the appropriate function with the correct parameters when it determines the tool should be used.