Sequences
Sequences (or “seqs” for short) are the primary way to interact with models on Mixlayer. Create a sequence using the `open` operation:
import { ModelSocket } from "modelsocket";
// open a socket
const socket = new ModelSocket("wss://models.mixlayer.ai/ws", {
  apiKey: process.env.MIXLAYER_API_KEY,
});
const seq = await socket.open("meta/llama3.1-8b-instruct-free");
// do things with seq...
Once you’ve opened a sequence, you can append and generate tokens using the `append` and `gen` operations.
When you’re done with a sequence, be sure to close it using the `close` operation to stop being billed for compute resources.
await seq.close();
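Putting these operations together, a minimal end-to-end session might look like this (a sketch; the try/finally is just one way to guarantee the sequence is closed even if generation throws):

import { ModelSocket } from "modelsocket";

const socket = new ModelSocket("wss://models.mixlayer.ai/ws", {
  apiKey: process.env.MIXLAYER_API_KEY,
});

const seq = await socket.open("meta/llama3.1-8b-instruct-free");
try {
  // append a prompt, then generate a completion
  await seq.append("What's the meaning of life?", { role: "user" });
  const response = await seq.gen({ role: "assistant" }).text();
  console.log(response);
} finally {
  // always release the sequence, even on error
  await seq.close();
}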
Roles
Most models are instruction-tuned and require you to specify a role when appending or generating text:
// append a system prompt
await seq.append("You are a buddhist monk that deeply ponders existential questions.",
{ role: "system" });
// append a question as a user
await seq.append("What's the meaning of life?",
{ role: "user" });
// generate a response as the assistant role
const response = await seq.gen({ role: "assistant" }).text();
console.log(response);
Subsequent `append` or `gen` calls will reuse the last specified role. For example, after `seq.gen({ role: "assistant" })`, the next call to `seq.append("...")` will implicitly use the `assistant` role.
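A sketch of this reuse behavior:

// role specified explicitly here...
await seq.append("Give me a haiku about rivers.", { role: "user" });
const haiku = await seq.gen({ role: "assistant" }).text();
// ...so these calls implicitly continue as the assistant
await seq.append("\n\nIn plain words: ");
const plain = await seq.gen().text();
console.log(haiku, plain);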
Appending Tokens
Use the `append` operation to add text to the model’s context window:
await seq.append("The meaning of life is ");
const response = await seq.gen().text(); // generate a completion
console.log(response);
Prefilling
If you’d like to guide a model’s response, you can provide a partial completion before asking it to generate:
// append a prompt
await seq.append(
  `What are the events leading up to the dot-com crash?
Respond with an array of JSON objects, with each object having 2 fields:
* event (string)
* date (string)
`,
  { role: "user" }
);
// prefill part of the model's response with a code fence
// so it starts generating JSON immediately
await seq.append(
  "```json\n",
  { role: "assistant" } // important: role must be assistant
);
// generate the object, stop when the code fence
// is terminated
const jsonArray = await seq.gen({ role: "assistant", stopAt: "```" }).text();
console.log(JSON.parse(jsonArray));
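Because the code fence is appended under the `assistant` role, the model treats it as the beginning of its own reply and simply continues it, so the first generated tokens are already inside the JSON block.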
Generating Tokens
Use the `gen` operation to generate new tokens from the model:
await seq.append("What are 3 fun things to do in San Francisco?");
// generate a response
const response = await seq.gen().text();
console.log(response);
Stopping
You can instruct the model to stop generating tokens by using the `stopAt` or `limit` options.
Token count
Stop generation after a certain number of tokens by using the `limit` option:
await seq.append("What's the meaning of life?");
// generate a response
const response = await seq.gen({ limit: 10 }).text();
console.log(response);
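Note that `limit` is a hard cutoff: the response may end mid-sentence once the token budget is reached.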
Stop text
Stop generation when a fragment of text is generated by using the `stopAt` option:
await seq.append("What's the meaning of life?");
// generate a response
const response = await seq.gen({ stopAt: "\n\n" }).text();
console.log(response);
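Here generation halts as soon as the model emits a blank line, which typically restricts the response to its first paragraph.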
Sampling
Mixlayer supports a variety of sampling methods, including:
- `temperature`: sample from the distribution with a temperature parameter
- `topP`: nucleus sampling, select the most likely tokens up to a cumulative probability threshold
- `topK`: limit sampling to the top K highest probability tokens
- `seed`: set a fixed random seed for reproducibility
- `repeatPenalty`: penalize the model for repeating text
Different models require different sampling parameters for optimal results; check the model’s documentation for more information.
await seq.append("What's the meaning of life?");
// generate a response
const response = await seq.gen({
  topP: 0.9, // nucleus sampling
  seed: 42, // reproducibility
  temperature: 0.7, // sample from the distribution with a temperature parameter
  topK: 10, // limit sampling to the top K highest probability tokens
  repeatPenalty: 1.2, // penalize the model for repeating text
}).text();
console.log(response);
If no sampling method is specified, the model will use greedy sampling by default (selecting the highest probability token).
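Greedy sampling is deterministic, so repeated runs over the same context will typically produce identical output; set `temperature` or the other options above if you want varied responses.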
Thinking
If you’re using a reasoning model, you can instruct the model to think about its response by using the `thinking` option:
// open a seq on a reasoning model
const seq = await socket.open("qwen/qwen3-8b");
await seq.append("What's the meaning of life?\n\n", { role: "user" });
// capture thinking text
const thinkingText = await seq
  .gen({ thinking: true, stopAt: "</think>", role: "assistant" })
  .text();
console.log("thoughts: \n", thinkingText);
// generate a response after thinking a bit
const response = await seq.gen({ role: "assistant" }).text();
console.log("\n---\nresponse: \n", response);
This will automatically prefill the model’s response with a `<think>` tag. For hybrid models, you can set `thinking: false` to ask the model to skip its thinking phase.
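For example, on a hybrid model you can disable the thinking phase entirely (a sketch, assuming the qwen/qwen3-8b model above supports both modes):

// ask a hybrid model to answer directly, skipping the thinking phase
const seq = await socket.open("qwen/qwen3-8b");
await seq.append("What's the meaning of life?\n\n", { role: "user" });
const response = await seq.gen({ thinking: false, role: "assistant" }).text();
console.log(response);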
Streaming
Mixlayer text generation is natively streaming, so you can process or forward the model’s response as it’s being generated. Instead of using the `text` method, you can use the `textStream` method to get a stream of text chunks:
const seq = await socket.open("meta/llama3.1-8b-instruct-free");
await seq.append("You are a helpful assistant that talks like a pirate.",
{ role: "system" });
await seq.append("What's so dangerous about Drake's passage?",
{ role: "user" });
// generate a response using the textStream method
const stream = await seq.gen({ role: "assistant" }).textStream();
for await (const chunk of stream) {
  process.stdout.write(chunk);
}
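The loop ends once the model finishes generating. As with any sequence, close it when you’re done to stop being billed:

await seq.close();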