Basics

Sequences

Sequences (or “seqs” for short) are the primary way to interact with models on Mixlayer. Create a sequence using the open operation:

import { ModelSocket } from "modelsocket";
 
// open a socket
const socket = new ModelSocket("wss://models.mixlayer.ai/ws", {
  apiKey: process.env.MIXLAYER_API_KEY,
});
 
const seq = await socket.open("meta/llama3.1-8b-instruct-free");
 
// do things with seq...

Once you’ve opened a sequence, you can append and generate tokens using the append and gen operations.
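
For example, the simplest round trip appends a prompt and reads back a completion (a minimal sketch; the prompt text here is only illustrative):

// append a prompt, then generate a completion
await seq.append("The capital of France is");
const completion = await seq.gen().text();
console.log(completion);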

When you’re done with a sequence, be sure to close it using the close operation to stop being billed for compute resources.

await seq.close();
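
If the code between open and close can throw, a try/finally block is one way to make sure the sequence is always released (a sketch; the prompt is illustrative):

const seq = await socket.open("meta/llama3.1-8b-instruct-free");
try {
  await seq.append("The meaning of life is ");
  console.log(await seq.gen().text());
} finally {
  // always close, even if append or gen throws
  await seq.close();
}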

Roles

Most models are instruction-tuned and require you to specify a role when appending or generating text:

// append a system prompt
await seq.append("You are a buddhist monk that deeply ponders existential questions.", 
    { role: "system" });
 
// append a question as a user
await seq.append("What's the meaning of life?", 
    { role: "user" });
 
// generate a response as the assistant role
const response = await seq.gen({ role: "assistant" }).text(); 
 
console.log(response);

Subsequent append or gen calls will reuse the last specified role. For example, after seq.gen({ role: "assistant" }), the next call to seq.append("...") will implicitly use the assistant role.
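
A small sketch of this implicit-role behavior (prompt text illustrative):

await seq.append("Name a prime number.", { role: "user" });
await seq.gen({ role: "assistant" }).text();
 
// no role specified: this append continues as the assistant,
// the last role used above
await seq.append(" Another prime number is");
const more = await seq.gen().text(); // also continues as assistant
console.log(more);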

Appending Tokens

Use the append operation to add text to the model’s context window:

await seq.append("The meaning of life is ");
 
const response = await seq.gen().text(); // generate a completion
 
console.log(response);

Prefilling

If you’d like to guide a model’s response, you can provide a partial completion before asking it to generate:

// append a prompt
await seq.append(
`What are the events leading up to the dot-com crash?
 
Respond with an array of JSON objects, each with 2 fields:
* event (string)
* date (string)
 
`,
  { role: "user" }
);
 
// prefill part of the model's response with a code fence
// so it starts generating JSON immediately
await seq.append(
  "```json\n",
  { role: "assistant" } // important: assistant is role
);
 
// generate the object, stop when the code fence
// is terminated
const jsonArray = await seq.gen({ role: "assistant", stopAt: "```" }).text();
 
console.log(JSON.parse(jsonArray));
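
The same trick works outside of JSON. For example, prefilling the first list marker nudges the model to answer as a numbered list (a sketch; the prompt and prefill text are illustrative):

await seq.append("Give me three book recommendations.", { role: "user" });
 
// prefill the start of a numbered list
await seq.append("1. ", { role: "assistant" });
 
const rest = await seq.gen({ role: "assistant" }).text();
console.log("1. " + rest);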

Generating Tokens

Use the gen operation to generate new tokens from the model:

await seq.append("What are 3 fun things to do in San Francisco?");
 
// generate a response
const response = await seq.gen().text();
console.log(response);

Stopping

You can instruct the model to stop generating tokens by using the stopAt or limit options.

Token count

Use the limit option to stop generation after a set number of tokens:

await seq.append("What's the meaning of life?");
 
// generate a response
const response = await seq.gen({ limit: 10 }).text();
console.log(response);

Stop text

Use the stopAt option to stop generation as soon as the model emits a given fragment of text:

await seq.append("What's the meaning of life?");
 
// generate a response
const response = await seq.gen({ stopAt: "\n\n" }).text();
console.log(response);
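
Both options can also be passed in the same gen call (an assumption based on how role and stopAt combine elsewhere in this guide); limit then acts as a safety net in case the stop text never appears:

await seq.append("List the planets of the solar system.\n");
 
// stop at a blank line, but never generate more than 200 tokens
const response = await seq.gen({ stopAt: "\n\n", limit: 200 }).text();
console.log(response);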

Sampling

Mixlayer supports a variety of sampling methods, including:

  • temperature: sample from the distribution with a temperature parameter
  • topP: nucleus sampling, select the most likely tokens up to a cumulative probability threshold
  • topK: limit sampling to the top K highest probability tokens
  • seed: set a fixed random seed for reproducibility
  • repeatPenalty: penalize the model for repeating text

Different models require different sampling parameters for optimal results; check the model’s documentation for more information.

await seq.append("What's the meaning of life?");
 
// generate a response
const response = await seq.gen({ 
  topP: 0.9, // nucleus sampling
  seed: 42, // reproducibility
  temperature: 0.7, // sample from the distribution with a temperature parameter
  topK: 10, // limit sampling to the top K highest probability tokens
  repeatPenalty: 1.2, // penalize the model for repeating text
}).text();
console.log(response);

If no sampling method is specified, the model will use greedy sampling by default (selecting the highest probability token).

Thinking

If you’re using a reasoning model, you can instruct the model to think about its response by using the thinking option:

// open a seq on a reasoning model
const seq = await socket.open("qwen/qwen3-8b");
await seq.append("What's the meaning of life?\n\n", { role: "user" });
 
// capture thinking text
const thinkingText = await seq
  .gen({ thinking: true, stopAt: "</think>", role: "assistant" })
  .text();
 
console.log("thoughts: \n", thinkingText);
 
// generate a response after thinking a bit
const response = await seq.gen({ role: "assistant" }).text();
console.log("\n---\nresponse: \n", response);

This will automatically prefill the model’s response with a <think> tag. For hybrid models, you can set thinking: false to ask the model to skip its thinking phase.
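
For example, to skip thinking on a hybrid model (a sketch, assuming the model supports both modes):

const seq = await socket.open("qwen/qwen3-8b");
await seq.append("What's the meaning of life?\n\n", { role: "user" });
 
// answer directly, without a thinking phase
const response = await seq.gen({ thinking: false, role: "assistant" }).text();
console.log(response);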

Streaming

Mixlayer text generation is natively streaming, so you can process or forward the model’s response as it’s being generated. Instead of using the text method, use the textStream method to get a stream of text chunks:

const seq = await socket.open("meta/llama3.1-8b-instruct-free");
 
await seq.append("You are a helpful assistant that talks like a pirate.", 
   { role: "system" });
 
await seq.append("What's so dangerous about Drake's passage?", 
   { role: "user" });
 
// generate a response using the textStream method
const stream = await seq.gen({ role: "assistant" }).textStream();
 
for await (const chunk of stream) {
  process.stdout.write(chunk);
}
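
The loop above can also accumulate the chunks if you want the full response once the stream ends (a minimal sketch):

let fullText = "";
for await (const chunk of stream) {
  process.stdout.write(chunk); // forward each chunk as it arrives
  fullText += chunk; // and keep a copy of the whole response
}
console.log("\ntotal length:", fullText.length);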