Chat Completions

POST https://models.mixlayer.ai/v1/chat/completions

Generate a chat completion from one of the available models. Mixlayer’s API is OpenAI-compatible, so existing OpenAI client libraries work without modification — see Client Libraries for examples in Python, TypeScript, Rust, and curl.

This page is the canonical reference for every supported request parameter.

Authentication

Every request requires a Bearer token in the Authorization header. Create one in the Mixlayer console.

Authorization: Bearer $MIXLAYER_API_KEY

Minimal request

curl https://models.mixlayer.ai/v1/chat/completions \
  -H "Authorization: Bearer $MIXLAYER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen/qwen3.5-4b-free",
    "messages": [
      {"role": "user", "content": "Hello!"}
    ]
  }'

Messages

The messages field is a list of message objects in turn order. Each message has a role and content.

  • system — High-level instructions or persona for the assistant. Typically the first message.
  • user — A user turn.
  • assistant — A previous assistant turn. Include these to continue a multi-turn conversation. May also carry tool_calls from a previous turn.
  • tool — The result of a tool call. Must include tool_call_id referencing the assistant's tool_calls[].id. See Tool Calling.

Assistant messages from previous turns may also include reasoning_content from a thinking-mode response — see Reasoning.
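The turn structure above can be sketched as a plain list of dicts; the message text here is illustrative:

```python
def build_messages(system_prompt, turns):
    """Build a messages list: one system message followed by alternating
    user/assistant turns, in order."""
    messages = [{"role": "system", "content": system_prompt}]
    for user_text, assistant_text in turns:
        messages.append({"role": "user", "content": user_text})
        if assistant_text is not None:
            messages.append({"role": "assistant", "content": assistant_text})
    return messages

messages = build_messages(
    "You are a terse assistant.",
    [("What is the capital of France?", "Paris."),
     ("And of Spain?", None)],  # trailing user turn awaits the model's reply
)
```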

Sampling parameters

Sampling parameters control how the model selects each next token.

  • temperature (float) — Controls randomness. Lower values (~0.2) make output more deterministic; higher values (~1.0+) make it more diverse.
  • top_p (float, 0.0–1.0) — Nucleus sampling: only consider the smallest set of tokens whose cumulative probability reaches top_p.
  • top_k (int) — Only consider the k most likely tokens at each step.

Sampling defaults vary by model. See the per-model notes for recommended settings — for example, Qwen 3.5 has different recommended values for thinking mode vs. instruct mode.
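A request body combining the sampling parameters above; the values are illustrative, not recommended defaults:

```python
payload = {
    "model": "qwen/qwen3.5-4b-free",
    "messages": [{"role": "user", "content": "Name three rivers."}],
    "temperature": 0.7,  # moderate randomness
    "top_p": 0.9,        # sample from the top 90% of probability mass
    "top_k": 40,         # consider at most the 40 most likely tokens
}
```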

Penalty parameters

Penalty parameters discourage the model from repeating tokens.

  • frequency_penalty (float, -2.0 to 2.0) — Penalizes tokens proportional to how often they've already appeared. Positive values reduce repetition.
  • presence_penalty (float, -2.0 to 2.0) — Penalizes tokens that have appeared at all, regardless of count. Positive values encourage topical novelty.
  • repetition_penalty (float, 0.0 to 2.0) — Multiplicative penalty on tokens already in the context. 1.0 is no penalty; values above 1.0 penalize repetition; values below 1.0 encourage it.
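A client-side check of the documented ranges can fail fast before the gateway rejects the request; a sketch:

```python
# Documented ranges from the table above.
PENALTY_RANGES = {
    "frequency_penalty": (-2.0, 2.0),
    "presence_penalty": (-2.0, 2.0),
    "repetition_penalty": (0.0, 2.0),
}

def check_penalties(params):
    """Raise ValueError for any penalty parameter outside its range."""
    for name, (lo, hi) in PENALTY_RANGES.items():
        if name in params and not (lo <= params[name] <= hi):
            raise ValueError(f"{name}={params[name]} outside [{lo}, {hi}]")

check_penalties({"frequency_penalty": 0.5, "repetition_penalty": 1.1})  # passes
```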

Output control

  • max_completion_tokens (int) — Maximum number of tokens to generate. Takes precedence over max_tokens if both are set.
  • max_tokens (int) — Legacy alias for max_completion_tokens.
  • stop (array of strings) — Sequences that halt generation when produced. The stop sequence itself is not included in the output.
  • seed (int) — Best-effort deterministic sampling. The same seed, parameters, and input should produce the same output.
  • stream (bool) — If true, stream the response as Server-Sent Events. Defaults to false.
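The precedence between max_completion_tokens and the legacy max_tokens can be expressed as a small resolver:

```python
def effective_max_tokens(body):
    """Resolve the token limit as described above:
    max_completion_tokens wins when both fields are set."""
    if "max_completion_tokens" in body:
        return body["max_completion_tokens"]
    return body.get("max_tokens")  # None if neither is set

body = {
    "max_tokens": 256,             # legacy alias, ignored here
    "max_completion_tokens": 128,  # takes precedence
    "stop": ["\n\n"],
    "seed": 42,
}
limit = effective_max_tokens(body)  # → 128
```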

Response format

Use response_format to constrain the model’s output structure.

{ "response_format": { "type": "text" } }
  • text — No constraint. Default. Does not require json_schema.
  • json_object — Output must be a syntactically valid JSON object. Does not require json_schema.
  • json_schema — Output must conform to the supplied JSON Schema. Requires the json_schema field.

Example of a constrained JSON schema response:

{
  "response_format": {
    "type": "json_schema",
    "json_schema": {
      "schema": {
        "type": "object",
        "properties": {
          "city": { "type": "string" },
          "country": { "type": "string" }
        },
        "required": ["city", "country"]
      },
      "strict": true
    }
  }
}
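Even with a schema constraint, it is worth parsing the returned content defensively. A minimal check against the example schema above, using only the stdlib (not a full JSON Schema validator):

```python
import json

def parse_constrained_reply(content, required):
    """Parse the model's content as JSON and confirm the schema's
    required keys are present."""
    obj = json.loads(content)
    missing = [k for k in required if k not in obj]
    if missing:
        raise ValueError(f"missing required keys: {missing}")
    return obj

reply = parse_constrained_reply('{"city": "Lisbon", "country": "Portugal"}',
                                ["city", "country"])
```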
⚠️ response_format: json_schema cannot be combined with thinking mode (thinking: true or reasoning_effort). The gateway returns an invalid_response_format error if both are set.

Tool calling

Pass an array of tools the model can invoke. See the Tool Calling guide for the full request/response loop.

{
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
          "type": "object",
          "properties": { "city": { "type": "string" } },
          "required": ["city"]
        }
      }
    }
  ]
}
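When the model replies with tool_calls, the follow-up tool message must echo the call's id. A sketch of that pairing; the weather call shape mirrors the example above:

```python
import json

def tool_result_message(tool_call, result):
    """Build the `tool` role message answering one entry of the
    assistant's tool_calls array."""
    return {
        "role": "tool",
        "tool_call_id": tool_call["id"],
        "content": json.dumps(result),
    }

# Shape of one assistant tool call as returned by the API (id is illustrative):
call = {
    "id": "call_abc123",
    "type": "function",
    "function": {"name": "get_weather", "arguments": '{"city": "Tokyo"}'},
}
followup = tool_result_message(call, {"temp_c": 21})
```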

Reasoning

Pass thinking: true (or reasoning_effort: "low" | "medium" | "high") to enable extended chain-of-thought reasoning on supported models. Reasoning is returned in a separate reasoning_content field on the assistant message — see the Reasoning guide.
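Since thinking mode cannot be combined with response_format: json_schema, a client-side guard can catch the conflict before the gateway does; a sketch mirroring the rule stated above:

```python
def check_thinking_conflict(body):
    """Reject bodies that enable thinking mode alongside a json_schema
    response format, matching the gateway's invalid_response_format rule."""
    thinking = body.get("thinking") is True or "reasoning_effort" in body
    schema = body.get("response_format", {}).get("type") == "json_schema"
    if thinking and schema:
        raise ValueError("json_schema response_format cannot be combined "
                         "with thinking mode")

check_thinking_conflict({"thinking": True})  # fine on its own
```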

Streaming

When stream: true, the response is a stream of chat.completion.chunk events delivered as Server-Sent Events. Each event has the same envelope as a non-streaming response, but choices[].message is replaced with choices[].delta, which contains only the new content for that chunk.

data: {"id":"...","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null}]}

data: {"id":"...","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"Chihuahuas "},"finish_reason":null}]}

data: {"id":"...","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"are tiny."},"finish_reason":"stop"}]}

data: [DONE]

The delta object contains whichever fields are new in that chunk:

  • role — present only on the first chunk
  • content — incremental visible text
  • reasoning_content — incremental reasoning text (thinking mode only)
  • tool_calls — incremental tool call data (when the model invokes a tool)

finish_reason is null until the final chunk. Possible terminal values:

  • stop — Model finished naturally or hit a stop sequence.
  • length — Hit the max_completion_tokens / max_tokens limit.
  • tool_calls — Model invoked a tool.
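The chunks shown above can be reassembled client-side by concatenating each delta; a minimal accumulator over the data: lines of the stream:

```python
import json

def accumulate_stream(lines):
    """Concatenate delta.content across chat.completion.chunk events
    and return (text, finish_reason)."""
    text, finish = [], None
    for line in lines:
        if not line.startswith("data: "):
            continue
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break
        choice = json.loads(payload)["choices"][0]
        text.append(choice["delta"].get("content", ""))
        finish = choice.get("finish_reason") or finish
    return "".join(text), finish

events = [
    'data: {"choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null}]}',
    'data: {"choices":[{"index":0,"delta":{"content":"Chihuahuas "},"finish_reason":null}]}',
    'data: {"choices":[{"index":0,"delta":{"content":"are tiny."},"finish_reason":"stop"}]}',
    "data: [DONE]",
]
text, finish = accumulate_stream(events)  # → ("Chihuahuas are tiny.", "stop")
```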

A streaming request with curl:

curl https://models.mixlayer.ai/v1/chat/completions \
  -H "Authorization: Bearer $MIXLAYER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen/qwen3.5-4b-free",
    "stream": true,
    "messages": [{"role": "user", "content": "Count to five."}]
  }'

Response shape

A non-streaming response:

{
  "id": "chatcmpl-abc123...",
  "object": "chat.completion",
  "created": 1735689600,
  "model": "qwen/qwen3.5-4b-free",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Chihuahuas have the largest brain-to-body ratio of any dog breed."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 14,
    "completion_tokens": 17,
    "total_tokens": 31
  }
}

When the model invokes a tool, message.content will be empty and message.tool_calls will contain the call. When thinking mode is enabled, message.reasoning_content will hold the model’s chain of thought.
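Reading the three possible payloads (content, tool_calls, reasoning_content) out of a response dict; a sketch:

```python
def read_choice(response):
    """Pull the assistant message fields out of a non-streaming response."""
    choice = response["choices"][0]
    msg = choice["message"]
    return {
        "content": msg.get("content") or "",
        "tool_calls": msg.get("tool_calls", []),
        "reasoning": msg.get("reasoning_content"),
        "finish_reason": choice["finish_reason"],
    }

resp = {
    "choices": [{
        "index": 0,
        "message": {"role": "assistant", "content": "Hello."},
        "finish_reason": "stop",
    }],
}
out = read_choice(resp)
```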

OpenAI compatibility notes

The following standard OpenAI parameters are not currently supported by Mixlayer. Requests that include them are accepted and silently ignored:

  • n — multiple completions per request
  • min_p — minimum probability sampling
  • tool_choice — forcing a specific tool selection (the model decides automatically based on tools)
  • logprobs, top_logprobs
  • user
  • logit_bias

If your application depends on any of these, open an issue or reach out on Discord.

Errors

Errors follow OpenAI’s error envelope:

{
  "error": {
    "message": "Model not found.",
    "type": "model_not_found",
    "code": "model_not_found"
  }
}
  • 400 invalid_request_error — Malformed JSON, a parameter out of range (e.g. presence_penalty outside -2.0 to 2.0), an invalid tool schema, or an invalid response_format.
  • 400 model_not_found — The requested model SKU doesn't exist or isn't authorized for your API key.
  • 401 authentication_error — Missing or invalid API key.
  • 403 permission_error — The API key doesn't have access to the requested model.
  • 500 server_error — Internal generation failure. Safe to retry with backoff.
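Since 500 server_error is safe to retry with backoff, one common schedule is exponential backoff with jitter. A sketch; the base and cap are arbitrary choices, not documented values:

```python
import random

def backoff_delays(attempts, base=0.5, cap=8.0):
    """Exponential backoff with full jitter: each delay is drawn
    uniformly from [0, min(cap, base * 2**attempt)] seconds."""
    return [random.uniform(0, min(cap, base * 2 ** i)) for i in range(attempts)]

delays = backoff_delays(4)  # sleep these between retries of a 500 response
```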