Reasoning

Some Mixlayer models support an extended thinking mode where the model produces an internal chain of thought before its visible answer. The reasoning is returned in a separate reasoning_content field on the assistant message — you can show it to users, log it for debugging, or just ignore it.

Enabling thinking

There are two equivalent ways to enable thinking on a request:

{ "thinking": true }

or, for OpenAI compatibility:

{ "reasoning_effort": "low" | "medium" | "high" }

Both toggle the same underlying behavior. reasoning_effort is accepted as an alias and is currently treated as a simple on/off switch: any of the three values enables thinking, and the specific effort level is reserved for future use.

To explicitly disable thinking on a model that defaults to it, send thinking: false.
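As a rough sketch of the three request shapes described above, the helper below builds a request body with either flag. The function name build_chat_payload and its parameters are hypothetical, not part of any Mixlayer SDK; only the thinking and reasoning_effort fields come from this page.

```python
from typing import Any, Optional


def build_chat_payload(
    model: str,
    prompt: str,
    thinking: Optional[bool] = None,
    reasoning_effort: Optional[str] = None,
) -> dict[str, Any]:
    """Build a chat-completions request body (hypothetical helper)."""
    payload: dict[str, Any] = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    if thinking is not None:
        # True enables thinking; False disables it on models that default to it.
        payload["thinking"] = thinking
    if reasoning_effort is not None:
        # Only the three documented values are accepted here.
        if reasoning_effort not in ("low", "medium", "high"):
            raise ValueError(f"unknown reasoning_effort: {reasoning_effort!r}")
        payload["reasoning_effort"] = reasoning_effort
    return payload


native = build_chat_payload("qwen/qwen3.5-27b", "Hi", thinking=True)
compat = build_chat_payload("qwen/qwen3.5-27b", "Hi", reasoning_effort="low")
disabled = build_chat_payload("qwen/qwen3.5-27b", "Hi", thinking=False)
```

Leaving both arguments unset omits both fields, so the model falls back to whatever its default is.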

Reading reasoning_content

A non-streaming response with thinking enabled includes both fields on the assistant message:

{
  "choices": [{
    "index": 0,
    "message": {
      "role": "assistant",
      "reasoning_content": "Let me work through this. The user is asking about...",
      "content": "The answer is 42."
    },
    "finish_reason": "stop"
  }]
}

content is the visible answer you’d typically show to the user. reasoning_content is the model’s chain of thought — useful for debugging, evaluation, or building “show your work” UI.
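A minimal sketch of reading both fields from a parsed response follows. The helper name split_message is hypothetical, and the use of .get() is a defensive assumption in case a field is missing; this page does not state what the message looks like when thinking is off.

```python
import json
from typing import Optional, Tuple


def split_message(response: dict) -> Tuple[Optional[str], Optional[str]]:
    """Return (reasoning, answer) from the first choice of a response."""
    message = response["choices"][0]["message"]
    # .get() is an assumption: it avoids a KeyError if a field is missing.
    return message.get("reasoning_content"), message.get("content")


raw = """
{
  "choices": [{
    "index": 0,
    "message": {
      "role": "assistant",
      "reasoning_content": "Let me work through this. The user is asking about...",
      "content": "The answer is 42."
    },
    "finish_reason": "stop"
  }]
}
"""
reasoning, answer = split_message(json.loads(raw))
print(answer)
```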

Mixlayer extracts reasoning from <think>...</think> tags in the model’s raw output and routes it to reasoning_content automatically. You will never see the tags in either field.
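To make the idea concrete, here is a conceptual sketch of separating tagged reasoning from visible output. It is not Mixlayer's implementation, which this page does not describe; the regex, the helper name split_think_tags, and the sample string are all illustrative assumptions.

```python
import re
from typing import Tuple

# Non-greedy match so several <think> blocks are captured separately.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_think_tags(raw_output: str) -> Tuple[str, str]:
    """Split raw model text into (reasoning, content); illustrative only."""
    reasoning = "\n".join(part.strip() for part in THINK_RE.findall(raw_output))
    content = THINK_RE.sub("", raw_output).strip()
    return reasoning, content


reasoning, content = split_think_tags(
    "<think>340 + 51 = 391</think>17 * 23 = 391."
)
```

Because the gateway already does this routing, client code should not need anything like it; the sketch only shows why the tags are absent from both fields.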

Examples

curl https://models.mixlayer.ai/v1/chat/completions \
  -H "Authorization: Bearer $MIXLAYER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen/qwen3.5-27b",
    "thinking": true,
    "messages": [
      {"role": "user", "content": "If a train leaves at 3pm going 60mph and another leaves at 4pm going 80mph, when do they meet?"}
    ]
  }'
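The same request can be sketched in Python with only the standard library. The endpoint, headers, and body mirror the curl command above; the helper names build_request and send, the shortened prompt, and the placeholder key are assumptions for illustration. send performs a real network call and is defined but not invoked here.

```python
import json
import os
import urllib.request

API_URL = "https://models.mixlayer.ai/v1/chat/completions"


def build_request(api_key: str, prompt: str) -> urllib.request.Request:
    """Build the POST request without sending it (hypothetical helper)."""
    body = {
        "model": "qwen/qwen3.5-27b",
        "thinking": True,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request: urllib.request.Request) -> dict:
    """Send the request and parse the JSON response; needs a valid key."""
    with urllib.request.urlopen(request) as resp:
        return json.load(resp)


# "sk-placeholder" is a stand-in so the sketch runs without a real key.
request = build_request(
    os.environ.get("MIXLAYER_API_KEY", "sk-placeholder"),
    "When do the trains meet?",
)
```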

Streaming reasoning

When stream: true, reasoning arrives in delta.reasoning_content chunks alongside delta.content chunks. They interleave in the order the model produces them — typically reasoning first, then visible content.

data: {"choices":[{"delta":{"role":"assistant"}}]}
data: {"choices":[{"delta":{"reasoning_content":"Let me think. "}}]}
data: {"choices":[{"delta":{"reasoning_content":"17 * 23 = 17 * 20 + 17 * 3 = 340 + 51."}}]}
data: {"choices":[{"delta":{"content":"17 * 23 = 391."},"finish_reason":"stop"}]}

To render reasoning and content in separate UI areas, route each delta based on which field is set:

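# `stream` is the iterator returned by the OpenAI Python SDK when stream=True.
for chunk in stream:
    if not chunk.choices:
        continue  # defensively skip chunks that carry no choices
    delta = chunk.choices[0].delta
    # reasoning_content is not a standard SDK field, so read it from model_extra.
    extra = delta.model_extra or {}
    if extra.get("reasoning_content"):
        update_reasoning_pane(extra["reasoning_content"])
    if delta.content:
        update_answer_pane(delta.content)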
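For clients that read the raw event stream instead of using an SDK, the sketch below accumulates the two fields from data: lines. The helper name accumulate_stream is hypothetical, and handling of a [DONE] sentinel follows the OpenAI streaming convention, which this page does not confirm for Mixlayer.

```python
import json
from typing import Iterable, Tuple


def accumulate_stream(lines: Iterable[str]) -> Tuple[str, str]:
    """Collect (reasoning, content) from server-sent event lines."""
    reasoning_parts, content_parts = [], []
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # ignore blank lines and non-data fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":  # assumed OpenAI-style end marker
            break
        choices = json.loads(data).get("choices") or []
        if not choices:
            continue
        delta = choices[0].get("delta", {})
        if delta.get("reasoning_content"):
            reasoning_parts.append(delta["reasoning_content"])
        if delta.get("content"):
            content_parts.append(delta["content"])
    return "".join(reasoning_parts), "".join(content_parts)


sample = [
    'data: {"choices":[{"delta":{"role":"assistant"}}]}',
    'data: {"choices":[{"delta":{"reasoning_content":"Let me think. "}}]}',
    'data: {"choices":[{"delta":{"reasoning_content":"17 * 23 = 17 * 20 + 17 * 3 = 340 + 51."}}]}',
    'data: {"choices":[{"delta":{"content":"17 * 23 = 391."},"finish_reason":"stop"}]}',
]
reasoning, answer = accumulate_stream(sample)
```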

Constraints

⚠️ Thinking mode is incompatible with response_format: json_schema. The gateway returns an error if both are set on the same request. If you need structured output from a reasoning model, use response_format: json_object with explicit instructions in the prompt instead.
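A client-side pre-check can catch this combination before the request is sent. The sketch below assumes the OpenAI-style shape for response_format (an object with a type key), which this page does not spell out, and it treats any reasoning_effort value as enabling thinking; the function name check_thinking_constraints is hypothetical.

```python
from typing import Any


def check_thinking_constraints(payload: dict[str, Any]) -> None:
    """Raise ValueError if thinking and json_schema are combined."""
    # Assumption: thinking is on if thinking is True or reasoning_effort is set.
    thinking_on = payload.get("thinking") is True or "reasoning_effort" in payload
    fmt = payload.get("response_format") or {}
    if thinking_on and fmt.get("type") == "json_schema":
        raise ValueError(
            "thinking cannot be combined with response_format json_schema; "
            "use json_object and describe the schema in the prompt"
        )


# The suggested alternative: json_object plus explicit prompt instructions.
ok_payload = {
    "model": "qwen/qwen3.5-27b",
    "thinking": True,
    "response_format": {"type": "json_object"},
    "messages": [
        {
            "role": "user",
            "content": 'Reply only with JSON shaped like {"answer": <number>}. What is 17 * 23?',
        }
    ],
}
check_thinking_constraints(ok_payload)  # passes without raising
```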

Thinking is supported on the Qwen 3.5 family. See Models for the up-to-date list of supported models and their recommended sampling settings for thinking vs. non-thinking modes.